From: Jack Bush [mailto:netbeansfan@yahoo.com.au]
Sent: 06 January 2009 13:25
To: Michael Kay
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] JDOM XSLT TransformerException (newbie)
Hi Michael,
By replacing the FileReader with an InputStream, I finally got the following code to read state.xml and transform it to state.html, but only when there is an Internet connection:
19. SAXBuilder stateBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", false);
20. stateBuilder.setValidation(false);
21. FileInputStream stateIS = new FileInputStream("E:\\state.xml");
22. BufferedInputStream stateBIS = new BufferedInputStream(stateIS);
23. Document stateOriginaljdomDocument = stateBuilder.build(stateBIS);
24. TransformerFactory stateFactory = TransformerFactory.newInstance();
25. Transformer stateTransformer = stateFactory.newTransformer(new StreamSource("E:\\stateStyleSheet.xsl"));
26. JDOMSource stateSource = new JDOMSource(stateOriginaljdomDocument);
27. JDOMResult stateResult = new JDOMResult();
28. stateTransformer.transform(stateSource, stateResult);
......
When offline, the transformation fails with:
javax.xml.transform.TransformerException: org.jdom.JDOMException: DTD parsing error: www.w3.org
at org.apache.xalan.transformer.TransformerImpl.fatalError(TransformerImpl.java:738)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:712)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1126)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1104)
at XMLProject.main(generateXML.java:28)
Caused by: org.jdom.JDOMException: DTD parsing error: www.w3.org
at org.jdom.transform.JDOMSource$DocumentReader.parse(JDOMSource.java:525)
at org.apache.xml.dtm.ref.DTMManagerDefault.getDTM(DTMManagerDefault.java:478)
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:655)
... 3 more
It appears that the transformation process is trying to validate the DTD even though I turned validation off during parsing. Can you confirm whether the validation attempt occurs during the parsing or the transformation step, and how to prevent it?
At the same time, the content of state.html is:
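The trace itself points at the transformation step: the exception is raised from `JDOMSource$DocumentReader.parse`, which is called from `TransformerImpl.transform`, not from `SAXBuilder.build`. Turning validation off also does not by itself stop a non-validating parser from reading an external DTD. One common JAXP way to keep a SAX parser off the network is an `EntityResolver` that returns empty content for every external entity. The sketch below shows that technique with the JDK's built-in parser only; the class `OfflineDtdSketch` and method `rootNameOffline` are hypothetical, and whether the reader `JDOMSource` uses internally can be given such a resolver is an assumption not checked here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineDtdSketch {
    /**
     * Hypothetical helper: parses XML text without touching the network.
     * Every external entity the parser asks for (including the external
     * DTD subset) is replaced with empty content. Returns the root's local name.
     */
    public static String rootNameOffline(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // a non-validating parser may still fetch the DTD
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Hand back an empty document instead of letting the parser open systemId
        reader.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));

        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName; // the first element reported is the root
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\""
            + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        System.out.println(rootNameOffline(xhtml)); // html
    }
}
```

With the DTD suppressed, any entity references it would have declared (such as `&nbsp;` in XHTML) become undefined, so this only suits documents that do not use them.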
<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<h2>Transformed State Detail</h2>
<table border="1">
<tr bgcolor="lightblue">
<th align="left">Area Link</th>
<th align="left">Area Name</th>
</tr>
</table>
</body>
</html>
This means that the XPath expression in stateStyleSheet.xsl below fails to retrieve Area Link and Area Name from state.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>Transformed State Detail</h2>
<table border="1">
<tr bgcolor="lightblue">
<th align="left">Area Link</th>
<th align="left">Area Name</th>
</tr>
<xsl:for-each select="/html/body/div[@id='content']/table[@class='sresults']/tr/td/a">
<tr>
<td><xsl:value-of select="@href"/></td>
<td><xsl:value-of select="@title"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Again, the following JDOM XPath statements do find Area Link and Area Name in state.xml:
XPath stateXpath = XPath.newInstance("/ns:html/ns:body/ns:div[@id='content']/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
stateXpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
In short, how do I declare the implicit or explicit namespace (or both) in stateStyleSheet.xsl so that it picks up these values accurately?
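In XSLT 1.0, an unprefixed name in an XPath expression such as `/html/body` matches only elements in no namespace; the default `xmlns` of the source document does not carry over into the stylesheet. Since the JDOM XPath above only works with the XHTML namespace bound to `ns`, the stylesheet presumably needs the same treatment: declare that namespace with a prefix on `xsl:stylesheet` and put the prefix on every element step. A minimal JDK-only sketch, assuming a cut-down input in place of state.xml; the prefix `h` and the class `XhtmlXsltSketch` are arbitrary, hypothetical names, and text output is used only to keep the result easy to check:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XhtmlXsltSketch {
    // Cut-down stylesheet: the XHTML namespace is bound to the prefix "h",
    // and every element step in the select expression carries that prefix.
    // Attributes (@href, @title) are in no namespace, so they stay unprefixed.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:h=\"http://www.w3.org/1999/xhtml\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"/h:html/h:body/h:div[@id='content']"
        + "/h:table[@class='sresults']/h:tr/h:td/h:a\">"
        + "<xsl:value-of select=\"@href\"/><xsl:text>|</xsl:text>"
        + "<xsl:value-of select=\"@title\"/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over an XHTML string and returns the text output. */
    public static String run(String xhtml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<div id=\"content\"><table class=\"sresults\">"
            + "<tr><td><a href=\"a.html\" title=\"Area A\">A</a></td></tr>"
            + "<tr><td><a href=\"b.html\" title=\"Area B\">B</a></td></tr>"
            + "</table></div></body></html>";
        System.out.println(run(xhtml)); // a.html|Area A;b.html|Area B;
    }
}
```

For the HTML output in stateStyleSheet.xsl, adding `exclude-result-prefixes="h"` to `xsl:stylesheet` keeps the otherwise unused `xmlns:h` declaration off the literal result elements.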
Many thanks again for your valuable advice,
Jack
From: Michael Kay <mike@saxonica.com>
To: Jack Bush <netbeansfan@yahoo.com.au>; Robert Koberg <rob@koberg.com>
Cc: xml-dev@lists.xml.org
Sent: Monday, 5 January, 2009 10:19:14 PM
Subject: RE: [xml-dev] JDOM XSLT TransformerConfigurationException
Well, for some reason it looks as if you are trying to parse using TagSoup, but the stack trace shows you are actually parsing using Xerces.
Michael Kay
From: Jack Bush [mailto:netbeansfan@yahoo.com.au]
Sent: 05 January 2009 02:38
To: Michael Kay; Robert Koberg
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] JDOM XSLT TransformerConfigurationException
Hi Michael,
The following statements generated the state.xml file:
URL stateUrl = new URL("http://www.abc.com");
URLConnection stateconnection = stateUrl.openConnection();
stateisInHtml = stateconnection.getInputStream();
statedisInHtml = new DataInputStream(new BufferedInputStream(stateisInHtml));
System.out.flush();
statefosOutHtml = new FileOutputStream("state.html");
while ((oneChar=statedisInHtml.read()) != -1)
statefosOutHtml.write(oneChar);
.....
statefrInHtml = new FileReader("state.html");
statebrInHtml = new BufferedReader(statefrInHtml);
SAXBuilder statesaxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", false);
org.jdom.Document statejdomDocument = statesaxBuilder.build(statebrInHtml);
XMLOutputter stateoutputter = new XMLOutputter();
statefwOutXml = new FileWriter("state.xml");
statebwOutXml = new BufferedWriter(statefwOutXml);
stateoutputter.output(statejdomDocument, statebwOutXml);
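One place the UTF-8 mismatch discussed further down the thread can arise in this code: `XMLOutputter` writes an `encoding="UTF-8"` declaration by default, but `FileWriter` encodes characters with the platform default charset, which on Windows JVMs of that period was typically not UTF-8. The sketch below reproduces the effect with the JDK alone, using ISO-8859-1 as a stand-in for a non-UTF-8 default; the class `EncodingMismatchSketch` and its sample data are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatchSketch {
    // The declaration promises UTF-8, as XMLOutputter's default Format does.
    // The attribute value contains one non-ASCII character (e-acute).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><area name=\"Mont\u00e9e\"/>";

    /** Encodes XML with the given charset, then reports whether it parses. */
    public static boolean parses(Charset writerCharset) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(bytes, writerCharset)) {
                w.write(XML);
            }
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(bytes.toByteArray()));
            return true;
        } catch (Exception e) {
            // Typically reports an invalid byte in a UTF-8 sequence
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses(StandardCharsets.UTF_8));      // true
        System.out.println(parses(StandardCharsets.ISO_8859_1)); // false
    }
}
```

A straightforward remedy, assuming the default `Format` is kept, is to pass `XMLOutputter.output` a `FileOutputStream` rather than a `Writer`, so that JDOM does the encoding itself and the declaration and the bytes agree.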
XPath had no problem looking up state.xml.
Thanks,
Jack
From: Michael Kay <mike@saxonica.com>
To: Jack Bush <netbeansfan@yahoo.com.au>; Robert Koberg <rob@koberg.com>
Cc: xml-dev@lists.xml.org
Sent: Monday, 5 January, 2009 2:13:33 AM
Subject: RE: [xml-dev] JDOM XSLT TransformerConfigurationException
Nevertheless, I have now encountered another issue:
java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
There's only one explanation of that: the parser is expecting the document to be encoded in UTF-8 but it isn't. To understand why it isn't, you need to examine how the document was created and any transcodings that might have taken place before it reached the parser.
at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.skipChar(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:928)
at JDOMTrAXPojoInvestmentBean.main(JDOMTrAXPojoInvestmentBean.java:45)
The header of state.xml is as follows:
<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE html (View Source for full doctype...)>
Any ideas on what is causing this issue and how to overcome it? Likewise, how do I define the correct namespace prefix? Is it possible that this document has two namespaces, a default one and one with the prefix 'html'? If so, which one should I use?
It's certainly inelegant to bind the same namespace to two prefixes like this, though it's not incorrect. Again, to prevent it happening we need to understand how you created the document.
Michael Kay
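Kay's remark that binding one namespace to two prefixes is inelegant but not incorrect follows from how names work in Namespaces in XML: an element is identified by its namespace URI and local name, and the prefix is only shorthand. A minimal JDK-only sketch, using a hypothetical document and class name, shows that `body` under the default namespace and `html:body` come out as the same expanded name:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TwoPrefixSketch {
    // Hypothetical document: the XHTML namespace is bound both as the default
    // namespace and to the prefix "html".
    static final String XML =
          "<html xmlns=\"http://www.w3.org/1999/xhtml\""
        + " xmlns:html=\"http://www.w3.org/1999/xhtml\">"
        + "<body/><html:body/></html>";

    /** Returns "{namespace-uri}local-name" for the two children of the root. */
    public static String[] expandedNames() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP default is false
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        Element first = (Element) doc.getDocumentElement().getFirstChild();
        Element second = (Element) first.getNextSibling();
        return new String[] {
            "{" + first.getNamespaceURI() + "}" + first.getLocalName(),
            "{" + second.getNamespaceURI() + "}" + second.getLocalName()
        };
    }

    public static void main(String[] args) throws Exception {
        for (String name : expandedNames()) {
            System.out.println(name); // {http://www.w3.org/1999/xhtml}body, twice
        }
    }
}
```

So for XPath or XSLT purposes it should not matter which of the two forms the document uses; what matters is that the query binds a prefix of its own to `http://www.w3.org/1999/xhtml`.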
Stay connected to the people that matter most with a smarter inbox. Take a look.