Hi Everyone,
I have added the additional I/O statements in the finally clauses as follows, but the problem still persists:
readData()
// reading data (html) from the webpage and save it in html format.
try {
    ….
}
catch { …. }
finally {
    System.out.flush();
    isInHtml.close();
    disInHtml.close();
    fosOutHtml.flush();
    fosOutHtml.getFD().sync();
    fosOutHtml.close();
}
// convert the html webpage format to xml format
try {
    ….
}
catch { …. }
finally {
    System.out.flush();
    fwOutXml.flush();
    fwOutXml.close();
    pwOutXml.flush();
    pwOutXml.close();
}
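For comparison, here is a minimal sketch of the close-in-finally pattern, written so that it also compiles on JDK 1.6 (no try-with-resources). The class and method names (StreamCopy, copyToFile) are illustrative assumptions and not part of the original code. It only shows each stream being closed exactly once, with the output closed before anything tries to read the file back.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not taken from the original post.
final class StreamCopy {

    /** Copies everything from in to target, closes both streams, returns the byte count. */
    static long copyToFile(InputStream in, File target) throws IOException {
        InputStream bufferedIn = new BufferedInputStream(in);
        try {
            OutputStream out = new BufferedOutputStream(new FileOutputStream(target));
            try {
                byte[] buffer = new byte[8192];
                long total = 0;
                int n;
                while ((n = bufferedIn.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                    total += n;
                }
                out.flush();
                return total;
            } finally {
                // Closing the wrapper flushes it and closes the FileOutputStream underneath.
                out.close();
            }
        } finally {
            // Closing the wrapper also closes the wrapped input stream.
            bufferedIn.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for the stream returned by url.openStream().
        File target = File.createTempFile("ABC", ".html");
        byte[] page = "<html><body>hello</body></html>".getBytes("UTF-8");
        long copied = copyToFile(new ByteArrayInputStream(page), target);
        System.out.println(copied + " bytes written, file length " + target.length());
        target.delete();
    }
}
```

Only the outermost wrapper of each chain is closed here, so there is no need to close isInHtml and disInHtml separately.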
Below is a short listing of the new XML file:
<?xml version="1.0" encoding="iso-8859-1" ?>
- <html>
- <head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="keywords" content="
<meta name="description" content="Cities, towns and suburbs in
<title>Cities and Towns in
<link rel="stylesheet" href="http://www.abc.com/style.css" type="text/css" media="screen" />
</head>
- <body>
<a name="top" />
- <div id="container">
- <div id="header">
<div id="postmark" />
- <a href="http://www.abc.com/" class="imglink">
<img id="logoimg" src="http://www.abc.com/images/zipcodes.gif" width="192" height="33" alt="Zipcodes America Logo" />
</a>
<hr />
</div>
- <div id="nav">
- <ul>
- <li>
<a href="http://www.abc.com/" title="Home Page">Home</a>
</li>
- <li>
<strong>Search</strong>
(zipcode or suburb)
- <div class="hide">
<form method="post" action="http://www.abc.com/search" /> // line 23
</div>
<input type="text" name="q" class="searchbox" alt="Search query" />
<br />
<input type="submit" value="find!" class="searchbutton" alt="Perform search" />
<div class="hide" />
</li>
</li>
…
What I find interesting is that the above XML file can be parsed without any problem by the same parseData() when it is called from another class. As a result, I have come to the following conclusions so far:
( i ) There is some file locking that is preventing saxBuilder from parsing the XML file at the time.
( ii ) light_html2xml does not appear to have correctly converted the original HTML to XML, but somehow the fault is picked up by the parser in the same class and not by the same parser when it is run from another class.
( iii ) I would like to use another conversion tool such as TagSoup in place of light_html2xml to determine where this issue is coming from. Would anyone be able to help me with a few lines of conversion code using TagSoup, since I am not familiar with this tool?
( iv ) light_html2xml is good in that it strips out all namespaces, DTDs, entity resolvers, etc. and returns only what I need. JTidy converts correctly but includes the namespace, DTD and entity resolver, which makes parsing difficult.
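One way to narrow down points ( i ) and ( ii ) is to parse the file directly from its path with the JDK's built-in JAXP parser and report the line and message of the first error. The sketch below assumes a hypothetical helper class named WellFormedCheck; it is not part of JDOM or light_html2xml. Passing a File rather than an already open Reader also takes the caller's own streams out of the picture.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic helper; the names are assumptions made for this sketch.
final class WellFormedCheck {

    /** Returns null if the file is well-formed XML, otherwise "line N: message". */
    static String firstError(File xmlFile) throws IOException {
        try {
            // DefaultHandler ignores content and rethrows fatal parse errors.
            SAXParserFactory.newInstance().newSAXParser().parse(xmlFile, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return "parse failure: " + e.getMessage();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes text to a temporary file and closes it before returning. */
    static File write(String text) throws IOException {
        File f = File.createTempFile("check", ".xml");
        Writer w = new OutputStreamWriter(new FileOutputStream(f), "UTF-8");
        try {
            w.write(text);
        } finally {
            w.close();
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, check that file; otherwise check a small unbalanced sample.
        File target = args.length > 0 ? new File(args[0]) : write("<div><form></div>");
        String error = firstError(target);
        System.out.println(target + ": " + (error == null ? "well-formed" : error));
    }
}
```

Running it against both C:\Temp\ABC.xml and C:\Temp\ABC.html would show which of the two files actually produces the "form" error at line 23.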
Many thanks again,
Jack
Jack – did you try fosOutHtml.getFD().sync() after the flush?
Regards
Sheila
From: Jack Bush [mailto:netbeansfan@yahoo.com.au]
Sent: Tuesday, October 28, 2008 8:41 AM
To: Robert Koberg
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Cannot close an XML file used for parsing
Hi Robert,
Thanks for responding to this post.
I have added your suggestion but the issue still persists. Nevertheless, I do believe that this is caused by the new XML file not having been closed properly.
There is no problem with the light-html2xml method, which has worked in the past.
Any more suggestions to try out?
Thanks,
Jack
From: Robert Koberg <rob@koberg.com>
To: Jack Bush <netbeansfan@yahoo.com.au>
Cc: xml-dev@lists.xml.org
Sent: Tuesday, 28 October, 2008 9:42:21 AM
Subject: Re: [xml-dev] Cannot close an XML file used for parsing
close the stream or reader in a finally block to avoid leaving it open
if an error occurs.
try{
}catch(....){
}finally {
}
On Oct 27, 2008, at 6:03 PM, Jack Bush wrote:
> Hi All,
>
> I appear to have difficulty closing (possibly flushing it first) an
> XML file that is subsequently parsed without success. The
> error generated is:
>
> org.jdom.input.JDOMParseException: Error on line 23: The element
> type "form" must be terminated by the matching end-tag "</form>".
>
> Below are the code snippets of readData(), which retrieves (HTML) data
> from a website, saves it to a file, then converts it to XML format before
> returning the new filename:
> public String readData() {
>
>     try {
>         URL url = new URL("http://www.abc.com");
>         URLConnection connection = url.openConnection();
>         InputStream isInHtml = url.openStream(); // throws an IOException
>         disInHtml = new DataInputStream(new BufferedInputStream(isInHtml));
>         System.out.flush();
>         FileOutputStream fosOutHtml = null;
>         fosOutHtml = new FileOutputStream("C:\\Temp\\ABC.html");
>         int oneChar, count=0;
>         while ((oneChar=disInHtml.read()) != -1)
>             fosOutHtml.write(oneChar);
>         isInHtml.close();
>         disInHtml.close();
>         fosOutHtml.flush(); // optional
>         fosOutHtml.close();
>
>         .....
>     }
>
>     try {
>         File fileInHtml = new File("C:\\Temp\\ABC.html");
>         FileReader frInHtml = new FileReader(fileInHtml);
>         BufferedReader brInHtml = new BufferedReader(frInHtml);
>         String string = "";
>         while (brInHtml.ready())
>             string += brInHtml.readLine() + "\n";
>         fwOutXml = new FileWriter("C:\\Temp\\ABC.xml");
>         pwOutXml = new PrintWriter(fwOutXml);
>         light_html2xml html2xml = new light_html2xml();
>         pwOutXml.print(html2xml.Html2Xml(string));
>
>         System.out.flush(); // optional
>         fwOutXml.flush(); // optional
>         fwOutXml.close();
>         pwOutXml.flush(); // optional
>         pwOutXml.close();
>         return fileInHtml.getAbsolutePath();
>         ....
>     }
> }
>
> // parseData reads the XML file using the name returned by readData()
> public void parseData(String XMLFilename)
> {
>     try
>     {
>         FileReader frInXml = new FileReader(FileName);
>         BufferedReader brInXml = new BufferedReader(frInXml);
>         SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser"); // JDOMParseException generated.
>         ....
>     }
> This code worked when it was all in a single method, but I
> have since placed some structure around it using a number of methods.
>
> This issue has arisen in the past, when I was able to close the
> XML file prior to reading it again. However, I don't have a
> solution for it this time round.
>
> I am running JDK 1.6.0_10, Netbeans 6.1, JDOM 1.1 on Windows XP
> platform.
>
> Any assistance would be appreciated.
>
> Many thanks,
>
> Jack
>
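Two details in the quoted code may be worth checking alongside the stream handling. First, readData() as shown returns fileInHtml.getAbsolutePath(), which is the path of the HTML file rather than of ABC.xml. Second, the read loop relies on BufferedReader.ready(), which reports whether a read would block, not whether the end of the file has been reached. Whether either is related to the reported exception cannot be told from the snippets alone. The sketch below uses hypothetical names (HtmlToXmlFiles, readAll, writeXml) and a placeholder string in place of the output of Html2Xml. It shows the read loop running until readLine() returns null, and the method returning the path of the file that was just written and closed.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

// Hypothetical helper for illustration; not taken from the original post.
final class HtmlToXmlFiles {

    /** Reads the whole file line by line; readLine() returns null at end of file. */
    static String readAll(File file, String charset) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset));
        try {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }

    /** Writes xml to xmlFile, closes it, and returns the path of that same file. */
    static String writeXml(String xml, File xmlFile, String charset) throws IOException {
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(xmlFile), charset));
        try {
            out.print(xml);
        } finally {
            out.close(); // closes the whole writer chain, flushing first
        }
        // PrintWriter swallows IOExceptions, so ask it explicitly whether one occurred.
        if (out.checkError()) {
            throw new IOException("failed to write " + xmlFile);
        }
        return xmlFile.getAbsolutePath();
    }

    public static void main(String[] args) throws IOException {
        File xmlFile = File.createTempFile("ABC", ".xml");
        // Placeholder for the string that html2xml.Html2Xml(...) would return.
        String converted = "<html><body><p>hello</p></body></html>";
        String path = writeXml(converted, xmlFile, "UTF-8");
        System.out.println("parse this file next: " + path);
        System.out.print(readAll(new File(path), "UTF-8"));
        xmlFile.delete();
    }
}
```

Only the PrintWriter is closed here; closing the outermost writer is enough, so the separate flush and close calls on fwOutXml and pwOutXml become unnecessary.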
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS to
support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.
[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php