Re: [xml-dev] Cannot close an XML file used for parsing

Hi Everyone,

 

I have added the additional I/O statements to the finally clauses as follows, but the problem persists:

 

readData()
// reading data (html) from the webpage and save it in html format.
try {
    ….
}
catch { …. }
finally {
       System.out.flush();
       isInHtml.close();
       disInHtml.close();
       fosOutHtml.flush();
       fosOutHtml.getFD().sync();
       fosOutHtml.close();
}

 

// convert the html webpage format to xml format
try {
    ….
}
catch { …. }
finally {
       System.out.flush();
       fwOutXml.flush();
       fwOutXml.close();
       pwOutXml.flush();
       pwOutXml.close();
}

 

Below is a short listing of the new XML file:

 

  <?xml version="1.0" encoding="iso-8859-1" ?>
- <html>
- <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta name="keywords" content="California, cities, towns, villages, list, zipcodes, postal codes, united states, ca" />
  <meta name="description" content="Cities, towns and suburbs in California, United States (CA) starting with A" />
  <title>Cities and Towns in California starting with A – ABC Company</title>
  <link rel="stylesheet" href="http://www.abc.com/style.css" type="text/css" media="screen" />
  </head>
- <body>
  <a name="top" />
- <div id="container">
- <div id="header">
  <div id="postmark" />
- <a href="http://www.abc.com/" class="imglink">
  <img id="logoimg" src="http://www.abc.com/images/zipcodes.gif" width="192" height="33" alt="Zipcodes America Logo" />
  </a>
  <hr />
  </div>
- <div id="nav">
- <ul>
- <li>
  <a href="http://www.abc.com/" title="Home Page">Home</a>
  </li>
- <li>
  <strong>Search</strong>
  (zipcode or suburb)
- <div class="hide">
  <form method="post" action="http://www.abc.com/search" />        // line 23
  </div>
  <input type="text" name="q" class="searchbox" alt="Search query" />
  <br />
  <input type="submit" value="find!" class="searchbutton" alt="Perform search" />
  <div class="hide" />
  </li>

 

What I find interesting is that the above XML file can be parsed without any problem by the same parseData() when it is called from another class. I have therefore come to the following conclusions so far:

( i ) Some form of file locking is preventing saxBuilder from parsing the XML file at that time.

( ii ) light_html2xml does not appear to have converted the original HTML to XML correctly, yet somehow this is picked up by the parser in the same class but not by the same parser from another class.

( iii ) I would like to use another conversion tool such as TagSoup in place of light_html2xml to determine where this issue is coming from. Would anyone be able to help me with a few lines of conversion statements using TagSoup, since I am not familiar with this tool?

( iv ) light_html2xml is good in that it strips out all namespace, DTD and entity resolver details and returns only what I need. JTidy converts correctly but includes the namespace, DTD and entity resolver details, which makes parsing difficult.
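
One way to tell conclusions ( i ) and ( ii ) apart is to check the file on disk for well-formedness independently of JDOM and light_html2xml. The following is only a minimal sketch using the SAX parser bundled with the JDK; the class name WellFormedCheck, the helper names and the sample markup are made up for illustration and are not part of the code in this thread. Passing in the exact path that parseData() receives also makes it easy to confirm which file is actually being parsed.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not from the thread: reports whether a file is well-formed XML.
class WellFormedCheck {

    // Returns null if the file parses cleanly, otherwise the first fatal error with its line number.
    static String firstError(File xmlFile)
            throws IOException, ParserConfigurationException, SAXException {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xmlFile, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }

    // Writes a small sample file and closes the writer in a finally block.
    static void write(File target, String text) throws IOException {
        Writer w = new FileWriter(target);
        try {
            w.write(text);
        } finally {
            w.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample markup (an assumption for the demo): one closed form, one unclosed form.
        File good = File.createTempFile("good", ".xml");
        File bad = File.createTempFile("bad", ".xml");
        good.deleteOnExit();
        bad.deleteOnExit();
        write(good, "<html><form action=\"x\" /></html>");
        write(bad, "<html><form action=\"x\"></html>");

        System.out.println(good.getAbsolutePath() + " -> " + firstError(good));
        System.out.println(bad.getAbsolutePath() + " -> " + firstError(bad));
    }
}
```

If this check reports no error for a file while the JDOM call still fails, that would point away from the file content and towards how (or which) file is being opened.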

 

Many thanks again,

 

Jack   



From: Sheila M. Morrissey <Sheila.Morrissey@portico.org>
To: Jack Bush <netbeansfan@yahoo.com.au>
Sent: Wednesday, 29 October, 2008 12:52:06 AM
Subject: RE: [xml-dev] Cannot close an XML file used for parsing

Jack – did you try fosOutHtml.getFD().sync() after the flush?

Regards

Sheila
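
For reference, the flush-then-sync order suggested above can be sketched as follows. This is only an illustration under assumptions: the class name SyncedWrite, the method name and the temporary file are invented for the example and do not come from Jack's code.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical example, not from the thread: write, flush, sync, then close.
class SyncedWrite {

    // Writes the bytes, then flushes and syncs before the stream is closed in finally.
    static void writeAndSync(File target, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(target);
        try {
            out.write(data);
            out.flush();         // no-op for a bare FileOutputStream; matters for buffered wrappers
            out.getFD().sync();  // asks the OS to push its buffers to the underlying device
        } finally {
            out.close();         // releases the file handle even if a step above throws
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("abc", ".html");
        tmp.deleteOnExit();
        writeAndSync(tmp, "<html></html>".getBytes("ISO-8859-1"));
        System.out.println(tmp.length()); // prints 13
    }
}
```

Note that sync() has to be called before close(), since the file descriptor is no longer valid afterwards.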

 


From: Jack Bush [mailto:netbeansfan@yahoo.com.au]
Sent: Tuesday, October 28, 2008 8:41 AM
To: Robert Koberg
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Cannot close an XML file used for parsing

 

Hi Robert,

 

Thanks for responding to this post.

 

I have added your suggestion but the issue still persists. Nevertheless, I do believe that this is caused by the new XML file not having been closed properly.

 

There is no problem with the light_html2xml method, which has worked in the past.

 

Any more suggestions to try out?

 

Thanks,

 

Jack

 


From: Robert Koberg <rob@koberg.com>
To: Jack Bush <netbeansfan@yahoo.com.au>
Cc: xml-dev@lists.xml.org
Sent: Tuesday, 28 October, 2008 9:42:21 AM
Subject: Re: [xml-dev] Cannot close an XML file used for parsing


Close the stream or reader in a finally block to avoid leaving it open
if an error occurs.

try{
}catch(....){
}finally {
}
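
Filled in, the skeleton above might look like the sketch below. It is an illustration only: the class name CopyWithFinally, the method name and the sample data are assumptions, and it uses plain try/finally so that it also compiles on JDK 1.6.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical example, not from the thread: copy a stream to a file, closing both in finally.
class CopyWithFinally {

    // Copies everything from 'in' to 'target' and returns the number of bytes copied.
    static long copy(InputStream in, File target) throws IOException {
        long total = 0;
        try {
            OutputStream out = new FileOutputStream(target);
            try {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                    total += n;
                }
            } finally {
                out.close(); // runs whether or not the copy loop threw
            }
        } finally {
            in.close();      // runs even if opening or closing 'out' failed
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("copy", ".html");
        tmp.deleteOnExit();
        InputStream in = new ByteArrayInputStream("<html></html>".getBytes("ISO-8859-1"));
        System.out.println(copy(in, tmp)); // prints 13
    }
}
```

When streams or writers are wrapped in one another, closing the outermost wrapper also closes the ones it wraps, so a single close() on the outermost object is normally enough.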

On Oct 27, 2008, at 6:03 PM, Jack Bush wrote:

> Hi All,
>
> I appear to be having difficulty closing (and possibly flushing first) an
> XML file, which subsequently cannot be parsed. The
> error generated is:
>
> org.jdom.input.JDOMParseException: Error on line 23: The element 
> type "form" must be terminated by the matching end-tag "</form>".
>
> Below is a code snippet of readData(), which retrieves (HTML) data
> from a website, saves it to a file, then converts it to XML format before
> returning the new filename:
> public String readData() {
>
>    try {
>          URL url  = new URL("http://www.abc.com");
>          URLConnection connection = url.openConnection();
>          InputStream isInHtml = url.openStream();  // throws an IOException
>          disInHtml = new DataInputStream(new BufferedInputStream(isInHtml));
>          System.out.flush();
>          FileOutputStream fosOutHtml = null;
>          fosOutHtml = new FileOutputStream("C:\\Temp\\ABC.html");
>          int oneChar, count=0;
>          while ((oneChar=disInHtml.read()) != -1)
>              fosOutHtml.write(oneChar);
>          isInHtml.close();
>          disInHtml.close();
>          fosOutHtml.flush();    // optional
>          fosOutHtml.close();
>          .....
>    }
>
>    try {
>          File fileInHtml = new File("C:\\Temp\\ABC.html");
>          FileReader frInHtml = new FileReader(fileInHtml);
>          BufferedReader brInHtml = new BufferedReader(frInHtml);
>          String string = "";
>          while (brInHtml.ready())
>              string += brInHtml.readLine() + "\n";
>          fwOutXml  = new FileWriter("C:\\Temp\\ABC.xml");
>          pwOutXml  = new PrintWriter(fwOutXml);
>          light_html2xml html2xml = new light_html2xml();
>          pwOutXml.print(html2xml.Html2Xml(string));
>          System.out.flush();    // optional
>          fwOutXml.flush();      // optional
>          fwOutXml.close();
>          pwOutXml.flush();      // optional
>          pwOutXml.close();
>          return fileInHtml.getAbsolutePath();
>          ....
>    }
> }
>
> // parseData reads the XML file using the name returned by readData()
> public void parseData(String XMLFilename)
> {
>    try
>    {
>        FileReader frInXml = new FileReader(XMLFilename);
>        BufferedReader brInXml = new BufferedReader(frInXml);
>        SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser"); // JDOMParseException generated.
>        ....
> }
> This code worked when it was in a single method, but I
> have since placed some structure around it using a number of methods.
>
> This issue has arisen in the past, when I was able to close the
> XML file prior to reading it again. However, I don't have a
> solution for it this time round.
>
> I am running JDK 1.6.0_10, Netbeans 6.1, JDOM 1.1 on Windows XP 
> platform.
>
> Any assistance would be appreciated.
>
> Many thanks,
>
> Jack
>


_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

 

