RE: [xml-dev] How to parse XML document with default namespace with JDOM

Hi Michael,
Thanks for responding to this question.
I have not had any luck with the jdom-interest@jdom.org list since subscribing to it a few months back.
In the meantime, can you confirm that it is not possible to use Saxon 6.5.x with JDOM, according to http://www.cafeconleche.org/books/xmljava/chapters/ch16s05.html? Or is it simply that you are not familiar with JDOM?
Could anyone point me to a more useful JDOM forum for assistance with this question?
Many thanks,
Jack
  
From: Michael Kay <mike@saxonica.com>
To: Jack Bush <netbeansfan@yahoo.com.au>; xml-dev@lists.xml.org
Sent: Wednesday, 5 November, 2008 12:39:48 AM
Subject: RE: [xml-dev] How to parse XML document with default namespace with JDOM XPath

I see no Saxon code here. You are using the XPath engine that comes with JDOM. You might be better off asking on the JDOM list. I have to confess I'm surprised to see you declaring namespaces AFTER compiling the XPath expression, but I can't say I'm familiar with this API.
 
Michael Kay
http://www.saxonica.com/


    
From: Jack Bush [mailto:netbeansfan@yahoo.com.au]
Sent: 04 November 2008 13:02
To: xml-dev@lists.xml.org
Subject: [xml-dev] How to parse XML document with default namespace with JDOM XPath

Hi All,
 
I am having difficulty parsing a namespaced HTML document using Saxon and the TagSoup parser. The relevant content of the document is as follows:
 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
……..
</head>
<body>
    <div id="container">
        <div id="content">
            <table class="sresults">
                <tr>
                    <td>
                        <a href="http://www.abc.com/areas" title=" Hollywood , CA "> hollywood </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Jose , CA "> san jose </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Francisco , CA "> san francisco </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Diego , CA "> San diego </a>
                    </td>
                </tr>
……….
</body>
</html>
 
Below is the relevant code snippet, which illustrates how I have attempted to retrieve the contents (the value of each <a>):
 
       
          import java.io.*;
          import java.util.*;
          import org.jdom.*;
          import org.jdom.input.SAXBuilder;
          import org.jdom.xpath.*;
          import org.saxpath.*;
          import org.ccil.cowan.tagsoup.Parser;

( 1 )     FileReader frInHtml = new FileReader("C:\\Tmp\\ABC.html");
( 2 )     BufferedReader brInHtml = new BufferedReader(frInHtml);
( 3 ) //  SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
( 4 )     SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
( 5 )     org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
( 6 )     XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
( 7 )     xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
( 8 )     java.util.List list = (java.util.List) (xpath.selectNodes(jdomDocument));
( 9 )     Iterator iterator = list.iterator();
( 10 )    while (iterator.hasNext())
( 11 )    {
( 12 )        Object object = iterator.next();
( 13 ) //     if (object instanceof Element)
( 14 ) //         System.out.println(((Element)object).getTextNormalize());
( 15 )        if (object instanceof Content)
( 16 )            System.out.println(((Content)object).getValue());
          }
….
 
This program works on the same document when the default namespace is removed, in which case the “ns” prefix in the XPath statements (lines 6-7) is not needed either. In the past I also used “org.apache.xerces.parsers.SAXParser” to successfully retrieve the content of <a> from the same document without the default namespace.
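
To make the namespace behaviour concrete, here is a minimal sketch that uses only the JDK's javax.xml.xpath API. This is a swapped-in API, not the JDOM XPath class used in the listing, and the class and method names (NamespacedXPathSketch, linkTexts) are hypothetical. It assumes a small inline XHTML string without a DOCTYPE, and it binds the "ns" prefix before the expression is evaluated.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name; a JDK-only stand-in for the JDOM XPath calls.
class NamespacedXPathSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Binds the single prefix "ns" to the XHTML namespace.
    static final NamespaceContext NS_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "ns" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                    ? Collections.singletonList("ns").iterator()
                    : Collections.<String>emptyList().iterator();
        }
    };

    // Returns the trimmed text of every node selected by the expression.
    static List<String> linkTexts(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep element namespaces visible to XPath
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(NS_CONTEXT); // bind prefixes before evaluating
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<table class=\"sresults\"><tr>"
                + "<td><a href=\"http://www.abc.com/areas\"> hollywood </a></td>"
                + "<td><a href=\"http://www.abc.com/areas\"> san jose </a></td>"
                + "</tr></table></body></html>";
        // Prefixed steps address elements in the XHTML namespace.
        System.out.println(linkTexts(xml, "//ns:table[@class='sresults']/ns:tr/ns:td/ns:a"));
        // Unprefixed steps only address elements in no namespace.
        System.out.println(linkTexts(xml, "//table[@class='sresults']/tr/td/a"));
    }
}
```

In this sketch the prefixed expression should select both links, while the unprefixed one selects nothing, because in XPath 1.0 an unprefixed element name only matches elements that are in no namespace. Whether the JDOM API shows exactly the same behaviour is not verified here.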
 
I would like to achieve the following objectives if possible:
 
( i ) Exclude the DTD and the namespace in order to simplify the parsing process. How could this be done?
( ii ) If this is not possible, how should the namespace be included in the XPath statements (lines 6-7) so that the value of <a> is picked up correctly?
( iii ) Would changing from “org.apache.xerces.parsers.SAXParser” to “org.ccil.cowan.tagsoup.Parser” make any difference as far as using XPath is concerned?
( iv ) Failing to exclude the DTD, how can the lookup of a PUBLIC DTD be changed to a local SYSTEM one, so that a local DTD is used for reference?
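
As a rough illustration of objectives ( i ) and ( iv ), the sketch below again uses only JDK classes rather than JDOM or TagSoup, and SkipDtdSketch and linkTexts are hypothetical names. It assumes two things: a parser that is not namespace-aware, so the default namespace no longer affects element names, and an EntityResolver that answers the DTD lookup with an empty document, so nothing is fetched from www.w3.org. One known cost of the empty DTD is that entities it would have declared, such as &nbsp;, become undefined.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name; a JDK-only sketch, not the JDOM/TagSoup setup.
class SkipDtdSketch {
    // Returns the trimmed text of each <a> in the results table.
    static List<String> linkTexts(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false); // treat xmlns="..." as a plain attribute
            factory.setValidating(false);     // do not validate against the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Answer every external entity lookup (including the DTD) with an
            // empty document, so no network access is attempted.
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // No prefixes are needed because the DOM carries no namespaces.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//table[@class='sresults']/tr/td/a", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<table class=\"sresults\"><tr>"
                + "<td><a href=\"http://www.abc.com/areas\"> san francisco </a></td>"
                + "<td><a href=\"http://www.abc.com/areas\"> San diego </a></td>"
                + "</tr></table></body></html>";
        System.out.println(linkTexts(xml));
    }
}
```

For objective ( iv ), the same resolver hook could instead return an InputSource that points at a local copy of the DTD when the public identifier matches, which keeps the DTD's entity declarations available without a network lookup.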
 
I am running JDK 1.6.0_06, NetBeans 6.1, JDOM 1.1, Saxon 6.5.5 and TagSoup 1.2 on Windows XP.
 
Any assistance would be appreciated.
 
Thanks in advance,
 
Jack

