   SAX compatibility & WF parsing questions

  • From: David Megginson <david@megginson.com>
  • To: Juergen Modre <jmodre@edu.uni-klu.ac.at>
  • Date: Mon, 8 Jun 1998 07:32:17 -0400

Juergen Modre writes:

 > Here are a few questions about the SAX interface and XML parsing which
 > arose when I implemented the SAX interface & compared the already
 > existing implementations.
 > 
 > Thanks for any hints.
 > And forgive me if something is crystal clear to everybody except me.
 > 
 > 1.) Parser.java
 > Parser.java has the following javadoc header:
 >   * <p>All SAX parsers must also implement a zero-argument constructor
 >   * (though other constructors are also allowed).</p>
 > What does this mean for this case?

That means that if you are creating a class SAXDriver, you need a

  public SAXDriver ()

constructor.
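
A minimal sketch of why the zero-argument constructor matters, assuming
the driver is loaded by class name (as with the org.xml.sax.parser
property).  The DriverLoader class and the extra String constructor are
hypothetical, and a real SAXDriver would also implement
org.xml.sax.Parser:

```java
// Hypothetical driver class; a real one would also implement org.xml.sax.Parser.
class SAXDriver {
    final String source;

    // The zero-argument constructor that SAX requires.
    public SAXDriver() {
        this("default");
    }

    // Other constructors are allowed alongside it.
    public SAXDriver(String source) {
        this.source = source;
    }
}

// Hypothetical loader standing in for a factory that only knows a class name.
class DriverLoader {
    // Reflection can create the object without arguments only because
    // the class declares an accessible zero-argument constructor.
    static Object load(String className) throws ReflectiveOperationException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Object driver = load(SAXDriver.class.getName());
        System.out.println(driver.getClass().getSimpleName());
    }
}
```

Without the public SAXDriver () constructor, the reflective call would
fail with NoSuchMethodException.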

 > 2.) SAX callback events
 > For which parts of an XML document should a SAX
 > compatible parser give SAX callbacks?
 > 
 > Looking at the XML start production
 > [1] document ::= prolog element Misc*
 > 
 > a) only from root-element to end of root element (= element production)
 > b) from root-element to end of XML file (= element and Misc* production)
 > c) the whole XML file (whole document production)
 > 
 > Well, at least the DTDHandler.notationDecl() and
 > DTDHandler.unparsedEntityDecl() methods must always be called
 > outside the element production, but which one
 > is the correct way?

You should report all processing instructions, notation declarations,
and unparsed entity declarations (or at least, all of them that your
parser can get to).  The DocumentHandler.ignorableWhitespace()
callback is only for ignorable whitespace in element content, so you
should not use it for whitespace outside of the document element.
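
As a rough illustration, the sketch below records the callbacks for a
small document with processing instructions and whitespace on both
sides of the document element.  It assumes the parser bundled with the
JDK and uses the later SAX2 DefaultHandler rather than the SAX 1.0
DocumentHandler; the OutsideRootDemo and Recorder names are made up for
the example:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class OutsideRootDemo {
    // Records which callbacks fire, in document order.
    static class Recorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void processingInstruction(String target, String data) {
            events.add("pi:" + target);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            events.add("chars");
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            events.add("ws");
        }
    }

    static List<String> parse(String xml) throws Exception {
        Recorder recorder = new Recorder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        // Newlines sit between the PIs and the document element.
        System.out.println(parse("<?before x?>\n<doc/>\n<?after y?>\n"));
    }
}
```

The processing instructions in the prolog and after the document
element are both reported, while the newlines around them produce no
characters() or ignorableWhitespace() events.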

 > 3.) Return value of systemId and publicId
 > The SAX documentation often contains this paragraph:
 >   * <p>If the system identifier is a URL, the SAX parser must
 >   * resolve it fully before reporting it to the application.</p>
 > 
 > Does a SAX-conformant parser now always need to return the
 > "absolute URI" for the parameters systemId and publicId?
 > e.g.
 > If defined:
 >   <!NOTATION BMP SYSTEM "abc.exe">
 > The SAX parser must for instance return:
 >   <!NOTATION BMP SYSTEM "file:/C:/Files/XML-Files/abc.exe">
 > 
 > Is this what the paragraph means?

That's a very good question -- it should certainly do so for other
system identifiers, but nobody's really sure what the system
identifier for a notation is supposed to be in the first place.  Any
suggestions?
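
For ordinary system identifiers, "resolving fully" amounts to
resolving the relative reference against the base URI of the entity
that contains it.  Here is a small sketch using java.net.URI; the
SystemIdResolver class and the base document name doc.xml are
assumptions made for the example, not part of SAX:

```java
import java.net.URI;

class SystemIdResolver {
    // Resolve a possibly relative system identifier against the URI of
    // the entity in which it appeared.  Absolute identifiers pass
    // through unchanged.
    static String resolve(String baseUri, String systemId) {
        return URI.create(baseUri).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        // Hypothetical location of the document holding the declaration.
        String base = "file:/C:/Files/XML-Files/doc.xml";
        System.out.println(resolve(base, "abc.exe"));
    }
}
```

With that base, "abc.exe" resolves to
file:/C:/Files/XML-Files/abc.exe, matching the form in the question.
Whether a notation's system identifier should be treated this way is
exactly the open point.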

 > 4.) WF parsing and: characters vs. ignorableWhitespace
 > Looking at the XML start production
 > [1] document ::= prolog element Misc*
 > 
 > For prolog and Misc a parser should always return
 > ignorableWhitespace.

No, it should not use the callback at all outside of the document
element; note the first line of the JavaDoc comment:

* Receive notification of ignorable whitespace in element content.
                                               ^^^^^^^^^^^^^^^^^^

 > For the parts in the element production and WF parsing:
 > 
 > a.) always charData
 > b.) always ignorableWhitespace
 > c.) or must be DTD aware, which means charData or
 > ignorableWhitespace according to the DTD

If the parser is validating, it must distinguish ignorable whitespace
from character data; if it is non-validating but DTD-aware, it may
distinguish them; and if it is not DTD-aware, it cannot.  Tricky
stuff, really (thanks to the XML 1.0 spec).
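
The two clear-cut cases can be sketched as follows, again assuming the
JDK's bundled parser and the SAX2 DefaultHandler in place of the SAX
1.0 DocumentHandler.  The WhitespaceDemo and Counter names and the
sample document are invented for the illustration, and the
non-validating but DTD-aware case is left out because its behaviour is
parser-dependent:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceDemo {
    // Element a contains only whitespace and a child element b.
    static final String BODY = "<a>\n  <b/>\n</a>";
    static final String DTD = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";

    // Counts how many characters arrive through each callback.
    static class Counter extends DefaultHandler {
        int characters;
        int ignorable;

        @Override
        public void characters(char[] ch, int start, int length) {
            characters += length;
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorable += length;
        }
    }

    static Counter parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        Counter counter = new Counter();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        Counter validating = parse(DTD + BODY, true);
        Counter noDtd = parse(BODY, false);
        System.out.println("validating: ignorable=" + validating.ignorable
                + " characters=" + validating.characters);
        System.out.println("no DTD:     ignorable=" + noDtd.ignorable
                + " characters=" + noDtd.characters);
    }
}
```

The validating parse knows from the content model that a has element
content, so the whitespace goes to ignorableWhitespace(); with no DTD
at all the same whitespace can only be reported through characters().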

 > 5.) ByteStreamDemo.java
 > When launching this example it gives a wrong usage hint:
 > Usage: java -Dorg.xml.sax.parser=<classname> SystemIdDemo <document>
 > should be
 > Usage: java -Dorg.xml.sax.parser=<classname> ByteStreamDemo <document>

Right -- thanks for the correction.

 > 6.) EntityResolver.java
 > I know that SAX 1.0 is finalized now but I think the name "resolveExternalEntity"
 > would be better in this case than "resolveEntity" :-).

It probably would have been.  Oh well.


All the best, and thanks for the feedback,


David

-- 
David Megginson                 david@megginson.com
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 
