- From: Chris Hubick <firstname.lastname@example.org>
- To: Xml-Dev <email@example.com>
- Date: Wed, 03 Dec 1997 18:11:19 -0700
I am writing a recursive descent XML parser in Java and have
a couple of questions.
The XML Working Draft dated 17-November-1997 states:
 prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
 Misc ::= Comment | PI | S
 PI ::= '<?' Name (S (Char* - (Char* '?>' Char*)))? '?>'
 XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
 EncodingPI ::= '<?xml' S 'encoding' Eq QEncoding S? '?>'
Within a PI, is the Name "xml" reserved? If it is, should
there not be a well-formedness constraint [wfc] on PI stating so?
By the current definition, every XMLDecl and EncodingPI is also
a valid PI. In a prolog an XMLDecl is optional, and is followed
by Misc, which includes PI.
OK, so I can have an XML file with no XMLDecl (it's optional)
that starts with "<?xml version="blah" encoding=5?>", which
matches PI in my Misc*. And this is legal? My parser will
take this just fine as such, but I wonder about the others.
It makes detecting a bad XMLDecl impossible! My parser will just
say fine, that wasn't an XMLDecl, and feed it to Misc, which will
most likely match (or possibly spew) it as a PI.
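To make the problem concrete, here is a minimal Java sketch (not the actual parser). The class and method names are hypothetical, Name is approximated with ASCII characters only, and the "reserved target" check is an assumed constraint of the kind asked about above, not something stated in the productions quoted from the draft.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class PiTargetCheck {
    // Approximates: PI ::= '<?' Name (S (Char* - (Char* '?>' Char*)))? '?>'
    // Name is simplified to ASCII; the body may not contain "?>".
    private static final Pattern PI = Pattern.compile(
        "<\\?([A-Za-z_:][A-Za-z0-9._:-]*)(?:\\s+((?:(?!\\?>).)*))?\\?>",
        Pattern.DOTALL);

    // PI as quoted from the draft: any Name is accepted as the target.
    static boolean matchesDraftPI(String text) {
        return PI.matcher(text).matches();
    }

    // Assumed extra constraint: a target spelled "xml" in any case is reserved.
    static boolean matchesPIWithReservedTarget(String text) {
        Matcher m = PI.matcher(text);
        return m.matches() && !m.group(1).equalsIgnoreCase("xml");
    }
}
```

With the draft production alone, matchesDraftPI accepts the malformed declaration from the example above, so it slips into Misc* as an ordinary PI. With the assumed constraint, the same text is rejected as a PI, and the parser can report it as a bad XMLDecl instead.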
Shouldn't PI have an S? at the end, before the '?>'?
Also, shouldn't PCData be:
 PCData ::= [^<&]+
rather than the current:
 PCData ::= [^<&]*
 content ::= (element | PCData | Reference | CDSect | PI | Comment)*
<TEST>This is a test</TEST>
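In my recursive descent parser, the content of this element parses to:
<PCData>This is a test</PCData>
And then we get infinite matches on a zero-length PCData, because
[^<&]* also matches the empty string and content repeats it with *.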
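The loop can be reproduced with a small Java sketch. This is an illustration under stated assumptions, not the real parser: the names are hypothetical, only the PCData alternative of content is modelled, and a maxMatches cap is added purely so the demonstration terminates.

```java
// Hypothetical demo of content ::= (PCData)* with the other alternatives omitted.
final class ContentLoop {
    // Matches PCData starting at pos. Returns the end index of the match,
    // or -1 if nothing matched. allowEmpty=true models [^<&]*,
    // allowEmpty=false models [^<&]+.
    static int matchPCData(String in, int pos, boolean allowEmpty) {
        int end = pos;
        while (end < in.length() && in.charAt(end) != '<' && in.charAt(end) != '&') {
            end++;
        }
        return (end > pos || allowEmpty) ? end : -1;
    }

    // Repeats PCData as the outer (...)* of content would, counting matches.
    // maxMatches is an artificial cap standing in for "loops forever".
    static int countPCDataMatches(String in, boolean allowEmpty, int maxMatches) {
        int pos = 0;
        int matches = 0;
        while (matches < maxMatches) {
            int end = matchPCData(in, pos, allowEmpty);
            if (end < 0) {
                break; // PCData failed, so the repetition stops here
            }
            matches++;
            pos = end; // no progress is made when the match was empty
        }
        return matches;
    }
}
```

For the input "This is a test</TEST>", the [^<&]* variant keeps succeeding at the '<' without consuming anything until the cap is hit, while the [^<&]+ variant matches once and then fails cleanly, letting the parser move on to the end tag.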
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/