OASIS Mailing List Archives

   Re: XML Grammar kind

  • From: Rob Lugt <roblugt@elcel.com>
  • To: xml-dev@lists.xml.org
  • Date: Thu, 14 Dec 2000 15:00:27 +0000

David Lacerte wrote:
> Hi!  I was wondering at what category of grammar the xml's belongs.  Does
> it have the properties of a LALR(1) grammar, which is the most probable?  I
> need that information in order to do a parser.  Thanks!!

If you look closely at the XML grammar productions in the recommendation,
you will see that in most cases only one 'token' of look-ahead is required.
There are some exceptions, but they are quite easily re-grouped to satisfy
that one-token limit.

Exceptions known to me:-

[39] element ::= EmptyElemTag | STag content ETag
[40] STag ::= '<' Name (S Attribute)* S? '>'
[42] ETag ::= '</' Name S? '>'
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
When parsing element, you will not know if you are parsing an EmptyElemTag
or a STag until the parser comes across a '>' or '/>'.  In this case the
production can be re-written in LALR(1) as:
element ::= '<' Name (S Attribute)* S? ('/>' | ('>' content ETag))
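
As a rough illustration of the left-factored production above, here is a
minimal hand-written sketch in Python.  It is not taken from any real parser:
the function name parse_tag_open and the NAME, ATTR and WS patterns are made
up for this example, Name is restricted to ASCII, and attribute values are
not checked for entity references.  The point is only that the shared prefix
is consumed first, and the choice between EmptyElemTag and STag is made at
the very end on '/>' versus '>'.

```python
import re

# Simplified, ASCII-only stand-ins for the spec's Name, Eq AttValue and S.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")
ATTR = re.compile(r"""\s*=\s*("[^<"]*"|'[^<']*')""")
WS = re.compile(r"[ \t\r\n]*")


def parse_tag_open(text, pos=0):
    """Parse '<' Name (S Attribute)* S? ('/>' | '>') starting at pos.

    Returns (kind, name, attrs, new_pos); kind is 'empty' or 'start'.
    """
    if not text.startswith("<", pos):
        raise SyntaxError(f"expected '<' at offset {pos}")
    m = NAME.match(text, pos + 1)
    if m is None:
        raise SyntaxError(f"expected Name at offset {pos + 1}")
    name, pos = m.group(), m.end()
    attrs = {}
    while True:
        ws_end = WS.match(text, pos).end()
        had_space = ws_end > pos
        pos = ws_end
        # Decision point: the common prefix has been consumed, so one
        # look-ahead token tells EmptyElemTag and STag apart.
        if text.startswith("/>", pos):
            return "empty", name, attrs, pos + 2
        if text.startswith(">", pos):
            return "start", name, attrs, pos + 1
        if not had_space:
            raise SyntaxError(f"expected whitespace at offset {pos}")
        m = NAME.match(text, pos)
        if m is None:
            raise SyntaxError(f"expected attribute name at offset {pos}")
        v = ATTR.match(text, m.end())
        if v is None:
            raise SyntaxError(f"expected '=' and a quoted value at offset {m.end()}")
        attrs[m.group()] = v.group(1)[1:-1]  # strip the surrounding quotes
        pos = v.end()
```

A caller would go on to parse content and ETag only when the returned kind is
'start'.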

[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
[75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S
     SystemLiteral
[83] PublicID ::= 'PUBLIC' S PubidLiteral
Here, the parser would not know if it was parsing an ExternalID or a
PublicID until it found a SystemLiteral or '>'.
This could be re-written as LALR(1) :
NotationDecl ::= '<!NOTATION' S Name S NotationID S? '>'
NotationID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral (S
     SystemLiteral)?
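
The same idea can be sketched for the notation declaration.  Again this is
only an assumption-laden illustration in Python: parse_notation_decl, _need
and the three patterns are hypothetical names, Name is ASCII-only, and the
literal pattern does not enforce the character restrictions of PubidLiteral.
After 'PUBLIC' S PubidLiteral the sketch skips any whitespace before
deciding, so the choice rests on a single character: a quote means a
SystemLiteral follows, anything else is left for the closing S? '>'.

```python
import re

# Simplified stand-ins for S, Name and the two quoted literal productions.
WS = re.compile(r"[ \t\r\n]+")
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")
LITERAL = re.compile(r"\"[^\"]*\"|'[^']*'")


def _need(pattern, text, pos, what):
    """Match pattern at pos or fail; returns (matched_text, new_pos)."""
    m = pattern.match(text, pos)
    if m is None:
        raise SyntaxError(f"expected {what} at offset {pos}")
    return m.group(), m.end()


def parse_notation_decl(text):
    """Return (name, public_id, system_id); either id may be None."""
    keyword = "<!NOTATION"
    if not text.startswith(keyword):
        raise SyntaxError("expected '<!NOTATION'")
    _, pos = _need(WS, text, len(keyword), "whitespace")
    name, pos = _need(NAME, text, pos, "Name")
    _, pos = _need(WS, text, pos, "whitespace")
    public_id = system_id = None
    if text.startswith("SYSTEM", pos):
        _, pos = _need(WS, text, pos + 6, "whitespace")
        lit, pos = _need(LITERAL, text, pos, "SystemLiteral")
        system_id = lit[1:-1]
    elif text.startswith("PUBLIC", pos):
        _, pos = _need(WS, text, pos + 6, "whitespace")
        lit, pos = _need(LITERAL, text, pos, "PubidLiteral")
        public_id = lit[1:-1]
        # Decision point: skip whitespace, then peek at one character.
        m = WS.match(text, pos)
        if m is not None and text[m.end():m.end() + 1] in ('"', "'"):
            lit, pos = _need(LITERAL, text, m.end(), "SystemLiteral")
            system_id = lit[1:-1]
    else:
        raise SyntaxError(f"expected 'SYSTEM' or 'PUBLIC' at offset {pos}")
    m = WS.match(text, pos)  # optional S before the closing '>'
    if m is not None:
        pos = m.end()
    if not text.startswith(">", pos):
        raise SyntaxError(f"expected '>' at offset {pos}")
    return name, public_id, system_id
```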

However, if you are considering using standard parsing tools, I think some
of the biggest problems you will have are:-
1) Determining what is a token.  The XML productions go too close to the
metal to be considered tokens in the normal sense.
2) Entity replacement - especially parameter entities
3) Conditional Sections

Rob Lugt
ElCel Technology



Copyright 2001 XML.org. This site is hosted by OASIS