OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
Re: [xml-dev] RE: XML parsers use what computational power?

* Liam R E Quin wrote:
>On Wed, 2013-04-10 at 09:43 +0000, Costello, Roger L. wrote:
>> Resolving these entities:
>> <!ENTITY ha1 "ha">
>> <!ENTITY ha2 "&ha1;&ha1;"> 
>> ...
>> <!ENTITY ha128 "&ha127;&ha127;">
>> requires an amount of memory that is exponential to the number of entities.
>> Therefore an XML parser requires more computational power than a linear bounded automata [1].
>That does not follow at all.

The problem here is that an "XML parser" is not required to actually
produce any output on its own, it is assumed that there is some higher
level application it is interacting with. The problem above occurs if,
for instance, the machine writes out the input document with all the
entities expanded onto its "tape", in other words, if it has to "call"
some function of the higher level application and pass in the value of
an attribute with all entities expanded as a "type position" and then
the higher level application takes over.

That is a common API, and it seems that this scenario does have the
problem above, the "parser" is in need of more tape than what can be
described as a linear function in terms of the input length, but that's
an implementation choice; it would be fine if the "parser" component
reports only small bits to the higher level application. Generally, it
may be best to study the "complexity" of XML in terms of what is needed
to determine whether an input string is a "well-formed XML document", 
under the assumption that the input string is encoded using a character
encoding supported by the XML processor and the encoding isn't any more
complex than the processor itself.

>My understanding is that XML is at worst LL(2), not LL(n), in parser
>complexity. So it can be parsed by a deterministic finite state
>automaton, as described e.g. in the Dragon Book, the standard text for
>such things (Aho, Hopcraft & Ullman). The 2 is because you may need two
>tokens of look-ahead before you can reduce, in a shift-reduce parser,
>although that's a supposition on my part.

The EBNF grammar in the XML 1.0 Recommendation describes a context-free
language, but the set of all XML 1.0 documents is not context-free, and
since LL parsers recognize only a subset of the context-free languages,
they cannot decide whether any given string is an XML document. Reasons
for XML being not context free include many of the "WFCs" in the speci-
fication (not all of them, e.g. numeric character references have to
refer to a `Char`, and that is a regular problem not encoded in the EBNF
grammar for reasons of readability among others); one example is that
`<x a='' a='' />` is not an XML document, and you cannot express that
attribute names must be unique in a context-free grammar, so XML is not

A simple way to look at this is thinking about how to make a generator
for random well-formed XML 1.0 documents. If XML was context-free, you
could write a grammar for XML 1.0 documents of the form

  Nx ::= (Nx or Tx)*

where `Nx` is any non-terminal symbol (like `document` in the EBNF) and
`Tx` is essentially a Unicode character, and `*` means "zero or more".
You would construct a random document from that by picking `document` as
start symbol, and recursively replace the right side by a random choice
among the options given (for `or` you take one or the other, for `*` you
might take zero or one or more). What you have already generated never
factors into that, so you will end up generating duplicate attributes.
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 1993-2007 XML.org. This site is hosted by OASIS