Re: [xml-dev] RE: XML parsers use what computational power?

On Wed, 2013-04-10 at 09:43 +0000, Costello, Roger L. wrote:
> Hi Folks,
> 
> Resolving these entities:
> 
> <!ENTITY ha1 "ha">
> <!ENTITY ha2 "&ha1;&ha1;"> 
> ...
> <!ENTITY ha128 "&ha127;&ha127;">
> 
> requires an amount of memory that is exponential to the number of entities.
> 
> Therefore an XML parser requires more computational power than a linear bounded automata [1].

That does not follow at all.

Note also that any language with string concatenation has the same issue
- e.g. JavaScript:

var boy = "Simon";
var twoboys = boy + boy;
var fourboys = twoboys + twoboys;
var eightboys = fourboys + fourboys;  // ... and so on
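
To make the parallel with the 128 entities explicit: a string doubled
127 times, starting from "ha", would be 2^128 characters long, around
3.4e38, which no machine can hold. The same thing as a loop (a sketch,
illustrative only):

  var s = "ha";
  for (var i = 1; i < 128; i++) {
      s = s + s;   // length doubles each pass: 4, 8, 16, ... up to 2^128
  }
  // the runtime runs out of memory long before the loop finishes

Nobody concludes from this that JavaScript needs more than a linear
bounded automaton; the program simply asks for too much memory.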

But the algorithmic complexity of a parser does not depend on the
quantity of input; complexity describes how the cost grows as the
input grows.

If a parser is linear with respect to input tokens, that means that if
you double the amount of input then processing takes roughly twice as
long. That is still linear.
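
As a rough illustration, a scanner that does a constant amount of work
per input character takes twice as many steps on twice as much text and
no more (a sketch; the function name is made up for illustration):

  // count '<' characters: constant work per character, so running
  // time grows in direct proportion to the length of the input
  function countAngleBrackets(xml) {
      var count = 0;
      for (var i = 0; i < xml.length; i++) {
          if (xml.charAt(i) === '<') {
              count++;
          }
      }
      return count;
  }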

>  Furthermore:
>    
>     Linear bounded automata are acceptors for 
>     the class of context-sensitive languages. [1]
> 
> Therefore XML is more powerful than a context-sensitive language.

I think your "therefore" is a bit iffy here.
If p implies q, and not(p), we cannot conclude not(q): that is the
fallacy of denying the antecedent.

My understanding is that XML is at worst LL(2), not LL(n), in parser
complexity. So it can be parsed by a deterministic pushdown automaton,
as described e.g. in the Dragon Book, the standard text for such things
(Aho, Sethi & Ullman). The 2 is because you may need two tokens of
look-ahead before you can decide which production applies, although
that's a supposition on my part.
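
For example, after reading "<" an XML tokenizer only needs to peek at
the next character to tell a start-tag from an end-tag, a comment or
declaration, or a processing instruction. A sketch (the function name
is made up for illustration):

  // one character of look-ahead after '<' classifies the token
  function classifyAfterLessThan(xml, i) {   // xml.charAt(i) === '<'
      var next = xml.charAt(i + 1);
      if (next === '/') return "end-tag";
      if (next === '!') return "comment, CDATA or declaration";
      if (next === '?') return "processing instruction";
      return "start-tag";
  }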

Please, let's not spread fear-uncertainty-doubt around without good
cause. The "billion laughs" attack you mentioned was a nonsense paraded
triumphantly by people who hated XML, when the languages they preferred
(SGML and JavaScript) were susceptible to the same attack and had the
same simple fix:
  if (this would use too much memory) {
     don't do it.
  }
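
In concrete terms (a sketch only, not anything mandated by the XML
spec), a parser can keep a running total of expanded entity text and
refuse to go past a configurable cap:

  var MAX_EXPANSION = 1000000;  // cap on expanded entity text, in characters
  var expandedSoFar = 0;

  // entityTable maps entity names to their replacement text;
  // every expansion, including nested ones, goes through here
  function expandEntity(name, entityTable) {
      var replacement = entityTable[name];
      expandedSoFar += replacement.length;
      if (expandedSoFar > MAX_EXPANSION) {
          throw new Error("entity expansion limit exceeded");
      }
      return replacement;
  }

With such a cap, the "ha128" document above is rejected after a bounded
amount of work, exactly as the pseudo-code suggests.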

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml


