   Re: XML Torture Test: Parsers Fail

  • From: "Richard L. Goerwitz" <richard@goon.stg.brown.edu>
  • To: David Megginson <david@megginson.com>
  • Date: Wed, 07 Apr 1999 12:30:30 -0400

David Megginson wrote:

>  > I don't see anything in the spec that says "don't read and validate
>  > external parsed entities if they're not used."  And in fact, the spec
>  > seems to say that, in order to be valid, they must (whether used or not)
>  > match certain productions in the grammar.
> You could check them for well-formedness (I guess), but you could not
> validate them out of context

I sympathize with this view.  But you're making an implicit apology on
behalf of the spec, which actually just says:

    1) The document entity is well-formed if it matches the production
       labeled document.

    2) An external general parsed entity is well-formed if it matches
       the production labeled extParsedEnt.

    3) An external parameter entity is well-formed if it matches the
       production labeled extPE.

There's no mincing words about "using" entities (in the sense of adding
an entity reference to a spot where the reference will expand).

All the spec says is that validity depends on entities matching certain
productions in the grammar.  It's a simple, static definition of how all
the entities must be structured.  It says nothing about operational
questions like whether you have to wait to validate until the entity
appears in a place where it will be expanded.

> You could check them for well-formedness (I guess), but you could not
> validate them out of context                                      ^^^

Sure you could.  But obviously an external entity, in this scenario,
would come out invalid if you declared it at a point where parameter
entities it uses are not yet declared.  So just make sure you do that.
It's what the spec says, right? ;-)
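
To make the ordering problem concrete, here is a minimal Python sketch of
checking each entity at the point where it is declared.  It is only an
illustration: the function name, the input shapes, and the simplified
reference pattern are assumptions, and it ignores entities that declare
their own parameter entities internally.

```python
import re

# Matches parameter-entity references such as %name; (simplified name syntax).
PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")

def undeclared_at_declaration(declarations, bodies):
    """Return {entity: [missing PE names]} when checking at declaration time.

    declarations: entity names in document order (hypothetical input).
    bodies: name -> text of that entity, assumed already fetched.
    """
    declared = set()
    problems = {}
    for name in declarations:
        # Only parameter entities declared earlier are visible here.
        missing = [ref for ref in PE_REF.findall(bodies[name])
                   if ref not in declared]
        if missing:
            problems[name] = missing
        declared.add(name)
    return problems

bodies = {
    "chapter": "<!ELEMENT chapter (%content;)*>",
    "content": "para | list",
}
print(undeclared_at_declaration(["chapter", "content"], bodies))
# {'chapter': ['content']}
print(undeclared_at_declaration(["content", "chapter"], bodies))
# {}
```

The same two entities pass or fail depending only on declaration order,
which is the consequence of the static reading described above.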

Also, parsers, when they check external entities, will have to make
temporary copies of their parents' entity tables.  Why?  Because any
given external entity may define more entities that it itself uses.
(A typical case would be defining a parameter entity that later gets
expanded to "INCLUDE".)  So we have to keep a record of what's been
defined.  On the other hand, if the parent entity never references the
external entity, we don't want definitions within the external entity
leaking into the parent's tables.  An exception to this is the
top-level external DTD entity, which is always "used" and whose
definitions we always want to leak back into the parent's tables.
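
The table handling described above might be sketched as follows.  This is
a rough Python illustration, not any particular parser's code: the
function name, the `leak_back` flag, and the simplified pattern for
internal parameter-entity declarations are all assumptions.

```python
import re

# Simplified: matches <!ENTITY % name "value"> with a double-quoted value only.
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')

def check_external_entity(text, parent_table, leak_back=False):
    """Scan an external entity against a temporary copy of the parent's table.

    Returns the scratch table.  Definitions reach parent_table only when
    leak_back is true, i.e. the entity is actually referenced or is the
    top-level external DTD entity.
    """
    scratch = dict(parent_table)          # temporary copy of the parent's table
    for name, value in PE_DECL.findall(text):
        scratch.setdefault(name, value)   # the first declaration is binding
    if leak_back:
        parent_table.update(scratch)
    return scratch

parent = {"draft": "IGNORE"}
ext = '<!ENTITY % final "INCLUDE">'

scratch = check_external_entity(ext, parent)       # declared but never used
assert "final" in scratch and "final" not in parent

check_external_entity(ext, parent, leak_back=True)  # used, or external DTD
assert parent["final"] == "INCLUDE"
```

Working on a copy lets the entity see its own definitions while it is
being checked, without committing them to the parent unless it is used.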

If IE's parser interprets the spec the way it's written, it will have
to do all of these things.

I reiterate my belief that the XML standard was written with SGML
practice in mind.  If you know what SGML parsers typically do in such
situations, you know immediately what the XML spec editors really meant
to say.  The question of whether what they actually _did_ say will work
in practice is another matter.

STG's parser, by the way, compromises between these two approaches.
On the one hand, it does not insist that external entities validate at
the point where they are declared.  On the other hand, it still scans
the entities, whether they are used or not, and emits error messages if
it finds any obvious problems.  This seems a reasonable approach.  I'd
guess (not having tested it myself) that it's what IE is doing as well.
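
A compromise of this kind could look roughly like the Python sketch
below.  It is not STG's code.  As a stand-in for "obvious problems" it
assumes each external general entity has no text declaration and checks
it by parsing it inside a dummy wrapper element; the function name and
input shape are hypothetical.

```python
import xml.etree.ElementTree as ET

def scan_declared_entities(entities):
    """Scan every declared external general entity, used or not.

    entities: name -> entity text (hypothetical input, already fetched).
    Problems are collected as diagnostics rather than raised, so an
    unused broken entity does not stop the parse.
    """
    diagnostics = []
    for name, text in entities.items():
        try:
            # Wrapping approximates matching the entity against content.
            ET.fromstring("<wrapper>" + text + "</wrapper>")
        except ET.ParseError as err:
            diagnostics.append(f"entity {name!r}: {err}")
    return diagnostics

for message in scan_declared_entities({"good": "<p>ok</p>",
                                       "bad": "<p>unclosed"}):
    print(message)
```

Only the "bad" entity produces a diagnostic here; nothing is checked
against the entity tables in force at the point of declaration.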


Richard Goerwitz
PGP key fingerprint:    C1 3E F4 23 7C 33 51 8D  3B 88 53 57 56 0D 38 A0
For more info (mail, phone, fax no.):  finger richard@goon.stg.brown.edu


