   XML Torture Test: Parsers Fail

  • From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
  • To: xml-dev@ic.ac.uk
  • Date: Mon, 5 Apr 1999 10:06:39 -0500

Without intending to do so, I have devised an XML document that exposes
many problems in almost all XML validating parsers and non-validating
parsers that resolve external entity references.  You will find this
torture test at

http://metalab.unc.edu/examples/players/index.xml

It has broken every parser I've thrown at it in one way or another,
including the one in IE5, with the single exception of RXP. However, RXP
reports some warnings that do not appear to be errors. It also missed some
problems in an earlier version of the document, involving text declarations
that lacked encoding declarations, which xml4j 2.0.4 (but not 1.1.14)
picked up. Those problems have now been fixed.

As best I can tell this document is both well-formed and valid. It's hard
to say for sure when many different parsers all fail to process it, mostly
after either giving up completely or generating incorrect error messages.
Until I'm more confident the document is correct, I'm simply defining a
broken parser as one that

1. describes a valid document as invalid (Microsoft?, xml4j?)
2. describes an invalid document as valid (RXP)
3. describes an invalid document as invalid but gives the wrong reason
   (Microsoft?, xml4j?)

Once I've conclusively determined whether my document is valid, I should be
able to determine whether Microsoft and xml4j fit into category 1 or 3 or
both.
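
As a rough illustration of the three categories above, here is a minimal
Python sketch. The function name and its parameters are hypothetical, and
it assumes the true validity of the document is already known, which is
exactly what is still in question here.

```python
def classify(document_is_valid, parser_says_valid, reason_is_correct=None):
    """Return the broken-parser category (1, 2 or 3), or None if the
    parser's verdict is consistent with the document.

    reason_is_correct only matters when both the document and the
    parser's verdict are "invalid"; None means the reason was not checked.
    """
    if document_is_valid and not parser_says_valid:
        return 1  # valid document described as invalid
    if not document_is_valid and parser_says_valid:
        return 2  # invalid document described as valid
    if not document_is_valid and reason_is_correct is False:
        return 3  # invalid document, correctly rejected, wrong reason given
    return None
```

A parser that rejects the document lands in category 1 if the document
turns out to be valid, and in category 3 if the document is invalid but the
reported reason is wrong.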

What's torturous about this example is that it defines over 1000 separate
external general entities in several dozen different DTDs.
Currently only one of those entities is actually used in the main document,
but I plan to expand it to use all 1000+ entities.  Thus it's likely to
become even more difficult to parse properly.  Leaving aside the question
of whether this is the proper design for this document, it's nonetheless
the case that parsers should be able to handle it.  Parser authors may wish
to investigate further. I would appreciate help from anyone who can spot by
eye any mistakes I made that the parsers may be reporting incorrectly.
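
For parser authors who want a smaller, local test of the same shape, the
following Python sketch generates one. It is not the actual players
document: the element names, entity names, file names and the round-robin
split across DTD files are all assumptions made for illustration. It
declares many external general entities across several DTD fragments,
pulls the fragments in through parameter entities, and references only one
entity from the main document.

```python
def build_torture_test(entity_count=1000, dtd_count=25):
    """Return a dict mapping file name to file content for a synthetic test.

    The layout (index.xml, partN.dtd, playerN.xml) is hypothetical.
    """
    if entity_count < 1 or dtd_count < 1:
        raise ValueError("need at least one entity and one DTD fragment")

    # Text declaration with an explicit encoding declaration.
    text_decl = '<?xml version="1.0" encoding="UTF-8"?>\n'
    names = [f"player{i}" for i in range(entity_count)]
    files = {}

    # One small external parsed entity per declaration.
    for name in names:
        files[f"{name}.xml"] = text_decl + f"<player>{name}</player>\n"

    # Spread the entity declarations round-robin over the DTD fragments.
    for d in range(dtd_count):
        decls = [f'<!ENTITY {n} SYSTEM "{n}.xml">\n' for n in names[d::dtd_count]]
        files[f"part{d}.dtd"] = text_decl + "".join(decls)

    # The internal subset declares each fragment as an external parameter
    # entity and references it immediately.
    subset = ["<!ELEMENT players (player*)>", "<!ELEMENT player (#PCDATA)>"]
    for d in range(dtd_count):
        subset.append(f'<!ENTITY % part{d} SYSTEM "part{d}.dtd">')
        subset.append(f"%part{d};")

    # Only the first entity is actually used in the document body.
    files["index.xml"] = (
        text_decl
        + "<!DOCTYPE players [\n  " + "\n  ".join(subset) + "\n]>\n"
        + f"<players>&{names[0]};</players>\n"
    )
    return files
```

Writing each entry of the returned dict to a file in one directory and
pointing a validating parser at index.xml gives a quick check of whether
the parser reads every fragment and resolves the one referenced entity.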



+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|        XML: Extensible Markup Language (IDG Books 1998)            |
|   http://www.amazon.com/exec/obidos/ISBN=0764531999/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://sunsite.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://sunsite.unc.edu/xml/     |
+----------------------------------+---------------------------------+



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)