- From: David Megginson <firstname.lastname@example.org>
- To: <email@example.com>
- Date: Sat, 17 Jan 1998 07:19:58 -0500
Jeremie Miller writes:
> A well-formed XML document is not required to have a DTD, internal or
> external, correct? Is a well-formed parser not an XML parser that does not
> have access to or does not process a DTD, internal or external? I guess I
> haven't found a clear definition of what a well-formed parser is yet.
The PR is not very clear about processing requirements (other than
error reporting and a few details like ignorable whitespace). As I
understand things, however, a well-formed parser must be able to do the following:
1) Parse all of the grammar, including the document type declaration
and internal DTD subset, without throwing spurious errors (even if
it does nothing with the declarations).
2) Act correctly on the RMD (required markup declaration) parameter of the XML declaration.
3) Report a large range of errors, such as "]]>" in character data,
"<" in an attribute value literal, illegal characters in element and
attribute names, mismatched start- and end-tags, etc.
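To make point 3 concrete, here is a rough Python sketch of three of the checks listed above: "]]>" in character data, "<" in an attribute value literal, and mismatched start- and end-tags. It is only an illustration under strong assumptions, not a conforming parser: the function name `check_well_formed` is hypothetical, and the input is assumed to contain nothing but elements, attributes, and character data (no comments, processing instructions, CDATA sections, or document type declaration).

```python
def check_well_formed(text):
    """Return a list of error messages for a few well-formedness problems.

    Hypothetical helper: assumes the input holds only elements,
    attributes and character data.
    """
    errors = []
    stack = []  # names of the currently open elements
    pos = 0
    while pos < len(text):
        lt = text.find("<", pos)
        chardata = text[pos:] if lt == -1 else text[pos:lt]
        if "]]>" in chardata:
            errors.append('"]]>" in character data')
        if lt == -1:
            break
        # Find the end of the tag, skipping over quoted attribute values.
        i = lt + 1
        quote = None
        while i < len(text):
            c = text[i]
            if quote:
                if c == quote:
                    quote = None
                elif c == "<":
                    errors.append('"<" in attribute value')
            elif c in "\"'":
                quote = c
            elif c == ">":
                break
            i += 1
        tag = text[lt + 1:i]
        pos = i + 1
        if tag.startswith("/"):
            # End-tag: must match the most recently opened element.
            name = tag[1:].strip()
            if not stack or stack.pop() != name:
                errors.append(f"mismatched end-tag </{name}>")
        elif not tag.endswith("/"):
            # Start-tag (empty-element tags are skipped): remember its name.
            parts = tag.split()
            stack.append(parts[0] if parts else "")
    for name in stack:
        errors.append(f"missing end-tag for <{name}>")
    return errors


print(check_well_formed('<a><b x="1">text</b></a>'))
print(check_well_formed('<a b="1<2"/>'))
```

A real parser would also have to validate element and attribute names and stop at the first fatal error; the sketch simply collects whatever it notices.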
There is no provision for a conforming XML parser that does not do
full error reporting, even if the parser correctly handles all XML
constructions. For example, AElfred parses a DTD, resolves all
general and parameter entities, stores information on entities and
notations, fills in defaulted attribute values, marks ignorable
whitespace, and supports multiple character encodings, but it is a
non-conforming XML parser because it does not report all required
errors.
> If this is true, then a well-formed parser doesn't even have to acknowledge
> that entities exist except for the built in ones, and absolutely all
> whitespace is preserved, right?
Yes, that is my understanding, except that the well-formed parser must
check that the entity reference itself is well-formed. For example,
if you found
you would be required to report a well-formedness error. You have to
be prepared to check the whole range of Unicode characters, not just
the first 256 (see the PR for what's allowed at the start and middle
of a name). AElfred does not do this right now, because it would make
the parser too large for use in applets (I added the support
experimentally once, then removed it again).
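As an illustration of checking that an entity reference is itself well-formed, here is a small Python sketch. The helpers `is_name` and `bad_entity_refs` are hypothetical names, and the name test is only an approximation: it relies on Python's `str.isalpha` and `str.isalnum` rather than the exact character classes listed in the PR, so it accepts and rejects some characters the PR would treat differently. It does show the point about looking beyond the first 256 characters.

```python
def is_name(s):
    """Approximate test for an XML name (not the PR's exact character classes)."""
    if not s:
        return False
    first = s[0]
    # A name starts with a letter, "_" or ":"; isalpha() covers all of Unicode.
    if not (first.isalpha() or first in "_:"):
        return False
    # Later characters may also be digits, ".", "-", "_" or ":".
    return all(c.isalnum() or c in "._:-" for c in s[1:])


def bad_entity_refs(text):
    """Return the entity references in text that are not well-formed."""
    bad = []
    pos = text.find("&")
    while pos != -1:
        end = text.find(";", pos)
        if end == -1:
            # "&" with no closing ";" at all.
            bad.append(text[pos:])
            break
        body = text[pos + 1:end]
        # Character references ("&#...;") are left to a separate check.
        if not body.startswith("#") and not is_name(body):
            bad.append(text[pos:end + 1])
        pos = text.find("&", pos + 1)
    return bad


print(bad_entity_refs("a &amp; b &1x; c"))
```

The sketch only checks the syntax of the reference; it does not look up whether the entity was declared, which is the part a well-formed parser is allowed to skip according to the reading above.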
All the best,
David Megginson firstname.lastname@example.org
Microstar Software Ltd. email@example.com
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:email@example.com)