xml-dev - Re: Partial XML Processors (was Re: JavaScript parser update and Questio


From: David Megginson <ak117@freenet.carleton.ca>
To: <xml-dev@ic.ac.uk>
Date: Sat, 17 Jan 1998 07:19:58 -0500

Jeremie Miller writes:

 > A well-formed XML document is not required to have a DTD, internal or
 > external, correct?  Is a well-formed parser not an XML parser that does not
 > have access to or does not process a DTD, internal or external?  I guess I
 > haven't found a clear definition of what a well-formed parser is yet.

The XML Proposed Recommendation (PR) is not very clear about
processing requirements (other than error reporting and a few details
like ignorable whitespace).  As I understand things, however, a
well-formed parser must be able to do the following:

1) Parse all of the grammar, including the document type declaration
  and internal DTD subset, without throwing spurious errors (even if
  it does nothing with the declarations).
2) Act correctly on the RMD parameter of the XML declaration.
3) Report a large range of errors, such as "]]>" in character data,
  "<" in an attribute value literal, illegal characters in element and
  attribute names, mismatched start- and end-tags, etc.
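
As a rough illustration of items 1 and 3, the sketch below feeds a few
small documents to a parser and records whether it reports an error.
It uses Python's standard xml.parsers.expat binding purely as a
stand-in for "some parser that checks well-formedness"; it is not
AElfred, and the helper name well_formedness_error and the sample
documents are made up for this example.

```python
import xml.parsers.expat


def well_formedness_error(document):
    """Return the parser's error message, or None if the document parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final (and only) chunk of input.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


# Hypothetical sample documents, one per case named in the list above.
SAMPLES = {
    # Item 1: an internal DTD subset must parse without spurious errors.
    "internal DTD subset": "<!DOCTYPE a [<!ELEMENT a (#PCDATA)>]><a>ok</a>",
    # Item 3: each of the following must be reported as an error.
    "']]>' in character data": "<a>x ]]> y</a>",
    "'<' in attribute value": '<a b="x<y"/>',
    "illegal character in name": "<1a/>",
    "mismatched tags": "<a><b></a></b>",
}

if __name__ == "__main__":
    for label, document in SAMPLES.items():
        error = well_formedness_error(document)
        status = "accepted" if error is None else "rejected: " + error
        print(label + " -> " + status)
```

The first sample should be accepted and the remaining four rejected; a
parser that accepts any of the last four is not doing the error
reporting described in item 3.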

There is no provision for a conforming XML parser that does not do
full error reporting, even if the parser correctly handles all XML
constructions.  For example, AElfred parses a DTD, resolves all
general and parameter entities, stores information on entities and
notations, fills in defaulted attribute values, marks ignorable
whitespace, and supports multiple character encodings, but it is a
non-conforming XML parser because it does not report all required
well-formedness errors.

 > If this is true, then a well-formed parser doesn't even have to acknowledge
 > that entities exist except for the built in ones, and absolutely all
 > whitespace is preserved, right?

Yes, that is my understanding, except that the well-formed parser must
check that the entity reference itself is well-formed.  For example,
if you found

  &1front2;

you would be required to report a well-formedness error.  You have to
be prepared to check the whole range of Unicode characters, not just
the first 256 (see the PR for what's allowed at the start and middle
of a name).  AElfred does not do this right now, because it would make
the parser too large for use in applets (I added the support
experimentally once, then removed it again).
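
The name check on an entity reference can be sketched as follows.
This is an approximation, not the exact tables from the spec: it
assumes the Unicode general categories that the XML 1.0 character
classes are derived from (letters and letter-numbers for the first
character; marks, modifier letters and decimal digits in addition for
later characters), and it ignores the spec's listed exceptions.  The
function names are invented for this example.

```python
import unicodedata

# Assumption: approximate the spec's character classes by Unicode category.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_OTHER_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def is_name_start(ch):
    """True if ch may begin a name (a letter, '_' or ':')."""
    return ch in ("_", ":") or unicodedata.category(ch) in NAME_START_CATEGORIES


def is_name_char(ch):
    """True if ch may appear after the first character of a name."""
    return (
        is_name_start(ch)
        or ch in (".", "-")
        or unicodedata.category(ch) in NAME_OTHER_CATEGORIES
    )


def is_xml_name(text):
    """Check the whole string against the approximate name rules."""
    return (
        bool(text)
        and is_name_start(text[0])
        and all(is_name_char(ch) for ch in text[1:])
    )


def is_well_formed_entity_reference(ref):
    """Check that ref has the shape '&' Name ';' with a legal name."""
    if not (ref.startswith("&") and ref.endswith(";")):
        return False
    return is_xml_name(ref[1:-1])
```

With these rules, is_well_formed_entity_reference("&1front2;") is
False because a digit cannot start a name, while "&front2;" passes.
Because the test goes through the Unicode category of each character,
it covers names outside the first 256 characters as well, which is
exactly the part that costs table space in a small parser.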

All the best,

David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

