- From: "Simon St.Laurent" <simonstl@simonstl.com>
- To: xml-dev@lists.xml.org
- Date: Fri, 10 Nov 2000 09:58:10 -0500
Since I'm once again writing a chapter on interoperability issues among XML
parsers, I'm pondering the division between validating and non-validating
parsers and how that differs from the division between valid and
well-formed documents.
(The gist of the chapter is in slides at:
http://www.simonstl.com/articles/interop/)
It seems like the option for non-validating parsers to ignore external DTD
subsets and entities came into the spec pretty late, so at some point it
might have made sense for all parsers, validating or non-validating, to be
able to understand the contents of the DOCTYPE declaration. Before that
option appeared, document creators could count on both validating and
non-validating parsers to return the same information from a document.
This reasonably justified the requirement that non-validating parsers
should 'speak' DTD, even the tricky parts.
Once non-validating parsers were freed by that option, validating and
non-validating parsers could return different results from the same
document, but it's not even that consistent. Some non-validating parsers
do read the external subset, etc. Developers are forced to look to finer
and more obscure criteria than the main divide between validating and
non-validating parsers, and users confronted with missing information in
applications are bound to be confused.
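
A minimal sketch of that divergence, using Python's standard-library ElementTree, which sits on the non-validating expat parser and does not fetch the external subset in this configuration. The element name `doc`, the attribute `status`, and the file name `doc.dtd` are made up for illustration; `doc.dtd` is assumed to carry the same ATTLIST declaration as the internal subset shown here.

```python
import xml.etree.ElementTree as ET

# Declarations in the internal subset: a non-validating parser must read these.
INTERNAL = """<!DOCTYPE doc [
  <!ATTLIST doc status CDATA "draft">
  <!ENTITY greeting "hello">
]>
<doc>&greeting;</doc>"""

# Same document, but the ATTLIST is assumed to live in a hypothetical doc.dtd.
# ElementTree (expat) does not retrieve that file, so the default never appears.
EXTERNAL = """<!DOCTYPE doc SYSTEM "doc.dtd">
<doc>hello</doc>"""


def attributes_seen(xml_text):
    """Return the root element's attributes as the application receives them."""
    return dict(ET.fromstring(xml_text).attrib)


internal_attrs = attributes_seen(INTERNAL)
external_attrs = attributes_seen(EXTERNAL)
internal_text = ET.fromstring(INTERNAL).text

print("internal subset:", internal_attrs, repr(internal_text))
print("external subset:", external_attrs)
```

The application sees a defaulted `status` attribute in the first case and nothing in the second, and nothing in the parse result says that anything was skipped.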
(The standalone declaration can only be used to identify documents which
don't require external resources, not documents which do require external
resources, and is widely underused in any event. There's no trigger in XML
for warning document-consuming applications that they'd better have a
parser which retrieves external resources.)
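
As a rough illustration of how little the standalone declaration tells an application, here is a sketch that reads it through Python's `xml.parsers.expat` module. The helper name `standalone_flag` is made up for this example; the handler itself, `XmlDeclHandler`, is part of pyexpat and reports 1 for "yes", 0 for "no", and -1 when the clause is omitted.

```python
from xml.parsers import expat


def standalone_flag(xml_text):
    """Return expat's view of the standalone declaration.

    1 = standalone="yes", 0 = standalone="no", -1 = clause omitted,
    None = the document has no XML declaration at all.
    """
    seen = {"standalone": None}

    def on_xml_decl(version, encoding, standalone):
        seen["standalone"] = standalone

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_text, True)
    return seen["standalone"]


print(standalone_flag('<?xml version="1.0" standalone="yes"?><doc/>'))
print(standalone_flag('<?xml version="1.0" standalone="no"?><doc/>'))
print(standalone_flag('<?xml version="1.0"?><doc/>'))
```

Only the "yes" case is a real promise. The "no" and omitted cases both leave the application guessing whether external declarations would have changed what it received.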
At this point, I have a hard time accepting the line drawn between
validating and non-validating parsers, or the justification for making all
non-validating parsers understand and process whatever DTDs they happen to
encounter. It seems it would have been wiser to make non-validating
parsers behave consistently, either by always reading all of the DTD
content or by ignoring it entirely. I spent a long time preferring the
first option, but at this point I'm leaning toward the second.
As fond as I have been of DTDs (believe it or not), I think it's well past
time to extract them from the initial parsing process, and make them a
post-processing tool, something like schemas. The document contains
whatever it contains, and DTD or schema processing is considered an
addition to the document, not content at the same level as the actual
document content.
This is tough stuff to deal with, and I don't see it changing any time
soon, but I'd like to suggest that we at least consider why the lines are
drawn as they are and consider alternatives that might produce more
comprehensible results.
Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
XHTML: Migrating Toward XML
http://www.simonstl.com - XML essays and books