- From: David Brownell <email@example.com>
- To: Tim Bray <firstname.lastname@example.org>
- Date: Mon, 28 Jun 1999 20:17:17 -0700
> > But they can become a LOT
> >smaller if they don't need to handle even that, and are relieved of
> >the responsibilities to handle the syntax and state in a DOCTYPE.
> I disagree. We went through this quite a bit in the XML Syntax Working
> Group. It is absolutely *not* the case that DTD parsing is demonstrably
> very expensive. There was a conventional wisdom floating about that
> a parser for a DTD-free dialect of XML could deliver the same performance
> and functionality in immensely less space. Empirical analysis fails
> to support this contention.
Some different empirical analysis says it's not so far off as that.
Of course, "immense" is a loaded word.
> Analysis of existing parsers shows immense
> amounts of work going into things like reading Unicode efficiently,
Admittedly, reading Unicode "efficiently" is something folks spend time
on, but at least in Java that's mostly to benchmark better. Handling
different character encodings is a job for system libraries; it's
not specific to XML.
(Yes, the internals of java.io.Reader leave lots to be desired; if that
were open source, major performance fixes would have happened ages ago!)
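To make the "system libraries" point concrete, here is a minimal sketch of the decoding step, assuming the parser only ever asks the platform for a Reader. The class name EncodingDemo and the helpers open() and readAll() are made up for illustration, and StandardCharsets is from current JDKs rather than anything available in 1999; none of it comes from an actual parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: the parser hands the byte stream to the
    // platform library, which does all byte-to-character decoding.
    static Reader open(InputStream in, Charset cs) {
        return new InputStreamReader(in, cs);
    }

    // Drains a Reader into a String; stands in for the parser's read loop.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String doc = "<greeting>caf\u00e9</greeting>";
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16,
            StandardCharsets.ISO_8859_1
        };
        for (Charset cs : charsets) {
            // Encode, then decode through the same library path.
            byte[] bytes = doc.getBytes(cs);
            String decoded = readAll(open(new ByteArrayInputStream(bytes), cs));
            System.out.println(cs + " round-trips: " + decoded.equals(doc));
        }
    }
}
```

Nothing in the read loop knows which encoding is in use, which is the sense in which this work isn't XML-specific.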
> doing well-formedness checks on entity nesting,
Well, the subset I described had no need for such well-formedness
checks, or the other overhead for entities.
> and tracking locations to support good error messages
That one never seemed to me to involve much work; and when there's
really only one entity ("the document"), tracking and reporting
locations gets a lot simpler.
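As a rough sketch of how little is involved in the single-entity case: with no external or parameter entities there is no stack of input sources to maintain, so a line counter and a column counter are the whole state. The class LocationTracker and its methods are hypothetical names for illustration, not taken from any existing parser.

```java
public class LocationTracker {
    private int line = 1;       // 1-based line of the last character seen
    private int column = 0;     // characters consumed on the current line
    private boolean lastWasCR = false;

    // Feed every character read from the document through here.
    void advance(char c) {
        if (c == '\r') {
            line++;
            column = 0;
            lastWasCR = true;
        } else if (c == '\n') {
            // Treat CR LF as a single line break.
            if (!lastWasCR) {
                line++;
            }
            column = 0;
            lastWasCR = false;
        } else {
            column++;
            lastWasCR = false;
        }
    }

    int getLine() { return line; }

    int getColumn() { return column; }

    // Formats a message; with one entity no system identifier is needed.
    String error(String message) {
        return "line " + line + ", column " + column + ": " + message;
    }

    public static void main(String[] args) {
        LocationTracker loc = new LocationTracker();
        for (char c : "<a>\r\n  <b".toCharArray()) {
            loc.advance(c);
        }
        System.out.println(loc.error("unterminated start tag"));
    }
}
```

A parser that supports external entities would need one such tracker per open entity plus a stack of them, which is the extra bookkeeping the subset avoids.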
The error-related cost in space and performance comes from
detecting and handling errors ... and that cost is basically a linear
function of how many errors are possible. Remove those error cases
and you remove the code to deal with them.
> I repeat that there is a resounding lack
> of evidence to show that parsing DTD syntax is particularly taxing for
> any competent programmer. Even parameter entities aren't hard to
> implement - they are hard to *describe*, just not hard to implement. -Tim
I get the feeling you're missing my point, which wasn't exclusively
about DTD syntax. (Though a quick count did show that removing the
<!DOCTYPE ...> and everything it implies removes something like 4/9
of the grammar productions.)
My earlier post listed several other ways a DTD-less subset cuts costs.