- From: David Brownell <david-b@pacbell.net>
- To: Tim Bray <tbray@textuality.com>
- Date: Mon, 28 Jun 1999 16:27:59 -0700
Tim Bray wrote:
>
> At 01:57 PM 6/28/99 -0700, David Brownell wrote:
> >... what do folk think of using the following XML subset:
> >
> > Everything in XML, except the <!DOCTYPE ...> support
> > which takes up something like 2/3 of most parsers.
>
> Distinguish between <!DOCTYPE > and validation. I do *not* agree
> that parsing DTD syntax takes up 2/3 of a parser.
I did distinguish between them. In part that's why I described
this as a (potential) subset; it's more than just use of a
nonvalidating parser, which is an option I assume everyone on
the XML-DEV list understands.
Savings may not be 2/3 ... but I'd be _really_ surprised if they were
less than 1/2. The best way to know is to implement ... :-)
Meanwhile, consider that:
- The most complex syntax (content models, ATTLIST, and
other declarations) is in the DTD exclusively.
- State related to those constructs needs to be managed and
used even when not validating (given an internal subset),
such as performing mandatory attribute normalizations
and (recursively) including internal entities.
- Entities are declared in the DTD (except for builtins)
and there's a fair bit of code involved in handling them
even if you don't include external entities.
- Every functionality taken out means it's possible to take
out the associated error handling and reporting, and often
to straighten out code paths. Such savings can be surprisingly
large; such handling often more than doubles code size.
- There are a lot of efforts under way that either don't
require DTDs, or which stumble over them.
- Applications would have a lot less low-level variation to
deal with, and higher levels would have a cleaner slate.
The savings are, in short, indirect as well as direct.
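To make the second bullet concrete, here is a minimal sketch (Python, purely illustrative) of the state a nonvalidating parser still has to carry once it has read an internal subset: declared attribute types, which change how attribute values are normalized, and internal entity declarations, which must be expanded recursively. The names `ATTR_TYPES`, `ENTITIES`, `expand_entities` and `normalize_attribute` are hypothetical, and the built-in entities and character references are left out for brevity.

```python
import re

# Hypothetical state collected from an internal subset such as:
#   <!ATTLIST item ids NMTOKENS #IMPLIED>
#   <!ENTITY who "the &what; list">
#   <!ENTITY what "xml-dev">
ATTR_TYPES = {("item", "ids"): "NMTOKENS"}  # undeclared attributes act as CDATA
ENTITIES = {"who": "the &what; list", "what": "xml-dev"}

# Simplified entity reference pattern (ASCII-oriented name rules, an assumption).
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand_entities(text, entities, active=()):
    """Recursively replace &name; with the declared replacement text."""
    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise ValueError(f"undeclared entity: {name}")
        if name in active:
            # An entity must not include itself, directly or indirectly.
            raise ValueError(f"recursive entity reference: {name}")
        return expand_entities(entities[name], entities, active + (name,))
    return ENTITY_REF.sub(replace, text)


def normalize_attribute(element, attribute, raw, attr_types, entities):
    """Normalize an attribute value using the declared type, if any."""
    value = expand_entities(raw, entities)
    # Every attribute type: tab, CR and LF each become a space.
    value = re.sub(r"[\t\r\n]", " ", value)
    # Non-CDATA (tokenized) types also collapse runs of spaces and trim.
    if attr_types.get((element, attribute), "CDATA") != "CDATA":
        value = re.sub(r" +", " ", value).strip(" ")
    return value


if __name__ == "__main__":
    raw = " a\n&what;  b "
    print(repr(normalize_attribute("item", "ids", raw, ATTR_TYPES, ENTITIES)))
    print(repr(normalize_attribute("item", "title", raw, ATTR_TYPES, ENTITIES)))
```

Without a DOCTYPE, both tables are always empty, so the lookup, the type-dependent branch and the recursion check (plus their error reporting) have nothing left to do.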
> On the other hand,
> it's reasonable to expect a validating parser to be twice the size
> of a non-validating one.
Last time I measured, it was more like 15% ... Validation, done right,
is mostly a bunch of carefully placed tests, monitoring a content model
state machine, and tracking IDs. (Try rebuilding Sun's parser without
the validation support -- there's a "static boolean" constant that
removes the tests, and then there are some classes that can go away.)
Of course, that 15% compares a validating parser against a nonvalidating
one which processed all the external entities ... as most do, since that
is the best way to get a portable processing model for applications.
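A rough sketch of what "carefully placed tests" can look like, in Python for brevity. This is not code from Sun's parser: `VALIDATING` merely stands in for the kind of constant described above, and the content model, function name and error strings are all hypothetical.

```python
VALIDATING = True  # stand-in for a build-time "validation on/off" constant

# Hypothetical content model for <!ELEMENT list (title, item+)> as a small
# state machine: state -> {child element name -> next state}.
LIST_MODEL = {0: {"title": 1}, 1: {"item": 2}, 2: {"item": 2}}
LIST_ACCEPTING = {2}  # states in which the element may legally end


def validate(children, ids, idrefs):
    """Return a list of validity errors for one <list> element.

    children: child element names in document order
    ids:      values of ID attributes seen in the document
    idrefs:   values of IDREF attributes seen in the document
    """
    errors = []
    if not VALIDATING:
        return errors  # with validation off, none of the tests run

    # Test 1: step the content model state machine once per child.
    state = 0
    for name in children:
        next_state = LIST_MODEL.get(state, {}).get(name)
        if next_state is None:
            errors.append(f"unexpected <{name}>")
            break
        state = next_state
    else:
        if state not in LIST_ACCEPTING:
            errors.append("content ended too early")

    # Test 2: IDs must be unique, and every IDREF must match some ID.
    seen = set()
    for value in ids:
        if value in seen:
            errors.append(f"duplicate ID {value}")
        seen.add(value)
    for value in idrefs:
        if value not in seen:
            errors.append(f"IDREF {value} has no matching ID")
    return errors


if __name__ == "__main__":
    print(validate(["title", "item", "item"], ["a", "b"], ["a"]))
    print(validate(["title"], [], []))
```

The point of the sketch is only that the validation-specific parts are a table walk and a set lookup; the parsing work that feeds them is needed either way.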
> Note that nearly all the existing
> validating parsers parse DTD syntax just fine. -T.
I suspect you meant to say "nonvalidating" there ... :-)
Of course they do -- that's a requirement of being able to parse a
<!DOCTYPE ...> with an internal subset. But they can become a LOT
smaller if they don't need to handle even that, and are relieved of
the responsibilities to handle the syntax and state in a DOCTYPE.
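For anyone who wants to experiment with the subset before writing a smaller parser, here is one possible way to enforce it on top of an existing full parser, sketched with Python's standard `xml.parsers.expat` bindings. It demonstrates the document class only, not the size savings; `DoctypeNotAllowed` and `parse_subset` are hypothetical names.

```python
import xml.parsers.expat


class DoctypeNotAllowed(ValueError):
    """Raised when a document steps outside the DOCTYPE-free subset."""


def parse_subset(data):
    """Parse a document, rejecting any <!DOCTYPE ...> declaration.

    Returns the element names in document order, just to show that
    ordinary parsing still proceeds for documents inside the subset.
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat when a DOCTYPE declaration starts; the exception
        # propagates out of parser.Parse() and aborts the parse.
        raise DoctypeNotAllowed(f"<!DOCTYPE {name} ...> is outside the subset")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


if __name__ == "__main__":
    print(parse_subset("<a><b/></a>"))
    try:
        parse_subset('<!DOCTYPE a [<!ENTITY x "y">]><a>&x;</a>')
    except DoctypeNotAllowed as err:
        print("rejected:", err)
```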
- Dave
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1