- From: Marc.McDonald@Design-Intelligence.com
- To: david-b@pacbell.net, tbray@textuality.com
- Date: Mon, 28 Jun 1999 18:13:59 -0700
I agree that validation does not add much to the size of an XML
parser. I wrote a validating parser, and the extra code for DTD parsing
and its attendant indirect effects is probably closer to 15% than to
50%.
The parsing of DTD syntax is not large at all. Attribute defaulting and
normalization are also not much code. Entity expansion dictates the
architecture for input streams in the parser, but it does not take much
code. Element pattern validation is about 1000 lines of C++, but only
because I implemented an n-element backtracking pattern matcher. It can
handle declarations such as:
<!ELEMENT X ( (A, B, C, D, E) | (Q?, A, B, C, D, F))>
The spec only requires one-element lookahead, so it was overkill (the
specific clause said 'may', so I did).
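To illustrate the kind of matching involved, here is a minimal sketch
of a backtracking content-model matcher. It is written in Python for
brevity and is not the C++ code described above. The tuple
representation and the names match, match_seq and is_valid are
assumptions made for this example only. Each node yields every position
at which it can finish, so a caller can try the next alternative when a
later part of the model fails.

```python
# Hypothetical content-model representation (not from the original parser):
#   ("elem", name)   a single child element
#   ("seq", [nodes]) a, b, c
#   ("alt", [nodes]) a | b
#   ("opt", node)    a?
#   ("star", node)   a*

def match(node, children, pos):
    """Yield every index at which `node` can stop matching, starting at `pos`."""
    kind = node[0]
    if kind == "elem":
        if pos < len(children) and children[pos] == node[1]:
            yield pos + 1
    elif kind == "seq":
        yield from match_seq(node[1], children, pos)
    elif kind == "alt":
        # Try every branch; a failed branch simply yields nothing.
        for branch in node[1]:
            yield from match(branch, children, pos)
    elif kind == "opt":
        yield pos  # skip the optional part
        yield from match(node[1], children, pos)
    elif kind == "star":
        yield pos  # zero repetitions
        for end in match(node[1], children, pos):
            if end > pos:  # guard against looping on empty matches
                yield from match(node, children, end)
    else:
        raise ValueError("unknown node kind: %r" % (kind,))

def match_seq(parts, children, pos):
    """Match each part in order, backtracking over earlier choices."""
    if not parts:
        yield pos
        return
    for mid in match(parts[0], children, pos):
        yield from match_seq(parts[1:], children, mid)

def is_valid(model, children):
    """True if some way of matching `model` consumes all of `children`."""
    return any(end == len(children) for end in match(model, children, 0))

# <!ELEMENT X ( (A, B, C, D, E) | (Q?, A, B, C, D, F))>
X = ("alt", [
    ("seq", [("elem", n) for n in "ABCDE"]),
    ("seq", [("opt", ("elem", "Q"))] + [("elem", n) for n in "ABCDF"]),
])

print(is_valid(X, list("ABCDF")))   # second branch, found after the first fails at E
print(is_valid(X, list("QABCDE")))  # neither branch fits
```

With the children A, B, C, D, F the first branch matches four elements
before failing at E, and the matcher then backs up and succeeds on the
second branch. A matcher limited to one-element lookahead would have to
commit to a branch on seeing A.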
I also agree that a better description of some XML features, entity
expansion in particular, would make implementation easier (in the sense
of knowing clearly what is required, not in the amount of code).
Marc B McDonald
Principal Software Scientist
Design Intelligence, Inc
www.design-intelligence.com <http://www.design-intelligence.com>
----------
From: Tim Bray [SMTP:tbray@textuality.com]
Sent: Monday, June 28, 1999 4:36 PM
To: David Brownell
Cc: XML-Dev Mailing list
Subject: Re: parsers for Palm?
At 04:27 PM 6/28/99 -0700, David Brownell wrote:
>> Distinguish between <!DOCTYPE > and validation. I do *not* agree
>> that parsing DTD syntax takes up 2/3 of a parser.
>
>I did distinguish between them.
...
>Savings may not be 2/3 ... but I'd be _really_ surprised if they were
>less than 1/2. The best way to know is to implement ... :-)
I did.
>But they can become a LOT
>smaller if they don't need to handle even that, and are relieved of
>the responsibilities to handle the syntax and state in a DOCTYPE.
I disagree. We went through this quite a bit in the XML Syntax Working
Group. It is absolutely *not* the case that DTD parsing is demonstrably
very expensive. There was a conventional wisdom floating about that
a parser for a DTD-free dialect of XML could deliver the same
performance and functionality in immensely less space. Empirical
analysis fails to support this contention. Analysis of existing parsers
shows immense amounts of work going into things like reading Unicode
efficiently, doing well-formedness checks on entity nesting, and
tracking locations to support good error messages - I repeat that there
is a resounding lack of evidence to show that parsing DTD syntax is
particularly taxing for any competent programmer. Even parameter
entities aren't hard to implement - they are hard to *describe*, just
not hard to implement.
-Tim
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)