   Re: [xml-dev] Fast text output from SAX?


Elliotte Rusty Harold wrote:

> At 10:00 AM -0400 4/14/04, Stephen D. Williams wrote:
> .....
>> Additionally, the whole parsing etc. stream for XML must be 
>> completely performed, in DOM cases and many SAX cases, for every 
>> element of a document/object.  With esXML, if a 3000 element 
>> document/object were read in and 5 elements manipulated, you only 
>> spend 5*element-manipulation-overhead.
> I flat out don't believe this. I think there's an underlying 
> assumption here (and in some of the other binary formats) which once 
> again demonstrates that they are not as much like XML as they claim. 
> The only way you can limit this is by assuming the data in your stream 
> is well-formed. In XML, we don't assume that. One of the 3000 nodes 
> you don't process may be malformed. You're assuming that's not the 
> case, and therefore avoiding a lot of overhead in checking for it. A 
> large chunk of any speed gain such a format achieves over real XML is 
> by cutting corners on well-formedness checking.

I could just as easily argue that every application has to perform 
schema validation, and then at a further level a complete 
application-level sanity validation (since DTD/Schema only goes so far), 
then referential integrity to database tables, etc.  Certainly very 
paranoid applications processing potentially unfriendly data need to do 
these levels, but it is not required of many other applications.
Of course when processing data you are doing some level of sanity 
checking, but your assertion that the only real XML (or XIS or ORX) 
application is one that fully validates all data, even data it doesn't 
otherwise need to know about or use, doesn't seem right for many 
applications.  In fact, in the n-tier application example, it is 
explicitly desired that each tier only be concerned with the data 
elements it operates on.  As an example, an initial step might be a full 
validation that is amortized over further processing.

That said, there is an equivalent for esXML to a well-formedness check.  
The variable integers, sizes, codes, etc. all have very particular 
ranges of validity, along with the standard restrictions on the 
characters for names, values, processing instructions, etc.  It is not a 
requirement to fully validate a message for all applications, just as it 
is not a requirement to fully schema validate an XML 1.1 document for 
every XML application.
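
As a rough illustration of the kind of range check meant here, the following Python sketch decodes one variable-length integer and rejects encodings that are truncated, too long, or out of range. It assumes a LEB128-style layout (7 data bits per byte, high bit set on every byte except the last), which is chosen only for illustration and is not claimed to be the actual esXML encoding; the name decode_varint and the limits max_bytes and max_value are hypothetical.

```python
def decode_varint(buf: bytes, pos: int = 0,
                  max_bytes: int = 5, max_value: int = 2**32 - 1):
    """Decode one LEB128-style unsigned integer starting at buf[pos].

    Returns (value, next_pos). Raises ValueError when the encoding is
    truncated, longer than max_bytes, or larger than max_value.
    The layout and the limits are assumptions for this sketch.
    """
    value = 0
    shift = 0
    for i in range(max_bytes):
        if pos + i >= len(buf):
            raise ValueError("truncated integer")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << shift   # low 7 bits carry data
        if byte & 0x80 == 0:              # high bit clear: last byte
            if value > max_value:
                raise ValueError("integer out of range")
            return value, pos + i + 1
        shift += 7
    raise ValueError("integer encoding too long")
```

A check like this runs only on the integers a reader actually decodes, so its cost scales with the fields touched and not with the size of the whole message.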

> If this is not the case for esXML and indeed it does make all mandated 
> well-formedness checks, then please correct my error. However, I'd be 
> very surprised that in that case that one could indeed limit parsing 
> overhead to the raw I/O.

Completely validating well-formedness is a processing step on its own, 
not only a side effect of loading, or in the case of esXML manipulating, 
the data.
The fact that an XML 1.1 document is 'corrupted' is found when parsing; 
the fact that an esXML document, or ASN.1, JPEG, or whatever is corrupt 
is also found when it is parsed.  The fact that partial corruption isn't 
found when there is partial access of the object isn't a valid argument 
against being able to operate with only the work needed for a given 
application.  The fact that an MPEG2 player doesn't find corruption in 
the end of a video stream when it only plays the beginning doesn't mean 
that it doesn't validly detect corruption.
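
The partial-access point can be sketched in Python under a deliberately simple assumption: a hypothetical format in which each record is a one-byte length followed by that many bytes of UTF-8 text. This is not the esXML layout, and read_record is an invented name. Skipping a record uses only its length byte, so a corrupt payload is reported when that record is read and not before.

```python
def read_record(buf: bytes, index: int) -> str:
    """Return record number `index` from a hypothetical length-prefixed buffer.

    Earlier records are skipped using only their length byte; their
    payloads are never inspected, so corruption inside them goes unseen.
    """
    pos = 0
    for _ in range(index):
        if pos >= len(buf):
            raise ValueError("record index past end of data")
        pos += 1 + buf[pos]               # length byte plus payload
    if pos >= len(buf):
        raise ValueError("record index past end of data")
    length = buf[pos]
    payload = buf[pos + 1 : pos + 1 + length]
    if len(payload) != length:
        raise ValueError("truncated record")
    # Decoding is where payload corruption in *this* record is detected;
    # UnicodeDecodeError is a subclass of ValueError.
    return payload.decode("utf-8")


# Record 1 holds bytes that are not valid UTF-8; records 0 and 2 are fine.
data = b"\x02hi" + b"\x02\xff\xfe" + b"\x03abc"
first = read_record(data, 0)   # "hi"
third = read_record(data, 2)   # "abc", reached by skipping the bad record
```

Calling read_record(data, 1) raises an error, so the corruption is still detected, only at the point where the damaged record is actually used.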


swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw



Copyright 2001 XML.org. This site is hosted by OASIS