Elliotte Rusty Harold wrote:
> At 10:00 AM -0400 4/14/04, Stephen D. Williams wrote:
>> Additionally, the whole parsing etc. stream for XML must be
>> completely performed, in DOM cases and many SAX cases, for every
>> element of a document/object. With esXML, if a 3000 element
>> document/object were read in and 5 elements manipulated, you only
>> spend 5*element-manipulation-overhead.
> I flat out don't believe this. I think there's an underlying
> assumption here (and in some of the other binary formats) which once
> again demonstrates that they are not as much like XML as they claim.
> The only way you can limit this is by assuming the data in your stream
> is well-formed. In XML, we don't assume that. One of the 3000 nodes
> you don't process may be malformed. You're assuming that's not the
> case, and therefore avoiding a lot of overhead in checking for it. A
> large chunk of any speed gain such a format achieves over real XML is
> by cutting corners on well-formedness checking.
I could just as easily argue that every application has to perform
schema validation, and then at a further level a complete
application-level sanity validation (since DTD/Schema only goes so far),
then referential integrity to database tables, etc. Certainly, very
paranoid applications processing potentially unfriendly data need to
perform all of these levels of checking, but many other applications do not.
Of course when processing data you are doing some level of sanity
checking, but your assertion that the only real XML (or XIS or ORX)
application is one that fully validates all data, even data it doesn't
otherwise need to know about or use, doesn't seem right for many
applications. In fact, in the n-tier application example, it is
explicitly desired that each tier only be concerned with the data
elements it operates on. As an example, an initial step might be a full
validation that is amortized over further processing.
That said, there is an equivalent for esXML to a well-formedness check.
The variable integers, sizes, codes, etc. all have very particular
ranges of validity, along with the standard restrictions on the
characters for names, values, processing instructions, etc. It is not a
requirement to fully validate a message for all applications, just as it
is not a requirement to fully schema validate an XML 1.1 document for
every XML application.
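
To make the range-of-validity point concrete, here is a minimal Python sketch. It is not the esXML encoding: the LEB128-style variable integer, the 5-byte limit, and the names read_varint and read_sized_field are all assumptions chosen for illustration. It shows only that a binary format can reject out-of-range integers and sizes at the moment a field is read.

```python
from typing import Tuple


def read_varint(buf: bytes, pos: int = 0, max_bytes: int = 5) -> Tuple[int, int]:
    """Decode one LEB128-style unsigned integer (hypothetical encoding).

    Returns (value, next_position). Raises ValueError if the integer is
    truncated or longer than max_bytes, i.e. outside its valid range.
    """
    value = 0
    for i in range(max_bytes):
        if pos + i >= len(buf):
            raise ValueError("truncated integer")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)   # low 7 bits carry data
        if byte & 0x80 == 0:                # high bit clear: last byte
            return value, pos + i + 1
    raise ValueError("integer exceeds max_bytes")


def read_sized_field(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Read a size-prefixed field, checking the size against the buffer."""
    size, pos = read_varint(buf, pos)
    if size > len(buf) - pos:
        raise ValueError("declared size runs past end of buffer")
    return buf[pos:pos + size], pos + size
```

A caller that touches only a few fields pays for these checks only on those fields; a caller that wants the whole message checked can walk every field the same way.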
> If this is not the case for esXML and indeed it does make all mandated
> well-formedness checks, then please correct my error. However, I'd be
> very surprised that in that case that one could indeed limit parsing
> overhead to the raw I/O.
Completely validating well-formedness is a processing step on its own,
not only a side effect of loading or, in the case of esXML, manipulating
a document.
The fact that an XML 1.1 document is 'corrupted' is found when parsing;
the fact that an esXML document, or ASN.1, JPEG, or whatever is corrupt
is also found when it is parsed. The fact that partial corruption isn't
found when there is partial access of the object isn't a valid argument
against being able to operate with only the work needed for a given
application. The fact that an MPEG2 player doesn't find corruption in
the end of a video stream when it only plays the beginning doesn't mean
that it doesn't validly detect corruption.
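
A small Python sketch of the partial-access argument, under loudly stated assumptions: the one-byte length prefix, the UTF-8 payloads, and the names pack, get, and validate_all are invented for illustration and say nothing about the real esXML layout. Reading one record decodes only that record, so corruption elsewhere goes unnoticed until it is touched or until a separate full validation pass is run.

```python
from typing import List


def pack(records: List[bytes]) -> bytes:
    """Build a buffer of records, each a 1-byte length plus payload (hypothetical layout)."""
    out = bytearray()
    for payload in records:
        out.append(len(payload))   # payloads must be shorter than 256 bytes
        out += payload
    return bytes(out)


def get(buf: bytes, n: int) -> str:
    """Return record n, decoding only that record's payload."""
    pos = 0
    for _ in range(n):
        pos += 1 + buf[pos]        # skip: one length read, payload not decoded
    size = buf[pos]
    return buf[pos + 1:pos + 1 + size].decode("utf-8")


def validate_all(buf: bytes) -> None:
    """Full validation as its own step: decode every payload."""
    pos = 0
    while pos < len(buf):
        size = buf[pos]
        if pos + 1 + size > len(buf):
            raise ValueError("record runs past end of buffer")
        buf[pos + 1:pos + 1 + size].decode("utf-8")   # raises on bad bytes
        pos += 1 + size


# The third record is deliberately not valid UTF-8.
sample = pack([b"alpha", b"beta", b"\xff\xfe"])
```

With this sample, get(sample, 0) succeeds without ever looking at the bad record, while get(sample, 2) and validate_all(sample) both raise UnicodeDecodeError. Detection is deferred to the point of access, and is still available in full as an explicit step.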
email@example.com http://www.hpti.com Per: firstname.lastname@example.org http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw