At 8:15 AM +0700 03.9.21, James Clark wrote:
>However, it is also possible to apply the same approach to XML. I
>believe this would give a substantial performance improvement. The
>basic idea is you would have a data binding tool that compiles a
>schema into something that would operate not on SAX events but
>directly on the bytes representing the XML document.
The suggested approach sounds very promising, and I'm interested in
hearing about any implementation experience in this direction. Is
anyone already working on it?
>To make this practical a little XML subsetting is required. First,
>I think you would need to do what the SOAP folks have done and
>disallow DTDs; handling entities would make this approach very
>difficult. Second, you really need to fix on a single encoding. I
>think UTF-8 is the obvious choice for Web services. A single
>encoding allows you to cut out a whole layer of your processing
>stack. Instead of converting bytes to characters and then parsing
>those characters into objects, you can parse the bytes directly into
I'd like to propose simply employing W3C Canonical XML as the byte
representation specification of XML documents since the requirements
for subsetting and fixed encoding are naturally met in Canonical XML.
Using Canonical XML for Web Services has another advantage: the
message can easily be signed.
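
To illustrate the signing point, here is a minimal Python sketch. It
assumes Python 3.8+, whose xml.etree.ElementTree.canonicalize()
implements C14N 2.0 rather than the original Canonical XML 1.0, and it
computes only a plain SHA-256 digest, not a real XML Signature. The
name canonical_digest and the sample documents are made up for the
example.

```python
import hashlib
import xml.etree.ElementTree as ET


def canonical_digest(xml_text: str) -> str:
    """Return the SHA-256 hex digest of the canonical UTF-8 bytes of xml_text."""
    # canonicalize() returns the C14N 2.0 form as a string (Python 3.8+).
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different surface syntax: attribute order, quote style,
# extra whitespace inside the tag, and empty-element form all differ.
doc_a = '<msg  id="1" to="bob"/>'
doc_b = "<msg to='bob' id='1'></msg>"

# Both documents canonicalize to the same bytes, so the digests match.
print(canonical_digest(doc_a) == canonical_digest(doc_b))
```

Because both sides hash the same byte sequence, a signature computed
over the canonical form survives harmless re-serialization in transit.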
> For maximum interoperability, you could use the optimized
>code-path when the XML keeps to the subset and fall back to the
>general but slow code-path when it doesn't.
Or, we can have a separate optional pre-processing step to transcode
arbitrary XML data into Canonical XML.
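
A rough sketch of such a pre-processing step, again in Python 3.8+ and
again using C14N 2.0 as a stand-in for Canonical XML 1.0. The function
names are hypothetical, and the subset check is deliberately
conservative: it tests only the two restrictions discussed above (no
DTD, UTF-8 bytes) plus the absence of an XML declaration, not full
canonical form.

```python
import io
import xml.etree.ElementTree as ET


def in_fast_subset(data: bytes) -> bool:
    """Cheap, conservative test: no DOCTYPE, no XML declaration, valid UTF-8."""
    if b"<!DOCTYPE" in data or data.startswith(b"<?xml"):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_fast_path_bytes(data: bytes) -> bytes:
    """Pass subset documents through untouched; transcode everything else."""
    if in_fast_subset(data):
        return data
    # Slow path: a general parser honours the declared encoding, and the
    # canonical form is then re-encoded as UTF-8 without a declaration.
    canonical = ET.canonicalize(from_file=io.BytesIO(data))
    return canonical.encode("utf-8")


latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'.encode("latin-1")
print(to_fast_path_bytes(latin1_doc).decode("utf-8"))
```

With this arrangement the byte-level parser only ever sees UTF-8 input
with no DTD, and the cost of the general parser is paid only for
documents that fall outside the subset.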