OASIS Mailing List Archives





   Re: [xml-dev] Pushing all the buttons


> time invested in XML parsing would pay dividends.

Yes, but it's not just an XML parsing problem.  You've got to look at 
the whole process of going from the block of data you get from the kernel 
to the in-memory application-specific Java objects (and vice versa).  In a 
typical Java implementation today, this involves numerous layers:

1. The data gets copied from the buffer where the kernel put it into 
memory under control of the Java runtime.

2. The data gets copied through a buffering stage by a BufferedInputStream.

3. The bytes get turned into characters using an InputStreamReader.

4. The XML parser processes the characters and delivers SAX events.

5. The XML data binding tool does its thing and turns those events 
into application-specific objects.
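To make the layering concrete, here is a minimal Java sketch of that stack for a hypothetical document of the form <point><x>3</x><y>4</y></point>. The Point class and the hand-written PointHandler are illustrative stand-ins for what a data binding tool would generate. A ByteArrayInputStream stands in for the buffer the runtime has already filled from the kernel (step 1), and the reader simply assumes UTF-8 rather than honouring an encoding declaration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class LayeredBinding {
    // Hypothetical application-specific object (the end product of step 5).
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Step 5 stand-in: hand-written here, where a binding tool would generate code.
    static final class PointHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int x, y;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting character data for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // SAX may deliver text in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("x")) x = Integer.parseInt(text.toString().trim());
            else if (qName.equals("y")) y = Integer.parseInt(text.toString().trim());
        }

        Point result() { return new Point(x, y); }
    }

    static Point parse(InputStream raw) {
        try {
            // Step 1 is assumed done: `raw` holds bytes already copied from the kernel.
            // Step 2: another copy through a buffering stage.
            InputStream buffered = new BufferedInputStream(raw);
            // Step 3: bytes -> characters (UTF-8 assumed here for simplicity).
            Reader chars = new InputStreamReader(buffered, StandardCharsets.UTF_8);
            // Step 4: characters -> SAX events.
            PointHandler handler = new PointHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(chars), handler);
            // Step 5: the handler has turned those events into an object.
            return handler.result();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not bind document", e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<point><x>3</x><y>4</y></point>".getBytes(StandardCharsets.UTF_8);
        Point p = parse(new ByteArrayInputStream(doc));
        System.out.println(p.x + "," + p.y); // prints 3,4
    }
}
```

Every arrow in the comments is a separate pass over the data, and steps 2 and 3 each materialise another copy of it before the parser sees a single character.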

With Sun's Fast Web Services stuff, they are going directly from a 
sequence of bytes to application-specific objects, cutting out at least 
two of the layers in the XML-based implementation.  I am quite willing 
to believe they can get an order of magnitude improvement.

However, it is also possible to apply the same approach to XML.  I 
believe this would give a substantial performance improvement.  The 
basic idea is you would have a data binding tool that compiles a schema 
into something that would operate not on SAX events but directly on the 
bytes representing the XML document.
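A hypothetical illustration of what such compiled code might look like, for a made-up schema whose only document shape is <point><x>int</x><y>int</y></point>. This is a sketch under stated assumptions, not any existing tool's output. It matches tag names as raw byte sequences and builds integers directly from ASCII digit bytes, so no Reader and no SAX events are involved. It deliberately ignores attributes, comments, the XML declaration, namespaces and character references; a real generator would have to handle or reject those.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical output of a schema compiler for <point><x>int</x><y>int</y></point>.
final class CompiledPointParser {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    private static final byte[] POINT_OPEN = ascii("<point>");
    private static final byte[] POINT_CLOSE = ascii("</point>");
    private static final byte[] X_OPEN = ascii("<x>"), X_CLOSE = ascii("</x>");
    private static final byte[] Y_OPEN = ascii("<y>"), Y_CLOSE = ascii("</y>");

    private final byte[] buf;
    private int pos;

    private CompiledPointParser(byte[] buf) { this.buf = buf; }

    private static byte[] ascii(String s) { return s.getBytes(StandardCharsets.US_ASCII); }

    // Skip XML whitespace bytes (space, tab, CR, LF).
    private void skipWhitespace() {
        while (pos < buf.length
                && (buf[pos] == ' ' || buf[pos] == '\t' || buf[pos] == '\r' || buf[pos] == '\n')) {
            pos++;
        }
    }

    // Require an exact byte sequence (a tag) at the current position.
    private void expect(byte[] literal) {
        skipWhitespace();
        if (pos + literal.length > buf.length) throw new IllegalArgumentException("unexpected end at " + pos);
        for (int i = 0; i < literal.length; i++) {
            if (buf[pos + i] != literal[i]) throw new IllegalArgumentException("unexpected byte at " + (pos + i));
        }
        pos += literal.length;
    }

    // Build an int straight from ASCII digit bytes, without decoding to characters.
    private int parseInt() {
        skipWhitespace();
        boolean negative = pos < buf.length && buf[pos] == '-';
        if (negative) pos++;
        int start = pos;
        long value = 0;
        while (pos < buf.length && buf[pos] >= '0' && buf[pos] <= '9') {
            value = value * 10 + (buf[pos++] - '0');
            if (value > 2147483648L) throw new IllegalArgumentException("int overflow at " + start);
        }
        if (pos == start) throw new IllegalArgumentException("expected digits at " + start);
        long signed = negative ? -value : value;
        if (signed > Integer.MAX_VALUE) throw new IllegalArgumentException("int overflow at " + start);
        return (int) signed;
    }

    static Point parse(byte[] utf8) {
        CompiledPointParser p = new CompiledPointParser(utf8);
        p.expect(POINT_OPEN);
        p.expect(X_OPEN);
        int x = p.parseInt();
        p.expect(X_CLOSE);
        p.expect(Y_OPEN);
        int y = p.parseInt();
        p.expect(Y_CLOSE);
        p.expect(POINT_CLOSE);
        p.skipWhitespace();
        if (p.pos != utf8.length) throw new IllegalArgumentException("trailing content at " + p.pos);
        return new Point(x, y);
    }

    public static void main(String[] args) {
        byte[] doc = "<point>\n  <x>3</x>\n  <y>-4</y>\n</point>".getBytes(StandardCharsets.UTF_8);
        Point pt = parse(doc);
        System.out.println(pt.x + "," + pt.y); // prints 3,-4
    }
}
```

Because the tag names and the digits are ASCII, their bytes are identical in UTF-8, which is what makes matching the raw bytes safe once the encoding is fixed.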

To make this practical a little XML subsetting is required.  First, I 
think you would need to do what the SOAP folks have done and disallow 
DTDs; handling entities would make this approach very difficult. 
Second, you really need to fix on a single encoding.  I think UTF-8 is 
the obvious choice for Web services.  A single encoding allows you to 
cut out a whole layer of your processing stack.  Instead of converting 
bytes to characters and then parsing those characters into objects, you 
can parse the bytes directly into objects.  For maximum 
interoperability, you could use the optimized code-path when the XML 
keeps to the subset and fall back to the general but slow code-path when 
it doesn't.
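The fallback scheme might be organized roughly as follows. This Java sketch uses a hypothetical SubsetDispatch class, and the fast and slow parsers are passed in as functions because their details depend on the binding tool. The subset check is deliberately conservative: it rejects anything that looks like UTF-16 or UTF-32, any encoding declaration other than UTF-8, and any occurrence of the bytes <!DOCTYPE (even inside a comment). The fast parser is assumed to signal unsupported constructs by throwing IllegalArgumentException.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical dispatcher: fast byte-level path for the subset, general parser otherwise.
final class SubsetDispatch {
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml\\s[^?]*\\?>");
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");
    private static final byte[] DOCTYPE = "<!DOCTYPE".getBytes(StandardCharsets.US_ASCII);

    // Conservative check: any doubt sends the document down the slow path.
    static boolean inSubset(byte[] doc) {
        int start = 0;
        if (doc.length >= 3 && doc[0] == (byte) 0xEF && doc[1] == (byte) 0xBB && doc[2] == (byte) 0xBF) {
            start = 3; // skip a UTF-8 byte order mark
        }
        if (doc.length - start >= 2 && (doc[start] == 0 || doc[start + 1] == 0
                || doc[start] == (byte) 0xFE || doc[start] == (byte) 0xFF)) {
            return false; // looks like UTF-16 or UTF-32, not UTF-8
        }
        // ISO-8859-1 maps each byte to one char, so the ASCII-only XML declaration
        // can be matched without decoding UTF-8.
        String prefix = new String(doc, start, Math.min(256, doc.length - start), StandardCharsets.ISO_8859_1);
        if (prefix.startsWith("<?xml")) {
            Matcher decl = XML_DECL.matcher(prefix);
            if (!decl.lookingAt()) return false;
            Matcher enc = ENCODING.matcher(decl.group());
            if (enc.find() && !enc.group(1).equalsIgnoreCase("UTF-8")) return false;
        }
        // Without an encoding declaration (and absent external charset information),
        // XML 1.0 requires UTF-8, so only the DTD check remains.
        return !containsBytes(doc, DOCTYPE);
    }

    private static boolean containsBytes(byte[] hay, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= hay.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) continue outer;
            }
            return true;
        }
        return false;
    }

    static <T> T parse(byte[] doc, Function<byte[], T> fastPath, Function<byte[], T> slowPath) {
        if (inSubset(doc)) {
            try {
                return fastPath.apply(doc);
            } catch (IllegalArgumentException unsupported) {
                // The fast parser met something outside what it handles; fall through.
            }
        }
        return slowPath.apply(doc);
    }

    public static void main(String[] args) {
        Function<byte[], String> fast = d -> "fast";
        Function<byte[], String> slow = d -> "slow";
        byte[] plain = "<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>".getBytes(StandardCharsets.UTF_8);
        byte[] withDtd = "<!DOCTYPE a [<!ENTITY e \"x\">]><a>&e;</a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(plain, fast, slow));   // prints fast
        System.out.println(parse(latin1, fast, slow));  // prints slow
        System.out.println(parse(withDtd, fast, slow)); // prints slow
    }
}
```

Scanning for <!DOCTYPE as a separate pass is done here only for clarity; a generated parser could equally reject a DOCTYPE when it reaches one and avoid the extra pass over the bytes.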

I think the appropriate measure of the value of Sun's Fast Web Services 
approach is what performance improvement it could offer over the sort of 
approach I've described.




Copyright 2001 XML.org. This site is hosted by OASIS