   Re: [xml-dev] Fast text output from SAX?

Bob Wyman wrote:

>Dennis Sosnoski wrote:
>
>>I think this *would* be a fair comparison test for
>>the ASN.1 "fast infoset" approach
>
>	Yes. You are correct. Both the ASN.1-based X.finfo and XBIS
>are binary encodings for "schema-free" XML documents. Thus, it makes
>sense to compare them to each other. It wouldn't be fair to compare
>XBIS to a "schema-based" binary encoding since the schema-based system
>would almost surely blow XBIS away in both compactness as well as
>parsing speed. 
>	It is important to note that by saying that well-implemented
>schema based approaches would be faster/smaller/etc. than XBIS, I'm
>not saying anything negative about XBIS. We're talking here about two
>classes of solution. (schema-based and schema-free) Each class is more
>or less appropriate and useful in different contexts and each has
>qualities that the other can't match. An orange should not be sorry
>that it is less crunchy than an apple.
>
I absolutely understand the distinction. I don't think the differences 
are likely to be as large as you seem to imply, though. A lot depends on 
the type of data used as the text content of the documents. If it's 
heavily binarizable types you may see a considerable size advantage for 
the schema-based approach, but I'd suspect this is something like a 
factor of 1.5-2x at most. Consider how much space you can save using a 
binary representation of an integer value vs. text:

  digits   binary    text
  1        1 byte    1 byte
  2        1 byte    2 bytes
  3        2 bytes   3 bytes
  4        2 bytes   4 bytes
  5        3 bytes   5 bytes
  etc.

This assumes a variable-length encoding of the binary values, with 
7 bits of binary per byte of representation (rounding up slightly at the 
crossovers for the binary encoding). And I think that's one of the 
*best* cases for a binary representation. It'd be a little faster to 
reconstruct a binary value from the variable-length encoding than it 
would be to convert the actual digits, but probably not by a lot. And for 
text values, a schema-based approach would offer no benefits at all.
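
As a rough illustration of the size comparison above, here is a minimal 
Python sketch of a 7-bits-per-byte variable-length integer encoding. The 
byte order and the use of the high bit as a continuation flag are 
assumptions made for the example (they are not taken from XBIS or from 
any ASN.1 encoding), and the function names are hypothetical. It sizes 
the largest value for each digit count, which is where the "rounding up" 
at the crossovers comes from.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte.

    Assumed layout: least-significant group first, high bit set on every
    byte except the last (a continuation flag).
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # final byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Rebuild the integer from the encoding produced by encode_varint."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result
    raise ValueError("truncated variable-length integer")


def size_comparison(max_digits: int = 5):
    """Return (digits, binary bytes, text bytes) for the largest value
    with each digit count, e.g. 999 for three digits."""
    rows = []
    for digits in range(1, max_digits + 1):
        largest = 10 ** digits - 1
        rows.append((digits, len(encode_varint(largest)), len(str(largest))))
    return rows


if __name__ == "__main__":
    for digits, binary_size, text_size in size_comparison():
        print(f"{digits} digit -> {binary_size} byte binary, {text_size} byte text")
```

Smaller values within a digit count can come out shorter (127 still fits 
in one byte), so the per-digit figures are an upper bound for the binary 
side.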

One of the problems I have with Sun's "Fast Web Services" example is 
that they basically hardwired things in at a low level for their 
schema-based implementation, then compared that with the full text 
system. I haven't actually tried it out, but I suspect that if I 
combined a light-weight SOAP implementation I've built around my JiBX 
data binding framework with an XBIS transport layer I'd get better 
performance than the Sun team (maybe I should try it - think Sun would 
publish an article on "Fast-er Web Services" if I'm right? :-) ). That 
doesn't mean that XBIS is a "better" transport than the schema-based 
approaches, though, only that there are many factors involved in 
performance when you look at complex systems like Web services 
implementations, and the actual XML handling is only one of them.

  - Dennis

-- 
Dennis M. Sosnoski
Enterprise Java, XML, and Web Services
Training and Consulting
http://www.sosnoski.com
Redmond, WA  425.885.7197






 
