xml-dev - Re: [xml-dev] Fast text output from SAX?


To: bob@wyman.us
Subject: Re: [xml-dev] Fast text output from SAX?
From: Dennis Sosnoski <dms@sosnoski.com>
Date: Tue, 13 Apr 2004 18:22:38 -0700
Cc: "'Thomas B. Passin'" <tpassin@comcast.net>, "'XML DEV'" <xml-dev@lists.xml.org>
In-reply-to: <00d701c421a9$46177ab0$650aa8c0@BOBDEV>
References: <00d701c421a9$46177ab0$650aa8c0@BOBDEV>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312

Bob Wyman wrote:

>Dennis Sosnoski wrote:
>
>>I think this *would* be a fair comparison test for 
>>the ASN.1 "fast infoset" approach
>
>	Yes. You are correct. Both the ASN.1-based X.finfo and XBIS
>are binary encodings for "schema-free" XML documents. Thus, it makes
>sense to compare them to each other. It wouldn't be fair to compare
>XBIS to a "schema-based" binary encoding since the schema-based system
>would almost surely blow XBIS away in both compactness as well as
>parsing speed. 
>	It is important to note that by saying that well-implemented
>schema based approaches would be faster/smaller/etc. than XBIS, I'm
>not saying anything negative about XBIS. We're talking here about two
>classes of solution. (schema-based and schema-free) Each class is more
>or less appropriate and useful in different contexts and each has
>qualities that the other can't match. An orange should not be sorry
>that it is less crunchy than an apple.
>
I absolutely understand the distinction. I don't think the differences 
are likely to be as large as you seem to imply, though. A lot depends on 
the type of data used as the text content of the documents. If it's 
heavily binarizable types you may see a considerable size advantage for 
the schema-based approach, but I'd suspect this is something like a 
factor of 1.5-2x at most. Consider how much space you can save using a 
binary representation of an integer value vs. text:

  digits   binary   text
  1        1 byte   1 byte
  2        1 byte   2 bytes
  3        2 bytes  3 bytes
  4        2 bytes  4 bytes
  5        3 bytes  5 bytes
  etc.

This assumes you're using variable-length encoding of the binary values, 
with 7 bits of binary per byte of representation (rounding up slightly at 
the crossovers for the binary encoding). And I think that's one of the 
*best* cases for a binary representation. It'd be a little faster to 
reconstruct a binary value from the variable-length encoding than it 
would be to convert the actual digits, but probably not by a lot. And for 
text values, a schema-based approach would offer no benefits at all.
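
To make the size comparison concrete, here is a rough sketch of a 
variable-length integer encoding with 7 payload bits per byte. The 
byte order and the use of the high bit as a continuation flag are 
assumptions for illustration (this is not claimed to be the exact 
format used by XBIS or by any schema-based encoding), and the function 
names are hypothetical. Unlike the table above, it does not round at 
the crossovers; it sizes the largest value with each digit count.

```python
def varint_encode(value: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte.

    Assumed layout: least-significant group first, high bit set on
    every byte except the last (a continuation flag).
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more groups follow
        else:
            out.append(low7)         # final group
            return bytes(out)


def varint_decode(data: bytes) -> int:
    """Rebuild the integer from the encoding produced above."""
    value = 0
    shift = 0
    for b in data:
        value |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return value


def size_comparison(max_digits: int = 5):
    """Return (digits, binary bytes, text bytes) for the largest
    value with each digit count, e.g. 9, 99, 999, ..."""
    rows = []
    for digits in range(1, max_digits + 1):
        largest = 10 ** digits - 1
        rows.append((digits, len(varint_encode(largest)), len(str(largest))))
    return rows


if __name__ == "__main__":
    for digits, binary_len, text_len in size_comparison():
        print(f"{digits} digit(s): binary {binary_len} byte(s), text {text_len} byte(s)")
```

Under these assumptions the binary form never beats the text form by 
more than about 2x for values of this size, which is the point of the 
table.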

One of the problems I have with Sun's "Fast Web Services" example is 
that they basically hardwired stuff in at a low level for their 
schema-based implementation, then compared that with the full text 
system. I haven't actually tried it out, but I suspect that if I 
combined a light-weight SOAP implementation I've built around my JiBX 
data binding framework with an XBIS transport layer I'd get better 
performance than the Sun team (maybe I should try it - think Sun would 
publish an article on "Fast-er Web Services" if I'm right? :-) ). That 
doesn't mean that XBIS is a "better" transport than the schema-based 
approaches, though, only that there are many factors involved in 
performance when you look at complex systems like Web services 
implementations, and the actual XML handling is only one of them.

  - Dennis

-- 
Dennis M. Sosnoski
Enterprise Java, XML, and Web Services
Training and Consulting
http://www.sosnoski.com
Redmond, WA  425.885.7197
