xml-dev - Re: [xml-dev] Fast text output from SAX?

To: Dennis Sosnoski <dms@sosnoski.com>
Subject: Re: [xml-dev] Fast text output from SAX?
From: "Stephen D. Williams" <sdw@lig.net>
Date: Fri, 16 Apr 2004 16:23:13 -0400
Cc: Elliotte Rusty Harold <elharo@metalab.unc.edu>, bob@wyman.us, 'XML DEV' <xml-dev@lists.xml.org>
In-reply-to: <40803A1D.1080004@sosnoski.com>
References: <006b01c423d0$2ef38ee0$650aa8c0@BOBDEV> <p06010202bca5d6132a8a@[192.168.254.88]> <40803A1D.1080004@sosnoski.com>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031208

+1

Furthermore, validating the structure of the format is needed, but this 
can be done with reasonable efficiency also and is of course done when 
traversing.

I still feel that incremental and partial validation is a valid 
operating mode, i.e. complete validation of whatever is visited but not 
enforced visitation of an entire data structure.  To draw another 
analogy: certainly Oracle, MySQL, MS SQL Server, et al validate their 
data structures, but between starting a database server and the first 
database query, they do not necessarily scan and validate every byte of 
a possibly terabyte data structure.  Operating systems and filesystems 
are similarly not pedantic about full validation before partial use.  
You may argue that this is a difference of scale, but there are 
many examples where the scale is within an order of magnitude.  Adobe's 
comparison at Santa Clara of PDF with potential optimized XML document 
formats is a good example.  A book-length document could be 
50+ MB.  Must the full document file be downloaded and validated 
before the first page is rendered, just because of a stretched ideal?

It may not seem fair for a binary format to be able to avoid some 
validation that XML 1.1, by its nature, must perform exhaustively.  
Feel free to pay attention only to those benchmarks that involve a full 
validation step before use, but there will be other benchmarks, which some 
will find interesting, that do not include this 
requirement.

sdw

Dennis Sosnoski wrote:

> Elliotte Rusty Harold wrote:
>
>> Bob. You may not need to be lectured on this, but some other people 
>> do, as the plethora of software that crashes on unexpected input 
>> proves. It has been proposed in this very thread to use binary 
>> formats precisely to avoid the overhead of checking for data 
>> correctness. Just slam some bits into memory and assume everything is 
>> hunky dory. I have seen any number of binary formats that achieve 
>> speed gains precisely by doing this. And it is my contention that if 
>> this is disallowed (as I think it should be) much, perhaps all, of 
>> the speed advantages of these binary formats disappear.
>>  
>>
> Actually the speed advantages wouldn't be significantly changed, at 
> least not for XBIS. Since XBIS already uses handles to refer to names 
> it'd only need to verify the characters of a name the first time it 
> sees it; this would be very low overhead for most documents, where a 
> limited set of (element and attribute) names are used throughout the 
> document (which is the whole reason the handle approach is used in the 
> first place). XBIS already scans the characters of content, too, so 
> it'd just need to add a single conditional check in most cases to make 
> sure a character is legal. What else would need to be checked? 
> Attribute uniqueness could be handled by a fast hash index into an 
> array of booleans, with full comparisons only needed on collisions. 
> Those are the main issues that come to mind for me.
>
> Most of the well-formedness issues of text XML (start/end tags missing 
> or out of order, attribute quoting errors, etc.) are impossible to 
> represent in XBIS format in the first place. I'd estimate that full 
> well-formedness checking wouldn't add more than 10% overhead to XBIS 
> performance. Of course, I fully expect you'll dispute this, 
> Elliotte... :-)
>
>  - Dennis
>
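The checks Dennis lists can be sketched roughly as follows. This is a hypothetical illustration, not XBIS code: the class and method names are invented, each distinct name is assumed to be defined once and then referenced by integer handle, content legality follows the XML 1.0 Char production, name legality follows the NameStartChar/NameChar productions from XML 1.1, and the 64-slot table used for attribute uniqueness is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of handle-based well-formedness checks; not XBIS's actual code. */
public class HandleWellFormedness {
    private final List<String> names = new ArrayList<>();   // handle -> name

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    public static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** NameStartChar production (XML 1.1). */
    public static boolean isNameStartChar(int c) {
        return c == ':' || c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6) || (c >= 0xF8 && c <= 0x2FF)
            || (c >= 0x370 && c <= 0x37D) || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF) || (c >= 0x3001 && c <= 0xD7FF)
            || (c >= 0xF900 && c <= 0xFDCF) || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    /** NameChar production (XML 1.1). */
    public static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.' || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    public static boolean isXmlName(String s) {
        if (s.isEmpty() || !isNameStartChar(s.codePointAt(0))) return false;
        return s.codePoints().skip(1).allMatch(HandleWellFormedness::isNameChar);
    }

    /** Name characters are verified once, when a name is first defined; returns its handle. */
    public int defineName(String name) {
        if (!isXmlName(name)) throw new IllegalArgumentException("illegal name: " + name);
        names.add(name);
        return names.size() - 1;
    }

    /** Later references by handle need no character checks. */
    public String name(int handle) {
        return names.get(handle);
    }

    /** Content is scanned anyway; legality adds one conditional per code point. */
    public static void checkContent(String text) {
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);   // an unpaired surrogate comes back as itself and fails isXmlChar
            if (!isXmlChar(c)) throw new IllegalArgumentException("illegal char U+" + Integer.toHexString(c));
            i += Character.charCount(c);
        }
    }

    /** Attribute uniqueness: hash into a boolean array; full comparisons only on a slot collision. */
    public static void checkUniqueAttributes(String[] attrNames) {
        final int size = 64;                       // assumed table size, must be a power of two
        boolean[] seen = new boolean[size];
        for (int i = 0; i < attrNames.length; i++) {
            int slot = attrNames[i].hashCode() & (size - 1);
            if (seen[slot]) {
                // Collision: fall back to full string comparison against the earlier attributes.
                for (int j = 0; j < i; j++) {
                    if (attrNames[j].equals(attrNames[i])) {
                        throw new IllegalArgumentException("duplicate attribute: " + attrNames[i]);
                    }
                }
            }
            seen[slot] = true;
        }
    }

    public static void main(String[] args) {
        HandleWellFormedness w = new HandleWellFormedness();
        int book = w.defineName("book");            // verified once here
        System.out.println(w.name(book));           // prints book, with no re-check
        checkContent("Chapter 1\n");
        checkUniqueAttributes(new String[] {"id", "lang"});
        System.out.println(isXmlName("1st"));       // prints false: a name cannot start with a digit
    }
}
```

Only defineName touches name characters, so a document that reuses a small vocabulary pays for name checking once per distinct name, which is the low-overhead case described above.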

sdw

-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw

References:
- RE: [xml-dev] Fast text output from SAX?
  - From: "Bob Wyman" <bob@wyman.us>
- RE: [xml-dev] Fast text output from SAX?
  - From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Re: [xml-dev] Fast text output from SAX?
  - From: Dennis Sosnoski <dms@sosnoski.com>
