xml-dev - Re: [xml-dev] Fast text output from SAX?


To: bob@wyman.us
Subject: Re: [xml-dev] Fast text output from SAX?
From: "Stephen D. Williams" <sdw@lig.net>
Date: Fri, 16 Apr 2004 17:50:16 -0400
Cc: 'Elliotte Rusty Harold' <elharo@metalab.unc.edu>,'XML DEV' <xml-dev@lists.xml.org>
In-reply-to: <002101c423f0$a9cffd40$650aa8c0@BOBDEV>
References: <002101c423f0$a9cffd40$650aa8c0@BOBDEV>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031208

Bob Wyman wrote:

> ...
>
>	Just like you, I groaned when I saw the suggestion that you
>could take "wire-protocol" and then just stuff it into memory. This
>  
>
Not wire protocol; wire format, meaning 'the same format as on the wire, 
in a file, etc.'  In other words, the data payload.
I am asserting that it is possible to construct a data format that is 
efficient for desired operations that is also compact in memory and 
therefore can be input and output as-is without transformation.  The 
hard part is making in-place modifications efficient while adding 
little or no space overhead.  Everything else is done or 
could easily be done with other formats.  If any data format is 
self-describing in the XML sense, I can write a library that allows me 
to traverse its structure and retrieve data in an XPath style.  ASN.1 
and similar IDL systems usually compile into data-specific code, but 
even for these formats I could devise metadata that a general purpose 
library could use to traverse the resulting structures in an XPath style 
to retrieve and convert values.
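
As a rough illustration of traversing a self-describing payload in place, 
here is a minimal sketch.  The node layout (one-byte name length, name, 
one-byte kind, 32-bit big-endian payload length, payload) and the names 
encode, read_node and find are all assumptions made up for this example; 
this is not the esXML format or any existing library.

```python
import struct

TEXT, ELEM = 0, 1  # hypothetical node kinds for this sketch


def encode(name, value):
    """Serialize (name, value); value is a str or a list of (name, value) pairs."""
    if isinstance(value, str):
        kind, payload = TEXT, value.encode("utf-8")
    else:
        kind, payload = ELEM, b"".join(encode(n, v) for n, v in value)
    raw_name = name.encode("utf-8")
    header = struct.pack(">B", len(raw_name)) + raw_name
    return header + struct.pack(">BI", kind, len(payload)) + payload


def read_node(buf, pos):
    """Decode one node header at pos; return (name, kind, payload_start, payload_end)."""
    name_len = buf[pos]
    name_end = pos + 1 + name_len
    name = bytes(buf[pos + 1:name_end]).decode("utf-8")
    kind, size = struct.unpack_from(">BI", buf, name_end)
    start = name_end + 5  # 1 byte of kind plus 4 bytes of length
    end = start + size
    if end > len(buf):
        raise ValueError("node overruns buffer")
    return name, kind, start, end


def find(buf, path):
    """Follow a '/'-separated path over the raw bytes; return the first matching text."""
    pos, limit = 0, len(buf)
    steps = path.strip("/").split("/")
    for depth, step in enumerate(steps):
        while pos < limit:
            name, kind, start, end = read_node(buf, pos)
            if name == step:
                break
            pos = end  # skip this sibling without looking inside it
        else:
            return None  # no sibling with that name
        if depth == len(steps) - 1:
            return bytes(buf[start:end]).decode("utf-8") if kind == TEXT else None
        if kind != ELEM:
            return None
        pos, limit = start, end  # descend into the children
    return None


doc = encode("order", [("id", "42"), ("customer", [("name", "Ada")])])
print(find(doc, "/order/customer/name"))  # prints Ada
```

The lookup never builds a tree: it skips siblings by their recorded 
lengths and only decodes the bytes it returns, so the same buffer could 
be read from a file or socket and queried as-is.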

>might work with text, but it sure as heck doesn't work with binary
>formats or anything that contains an address or offset. The
>  
>
I can think of several ways to represent an offset that is independent 
of a particular architecture, and I'm sure you can too.
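
Two such ways, sketched under stated assumptions: a fixed-width offset 
stored in a declared byte order and measured from the start of the 
buffer, and a variable-length integer in the LEB128 style.  The function 
names are hypothetical, and neither is claimed to be the representation 
used in my own design.

```python
import struct


def put_offset(buf, at, target):
    """Store target as a 4-byte big-endian offset from the start of buf."""
    struct.pack_into(">I", buf, at, target)


def get_offset(buf, at):
    """Read the offset back and check that it stays inside the buffer."""
    (target,) = struct.unpack_from(">I", buf, at)
    if target >= len(buf):
        raise ValueError("offset points outside the buffer")
    return target


def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("only non-negative values are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Return (value, position of the byte after the varint)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


buf = bytearray(16)
put_offset(buf, 0, 12)
print(get_offset(buf, 0))               # prints 12
print(decode_varint(encode_varint(300)))  # prints (300, 2)
```

Because the byte order is fixed by the format rather than by the host, 
the same bytes decode to the same offset on any machine, and the bounds 
check keeps a bad offset from reaching outside the buffer.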

>distinctions between wire-protocol, in-memory-format, and
>on-disk-format, are fundamental. Every proposal that I've ever seen
>  
>
Why would the wire format (not wire protocol) and on-disk-format be 
different?  I'm not talking about the wire-protocol; to the application 
the transport just takes a stream of bytes, possibly in chunks, and 
returns the same.

>for a "common" format for use in two or more of these contexts has
>ended up failing for one reason or another. As far as stuffing
>wire-protocol into memory goes: Let me just say that *NOBODY* is ever
>going to write to *MY* address space without a great deal of checking
>  
>
What are you thinking here?  Who would be writing into your address 
space?  DMA from the network directly to application memory?  (This does 
have some use in high end computing situations, but that's not what I'm 
talking about.)

My proposal consists of loading a block (or string of blocks) of data 
into a buffer, traversing and reading or modifying that data with a 
library, and later possibly writing the resulting buffer out.  What 
strikes you as dangerous about that?
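
A minimal sketch of that load, modify, write cycle, assuming a toy 
fixed-width record layout (an 8-byte padded name followed by a 32-bit 
big-endian count).  The layout and the names load, get_count and 
set_count are invented for illustration; io.BytesIO stands in for the 
file or socket.

```python
import io
import struct

RECORD = struct.Struct(">8sI")  # hypothetical layout: padded name, 32-bit count


def load(stream):
    """Read the payload as-is into a mutable buffer; there is no parse step."""
    buf = bytearray(stream.read())
    if len(buf) % RECORD.size:
        raise ValueError("truncated record")
    return buf


def check_index(buf, index):
    """Return the byte offset of record index, or raise if it is out of range."""
    offset = index * RECORD.size
    if not 0 <= offset <= len(buf) - RECORD.size:
        raise IndexError("record index out of range")
    return offset


def get_count(buf, index):
    """Return (name, count) for one record, decoded straight from the buffer."""
    name, count = RECORD.unpack_from(buf, check_index(buf, index))
    return name.rstrip(b"\0").decode("ascii"), count


def set_count(buf, index, count):
    """Overwrite one field in place; the buffer length never changes."""
    struct.pack_into(">I", buf, check_index(buf, index) + 8, count)


wire = RECORD.pack(b"apples", 3) + RECORD.pack(b"pears", 7)
buf = load(io.BytesIO(wire))   # input: bytes go straight into the buffer
set_count(buf, 1, 8)           # modify through the library
out = io.BytesIO()
out.write(buf)                 # output: the buffer is written as-is
print(get_count(buf, 1))       # prints ('pears', 8)
```

All access goes through library calls that check bounds against the 
buffer, so the incoming bytes are treated as data to be validated, never 
as addresses to be trusted.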

When you load a buffer of data and feed it to a gzip library to 
decompress it, isn't that the same situation on a bulk scale?

>going on... Also, if this problem was as simple as just replacing
>direct addresses with relative addresses, don't people realize that we
>  
>
That's not what I am doing; my recent example was a proof of concept and 
a proof of existence of a solution that met the specific requirements 
being discussed: avoidance of parsing and serialization as a separate 
step.  That doesn't mean that a solution with a relative reference would 
be bad, but my main methods are not relative addresses.
Please read about my approach at http://esxml.org and do point out my 
errors.

>probably would have figured this out a few decades ago? As an
>industry, we're not so stupid that we would missed something so
>obvious... Some times, the obvious solution is *SO* obvious that it
>must be flawed.
>  
>
Better famous last words have seldom been spoken.  :-)
I see advances every day that cause me to ponder the same question.  
I've been programming a fairly long time and, besides horsepower, there 
are a lot of things the royal we should have thought of 20 years ago.  I 
think I was even independently first on several very popular ideas, but 
I didn't act publicly on those.

I can't guarantee that the best future example of my approach will be 
super efficient and an obvious choice, but I have aggregated enough 
solutions in my current design that I have convinced myself that it is 
possible.  I would rather release code than talk about it once I have 
made some design decisions, this last week notwithstanding.  ;-)  Later.

> .....
>
>		bob wyman
>  
>

sdw

-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw

