xml-dev - Re: [xml-dev] Polymorphic strings (Was: [xml-dev] SAX for BinaryEncodings (SAD-SAX))

To: Tyler Close <list@waterken.net>
Subject: Re: [xml-dev] Polymorphic strings (Was: [xml-dev] SAX for BinaryEncodings (SAD-SAX))
From: Rick Marshall <rjm@zenucom.com>
Date: Mon, 10 Nov 2003 07:40:15 +1100
Cc: "Simon St.Laurent" <simonstl@simonstl.com>, xml-dev@lists.xml.org
In-reply-to: <200311091247.37984.list@waterken.net>
Organization: Zenucom Pty Limited
References: <r02000200-1028-84D8A161122B11D88BED0003937A08C2@[192.168.124.11]> <200311091247.37984.list@waterken.net>
Reply-to: rjm@zenucom.com

i know conventional wisdom says convert to binary for efficiency.

but i'm on record as saying, and have at least demonstrated, that it's
only conventional wisdom.

processing efficiency depends on what you want to do with the data.

first, binary isn't always smaller. someone might want to run the numbers;
here's the calculation. how many bytes does it take to represent the
largest integer/float as text, and how many bytes as binary? now take a
probability distribution of the numbers encountered in your application
and their text/binary size ratios. most applications don't save a lot.
now weigh that saving against all the other stuff in the typical
message: elements, attributes, text values.
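the calculation above can be sketched in a few lines of python. this is only an illustration under stated assumptions: "binary" means either a fixed-width 32-bit integer or an unsigned base-128 varint (a common compact encoding, not one named in the post), text means plain ascii decimal, and the sample distribution (counts from 0 to 99) is hypothetical. it ignores the tags and lengths that both encodings need around each value.

```python
import struct
import sys

def text_bytes(n: int) -> int:
    """Bytes needed to write an integer as ASCII decimal text (sign included)."""
    return len(str(n))

def varint_bytes(n: int) -> int:
    """Bytes for an unsigned base-128 varint (7 payload bits per byte); assumes n >= 0."""
    count = 1
    while n >= 0x80:
        n >>= 7
        count += 1
    return count

INT32_BYTES = struct.calcsize(">i")   # fixed-width 32-bit binary integer: 4 bytes
DOUBLE_BYTES = struct.calcsize(">d")  # IEEE 754 double: 8 bytes

def text_to_binary_ratio(samples, binary_bytes):
    """Total text size divided by total binary size over a sample of values."""
    total_text = sum(text_bytes(n) for n in samples)
    total_binary = sum(binary_bytes(n) for n in samples)
    return total_text / total_binary

# Worst cases: the largest values as text vs. as binary.
print(text_bytes(2**31 - 1), INT32_BYTES)              # 10 4
print(len(repr(sys.float_info.max)), DOUBLE_BYTES)     # 23 8

# Hypothetical distribution: small counts 0..99.
small = range(100)
print(text_to_binary_ratio(small, lambda n: INT32_BYTES))  # 0.475 (text is smaller)
print(text_to_binary_ratio(small, varint_bytes))           # 1.9
```

the worst cases favour binary, but for small values text can even beat fixed-width binary, which is why the distribution matters more than the maximum.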

that leaves numeric-only applications with big savings, and they probably
shouldn't be using xml as a transport protocol in the first place.

second, binary isn't always faster. it takes time to convert to and from
text, particularly for floats. if all you want to do is display
something, there's no point in the binary. in fact you need to do a lot
of calculation on the values before binary brings down the overall time
used by your application.

in our database applications we find only about 5% of the cpu time is
used in calculations, and even doubling the speed of those calculations
has no impact on users' perception of speed. and most of our numbers are
ready for immediate display (stored as text), so we probably go faster
overall anyway.
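the 5% figure can be checked with amdahl's law (a framing the post itself doesn't use). assuming the calculation share p = 0.05 is the only part that gets faster, by a factor s = 2:

```latex
S = \frac{1}{(1 - p) + p/s}
  = \frac{1}{0.95 + 0.05/2}
  = \frac{1}{0.975}
  \approx 1.026
```

so total time drops by only 2.5%, well below what a user would notice.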

if it's so important to transmit data in asn.1, why not set up the xml
properly? e.g.

<fingers type="integer">10</fingers>

then write an xml to asn.1 transform in sax or xslt that produces
something that can be easily compiled.
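a minimal sketch of the sax side of such a transform, using python's standard xml.sax. it is one possible reading of the idea, not an existing tool: the class and function names are hypothetical, it only handles elements marked type="integer", and it emits each one as a DER INTEGER (tag 0x02, short-form length, minimal two's-complement content) rather than compiling a full asn.1 module.

```python
import xml.sax

def der_integer(n: int) -> bytes:
    """Hypothetical helper: encode n as a DER INTEGER (tag 0x02, length, content)."""
    # Minimal two's-complement length: for negatives, size by ~n so -128 -> 0x80.
    magnitude = n if n >= 0 else ~n
    content = n.to_bytes(magnitude.bit_length() // 8 + 1, "big", signed=True)
    if len(content) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x02, len(content)]) + content

class TypedIntegerHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects DER encodings of elements with type="integer"."""

    def __init__(self):
        super().__init__()
        self.encoded = []      # list of (element name, DER bytes)
        self._current = None   # name of the typed element being read, if any
        self._text = []

    def startElement(self, name, attrs):
        if attrs.get("type") == "integer":
            self._current = name
            self._text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks; gather them all.
        if self._current is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._current:
            value = int("".join(self._text).strip())
            self.encoded.append((name, der_integer(value)))
            self._current = None

handler = TypedIntegerHandler()
xml.sax.parseString(
    b'<hand><fingers type="integer">10</fingers>'
    b'<delta type="integer">-128</delta></hand>',
    handler,
)
print(handler.encoded)  # [('fingers', b'\x02\x01\n'), ('delta', b'\x02\x01\x80')]
```

the typing lives in the xml (the type attribute), so plain sax consumers still see ordinary text while the transform can produce binary output.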

or simply write a sax-like product that will accept xml that conforms to
a published syntax. if it's good and useful, people will use it; if it's
not, someone will build a better mousetrap. if it's a bad idea, it'll
wither and die. with experience it might lead to an extension of xml or
a new standard.

rick

On Mon, 2003-11-10 at 07:47, Tyler Close wrote:
> On November 8, 2003 12:38 pm, Simon St.Laurent wrote:
> > I can stomach binary interchange.  I just don't want the people pushing
> > binary XML to make changes to XML processing while doing it.  That's
> > what Bob seems to be up to, and that's worth drawing a line.
> >
> > I'd be thrilled to see ASN.1 readers which produce SAX2 events and ASN.1
> > writers which consume SAX2 events.  I'm not happy to hear notions of
> > PSVI-like typing polluting the SAX2 space.  If you want typing, find
> > another API - and accept the costs of doing that.  If the ASN.1
> > community wants to reach out to the XML community, it needs to create
> > ASN.1 tools which talk to XML tools without imposing ASN.1's own and
> > different perspective on how data should be presented.
> 
> Such a solution is possible, but it requires use of polymorphism, instead of 
> simple character arrays.
> 
> I believe the proposal to extend SAX for use with ASN.1 is based on 
> performance. The thought is that converting from typed data to text strings 
> and back to typed data is pointlessly inefficient. This inefficiency prompted 
> the proposal to extend SAX with an API that bypasses the text conversion and 
> directly passes the typed data to the application.
> 
> An alternative solution for this efficiency issue is to optimize out the 
> conversions using a lazy execution design pattern. Instead of passing an 
> actual text string to the SAX handler, you pass an object that wraps the 
> concrete data type, and only performs the conversion to text if the 
> application uses the object through the String API. The data type decoder 
> (i.e., the code that converts from text strings to data types) first checks its 
> input to see if it is one of these wrapper objects. If so, the decoder just 
> extracts the contained data type. Otherwise, the decoder uses the input as a 
> text string and decodes it as before.
> 
> This solution optimizes out the string conversion inefficiency without 
> changing the SAX API. Code that works on the input stream as simple text 
> continues to function as is. The wrapper objects convert the typed data to 
> text on demand, transparently to the SAX client code.
> 
> Implementing this kind of solution without change to the SAX API requires that 
> the programming language completely support polymorphism. Unfortunately, 
> popular languages, like Java, don't have this support. In Java, you can't 
> create a subclass of a character array, and the SAX API is defined in terms 
> of character arrays. Could the two sides compromise on a modified SAX API 
> that was defined in terms of a polymorphic string type?
> 
> If you'd like to better understand this solution, the Waterken Doc library 
> implements it. See:
> 
> http://www.waterken.com/dev/Doc/
> 
> The SAX-like interface is:
> 
> http://www.waterken.com/javadoc/Doc/com/waterken/doc/Visitor.html
> 
> The polymorphic String type is:
> 
> http://www.waterken.com/javadoc/ADT/com/waterken/adt/Source.html
> 
> Some of the wrapper types are:
> 
> http://www.waterken.com/javadoc/ADT/com/waterken/adt/string/String.html
> http://www.waterken.com/javadoc/Enc/com/waterken/enc/base64/Decoded.html
> http://www.waterken.com/javadoc/Enc/com/waterken/enc/base10/Decoded.html
> 
> I haven't kept up with the many messages in this thread. Apologies if this 
> kind of solution has already been suggested.
> 
> Tyler
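the lazy-wrapper pattern described in the quoted message can be sketched in python, where str() dispatches to __str__ and so gives the kind of polymorphic string Tyler says java's char[] can't. the names LazyText and decode_int are hypothetical and are not the Waterken Source/Decoded API; python's own xml.sax still passes plain str to characters(), so this shows the pattern only, not a drop-in sax extension.

```python
class LazyText:
    """Hypothetical wrapper: holds a typed value, renders text only on demand."""

    def __init__(self, value):
        self.value = value
        self._text = None
        self.conversions = 0  # how many times text was actually produced

    def __str__(self):
        # Convert to text lazily, and only once; later calls reuse the cached string.
        if self._text is None:
            self._text = str(self.value)
            self.conversions += 1
        return self._text


def decode_int(source) -> int:
    """Hypothetical decoder: unwrap typed data if present, otherwise parse text."""
    if isinstance(source, LazyText) and isinstance(source.value, int):
        return source.value          # fast path: no text round trip
    return int(str(source).strip())  # ordinary path: source is plain text


# A parser would hand the handler a LazyText instead of a string.
wrapped = LazyText(42)
print(decode_int(wrapped), wrapped.conversions)  # 42 0
print(str(wrapped), wrapped.conversions)         # 42 1
print(decode_int(" 17 "))                        # 17
```

code that treats the value as text still works, while a decoder that knows about the wrapper skips both conversions.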

Follow-Ups:
- RE: [xml-dev] Polymorphic strings (Was: [xml-dev] SAX for BinaryEncodings (SAD-SAX))
  - From: "Bob Wyman" <bob@wyman.us>

References:
- Re: [xml-dev] SAX for Binary Encodings (SAD-SAX)
  - From: "Simon St.Laurent" <simonstl@simonstl.com>
- Polymorphic strings (Was: [xml-dev] SAX for Binary Encodings (SAD-SAX))
  - From: Tyler Close <list@waterken.net>
