Re: [xml-dev] The Rising Sun: How XML Binary Restored the Fortunes of Innovators

To: Rick Marshall <rjm@zenucom.com>
Subject: Re: [xml-dev] The Rising Sun: How XML Binary Restored the Fortunes of Innovators
From: "Stephen D. Williams" <sdw@lig.net>
Date: Thu, 07 Apr 2005 18:53:03 -0400
Cc: Michael Champion <michaelc.champion@gmail.com>,Elliotte Rusty Harold <elharo@metalab.unc.edu>,"Bullard, Claude L (Len)" <len.bullard@intergraph.com>,xml-dev@lists.xml.org
In-reply-to: <4255B267.1090401@zenucom.com>
References: <15725CF6AFE2F34DB8A5B4770B7334EE07206DA4@hq1.pcmail.ingr.com> <425561B3.7070106@metalab.unc.edu> <e3a5cb2c050407110544e18e7f@mail.gmail.com> <4255B267.1090401@zenucom.com>
User-agent: Mozilla Thunderbird 0.8 (Windows/20040913)

What you are describing is close to what some proposals involve.  You 
make a few assumptions, about not being able to update for instance, 
that are not necessarily true.  The complete set of features 
(properties), detailed fairly well in the Binary XML working group 
documents, provides the design context for this.

If "we" want to end up calling this "transport XML" or "transport DOM"
or "holistically optimized DOM" (because it isn't just about transport,
it's about the holistic overhead between one in-memory representation
and another across the "wire"), that's just fine by me.  You are
talking about an interchangeable, standardized format that is usable
between architectures, development environments, and various
applications, in other words a general purpose binary data format.  The
problem with not signaling its semantic place in the world with "XML"
is a marketing problem and potentially a control and compatibility
problem.  A number of us want it to be associated with XML so that we do
have equivalence to a high degree and an understanding of
applicability.  If it evolves independently, it may diverge.  (That's
not to say that I don't think there are additional semantics that make
sense primarily in a binary world.)

BTW, your "cache them if they are standardized" idea maps directly to
the Deltas property, something that I advocate.  Additionally, you may
want to think about the possibility that you could create a format
similar to what you describe that can be randomly accessed
incrementally without first parsing or decoding.  An incremental random
access format will beat any parser/decoder on total time of access to a
few fields of a large, complicated instance.
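
As a rough illustration of the random-access point (a sketch only: the
layout, the names encode() and read_field(), and the fixed-size index
are assumptions made up for this example, not taken from any of the
proposals), a reader can fetch one field by looking at its index entry
and nothing else:

```python
import struct

# Hypothetical layout (an assumption for illustration only):
#   uint32 field_count
#   field_count x (uint32 offset, uint32 length)   <- fixed-size index
#   concatenated UTF-8 field values
_HEADER = struct.Struct("<I")
_ENTRY = struct.Struct("<II")


def encode(values):
    """Pack a list of strings into the indexed layout above."""
    blobs = [v.encode("utf-8") for v in values]
    offset = _HEADER.size + _ENTRY.size * len(blobs)
    index = bytearray()
    for blob in blobs:
        index += _ENTRY.pack(offset, len(blob))
        offset += len(blob)
    return _HEADER.pack(len(blobs)) + bytes(index) + b"".join(blobs)


def read_field(buf, i):
    """Fetch field i by reading only its index entry and its own bytes."""
    (count,) = _HEADER.unpack_from(buf, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    offset, length = _ENTRY.unpack_from(buf, _HEADER.size + _ENTRY.size * i)
    return bytes(buf[offset:offset + length]).decode("utf-8")
```

The cost of read_field() does not depend on how many other fields the
instance holds, which is the property being argued for; a real format
would of course need nesting, names, and namespaces on top of this.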

Then consider whether a format could support relatively efficient 
updates without parsing / decoding / encoding / serialization.  I can 
prove easily that a format could support that; it is more difficult to 
verbally prove compactness at the same time.  As I've said, it is 
important to consider what constraints you need to support in what 
combinations and find ways to unify the solution.  This is completely 
different from adding bells and whistles.
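
The "update without parsing" half of that claim is the easy one to
sketch.  The following is a minimal example under a strong assumption
(every field lives in a fixed-width slot; make_record(), update_slot()
and read_slot() are hypothetical names), and it deliberately says
nothing about compactness, which is the harder part:

```python
import struct

# Hypothetical record: little-endian uint32 fields in fixed-width slots.
_SLOT = struct.Struct("<I")


def make_record(*values):
    """Build a mutable record with one fixed-width slot per value."""
    return bytearray(struct.pack("<%dI" % len(values), *values))


def update_slot(buf, i, value):
    """Overwrite slot i in place; no other byte is read or rewritten."""
    _SLOT.pack_into(buf, _SLOT.size * i, value)


def read_slot(buf, i):
    """Read slot i directly from its fixed position."""
    (value,) = _SLOT.unpack_from(buf, _SLOT.size * i)
    return value
```

Calling update_slot(rec, 1, 99) on a record changes four bytes and
leaves the rest of the buffer untouched, with no decode or re-encode
step.  Variable-length data would need something more, such as slack
space or an offset index, which is where the tension with compactness
shows up.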

I think that most of the worry about updating other specs is a red
herring.  For most of them, you can just declare XML 1.x equivalence for
operations involving those specs.  That doesn't even mean you have to
convert to XML 1.x to be conforming, only that you operate "as if" that
were the case.  (As the ANSI C standard does in places.)

sdw

Rick Marshall wrote:

> my 2c on this (again...)
>
> 1. there seems to be use cases where something other than "text" xml 
> would be useful while still preserving the infoset
> 2. xml, per se, doesn't know about types (xsd does...) so anything 
> that assumes "x" is a number has to be lossy
> 3. as far as i can tell most use cases are about transporting xml 
> inside an application. eg sending to mobile phones, printers, etc - in 
> which case there are better ways to do this
> 4. instead of a "binary" xml format - what about a "dom transport" 
> format? this could be dense and lossless:
>
> header - element dictionary, attribute dictionary, processing 
> instructions (can't avoid sending these as text)
> body - rle encoded infoset - element code/num attributes/attribute 1 
> code/length/data/..../element length/data - and it's recursive (do you 
> want the bnf?)
>
> in terms of bits on the wire this wouldn't have to be based on 8 bits, it 
> could use the good old msb=1 coding to get better compression
>
> yes, this is a variation on gzip etc. the important difference is that 
> the message is parsed before transmission and as a transmission ONLY 
> format it does not have to allow for updates etc. it can be fed 
> straight into a dom or sax tool without parsing overhead or decoded 
> back to xml for storage manipulation. as a byproduct our mobile 
> friends win on transport bits and processing cycles. a closed 
> application could potentially even drop the headers (or cache them if 
> they are standardised).
>
> now there's a market for:
>
> 1. standard dom representation (?)
> 2. dom transport encoder
> 3. dom transport decoder
>
> and in true open standard tradition companies can produce middleware 
> for each of these.
>
> there are a few flies in the ointment - mostly to do with java, but we 
> can work on those. (just for the record i live in c world)
>
> rick
>
> ps what i'm really saying is the effort should NOT be in binary xml 
> (that implies update etc and a whole set of parallel or modified ws-* 
> etc), but i think there is a compelling case for an xml 
> transport format.
>
> Michael Champion wrote:
>
>> On Apr 7, 2005 12:37 PM, Elliotte Rusty Harold 
>> <elharo@metalab.unc.edu> wrote:
>>
>>  
>>
>>> Far too much effort has been spent shouting supposed self-evident 
>>> truths
>>> about how much faster/smaller/sexier binary formats are. Little in the
>>> way of evidence has been produced,   
>>
>>
>> As someone who has been (personally) receptive  to the idea of "binary
>> XML" technology and possibly even a single standard IF it could cover
>> a wide range of cases , I must say that this was my biggest
>> disappointment when reading the XBC final document. They developed
>> decent use cases,  attributes to measure, and measurement
>> methodologies, and collected a number of prototype formats and
>> implementations, but didn't put them all together and generate some
>> numbers.  Rather than those Yes/No entries in the table for how the
>> formats met the criteria, someone could actually run tests and plug in
>> real numbers.  I can only assume that those who know what the numbers
>> would be also know that they do not support the case that a single
>> "binary XML" format can be both small enough and fast enough (not to
>> mention the 15 other attributes!) to make much of a difference. 
>> Otherwise, why not publicize them and address the concerns that the
>> TAG expressed?
>>
>> I fervently hope that the W3C doesn't move forward with a followon WG
>> until that quantitative evidence is available to all concerned. Until
>> then, we're back where we started - we know that some formats are
>> dramatically faster OR smaller than XML for some set of use cases, but
>> we don't have any reason to believe that one format can do it all
>> across use cases, or even that one could hit a reasonable 80/20 point
>> compared with XML. The idea of forming a WG to do some computer
>> science here, trusting that an acceptable size/speed tradeoff is
>> possible while maintaining XML API/tools/datamodel/etc. compatibility,
>> is at best implausible at this point.
>>
>>  
>>
>>> Personally, I'm not convinced that there isn't another factor of ten
>>> performance gain to be had in the world of real XML parsing, though
>>> doing that will probably require ditching DOM and SAX in favor of a 
>>> more
>>> performance tuned API. Still, I doubt we've yet hit the end of the line
>>> when it comes to XML parsing algorithms. I could well be wrong about
>>> that, but it would be truly ironic if two weeks after binary XML 
>>> goes to
>>> REC, some grad student somewhere releases a text parser that beats the
>>> pants off the binary parsers.
>>>   
>>
>>
>> Noah Mendelsohn made that point in a very compelling way at the W3C
>> binary workshop in September 2003.  He reiterated it on the TAG list
>> the other day, noting that the mainstream Java XML tools were designed
>> for conformance, not performance, and serious brainpower is only now
>> beginning to be thrown at the problem of efficiently parsing XML and
>> conveniently consuming the result.
>>
>>  
>>
>>> I suspect this would be more likely to happen if companies like Sun
>>> devoted their brain power to XML parsing algorithms rather than
>>> inventing new formats.
>>>   
>>
>>
>> I think it's important to focus on people/problems/projects and not
>> who they work for. MS as a whole is extremely skeptical about "Binary
>> XML" standardization, but nobody has decreed that Binary XML is Evil
>> and is off the radar as a solution.  Sun as a whole seems receptive to
>> standardization, but  I doubt if anyone there has decreed that Binary
>> XML is the One True Path and parser optimization is pointless.  Almost
>> everyone I talk to on both sides of the philosophical divide (or the
>> .NET / Java divide)  has an appreciation of the dilemmas, and I'll bet
>> there are material Day Job rewards waiting for clever solutions to
>> actual customer problems, whether they involve new formats, new
>> algorithms/APIs, finding an efficient subset of XML that hits the
>> 99/01 point for their use cases, whatever. The most important thing
>> IMHO is to not prematurely standardize on something we will all regret
>> in a few years, but not to prematurely reject options based on
>> preconceptions of one sort or another either.
>>
>> I do think that the burden of proof is on those who would break things
>> that actually work without compelling evidence that would make us much
>> better off down the road. To be fair to the XBC people,  until we
>> really and truly see XML 1.0 working for the use cases that they laid
>> out, I'm planning to keep an open mind about how to make the XML
>> family of technologies work better for all the users and potential
>> users.  For now, various "binary XML" technologies definitely have
>> their place in specialized domains and scenarios, but nobody has come
>> anywhere near demonstrating that a "binary XML" standard would do more
>> good than harm.  For that matter, I don't see how it is plausible to
>> believe that a consensus can be reached on a  format that trades off
>> the needs of the wireless people for cheap compression and the needs
>> of the enterprise messaging people for processing speed.
>>
>> -----------------------------------------------------------------
>> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>> initiative of OASIS <http://www.oasis-open.org>
>>
>> The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>> To subscribe or unsubscribe from this list use the subscription
>> manager: <http://www.oasis-open.org/mlmanage/index.php>
>>
>


-- 
swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw
