OASIS Mailing List Archives

RE: [Xml-bin] RE: Another binary XML approach

From: Derek Denny-Brown [mailto:derekdb@microsoft.com]

>I'll finally chime in on this issue, since it seems to have come down
>to a question of how worthwhile binary/tokenized XML really is.

Thanks, Derek.

>>avoid reinventing the wheel

>Reinventing the wheel is always an issue, but there is a time and a
>place.  Why XML at all, since you could just use SGML?

Good question.  XML is SGML as practiced.  I think it had something 
to do with the W3C running the show instead of ISO.  This thread seems 
to be moving the shoe back.  We start out with a technical requirement 
and end up sucked into a big power play.  Life is too short...

>To steal from Tim
>Bray's recent hit/miss presentation, one good reason to reinvent is to
>adjust an existing standard to better 'hit' the 80% that matters.
Maybe this binary thing is next year's mega hit.  I don't know, but I learned
long ago to dismiss HitHype as DJ rant.  Good for the guy spinning
the record and the artist the indie paid him to spin; noise to the

>>embedded devices, high-volume transactions, efficiency, compression

>I'll rephrase this in a form that includes more quantifiable items:
>- parser size/complexity
>- parse time
>- file-size

Good for the function.  Form and fit are still missing.  I think that
is what Rick Jelliffe is describing.  The ubiquity of the system
depends on fitting the layers appropriately, given the requirements
of each layer from the perspective of both the medium and the use.

>In a prototype of a tokenized XML format, these results came out to:
>- parser size/complexity	roughly 10:2
>- parse time			roughly 10:1
>- file-size			-10%

Ok.  Smaller, faster.  Readability?  That goes away, right?  Also,
without the global scale for those, all we know is that one
function gets the 10:n advantage.  The question then is how
much that matters given the resources available.  In the
itty-bitty bikini device, I can see it.  What about overall
system reliability?  Better?  Worse?

>There was no compression in the new format.  The original file was
>ASCII, and the 'binary' form was UTF-8, so these numbers are optimistic
>for non-Anglo-centric documents.  I also had a version which used
>UTF-16, which was faster (15:1), but produced larger documents (+60%).
>These numbers are _very_ compelling, and I think are enough to warrant
>serious investigation, and possible standardization.  

Serious investigation is absolutely key.  Spec first.
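The size figures in the quoted prototype depend heavily on the character encoding. As a minimal sketch (the sample documents are made up for illustration, not taken from the prototype), this Python snippet shows why a UTF-16 variant grows for English-language markup: each ASCII character takes one byte in UTF-8 but two in UTF-16, while characters outside ASCII narrow or close the gap.

```python
# Compare the encoded size of the same markup in UTF-8 and UTF-16.
# The sample documents are illustrative assumptions, not the prototype's data.

def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for text, excluding any byte-order mark."""
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))  # the -le codec does not prepend a BOM
    return utf8, utf16


samples = {
    "ascii": "<order id='42'><item>widget</item></order>",
    "cjk": "<注文><品目>部品</品目></注文>",
}

for name, doc in samples.items():
    u8, u16 = encoded_sizes(doc)
    print(f"{name}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

For the all-ASCII sample, UTF-16 is exactly twice the UTF-8 size (84 vs. 42 bytes); for the sample with Japanese element names and text, the two come out equal (40 bytes each), since those characters take three bytes in UTF-8 and two in UTF-16.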

>One of the significant aspects was that I could write a non-validating
>parser in less than a day.  Writing a fully conformant non-validating
>XML parser is a much harder task.  

This is too much like the DePH (Desperate Perl Hacker) argument.

So, for the programmer, this is a win in your estimation.  On the
other hand, others in this thread claim that performance is not much
improved except in a minority of cases.
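The claim about writing a non-validating parser in under a day is easier to see with a concrete, if hypothetical, token format. This Python sketch is not the prototype Derek describes (its format is not given in the thread); the opcodes `START`, `END`, and `TEXT` and the functions `encode` and `decode` are assumptions made for illustration. Element names are interned once in a header table and referenced by index, and text is length-prefixed, so the decoder is a single loop over opcodes with no delimiter scanning, entity expansion, or escaping.

```python
import struct

# Opcodes for a hypothetical token scheme used only in this sketch.
START, END, TEXT = 0x01, 0x02, 0x03


def encode(tree) -> bytes:
    """Encode a (name, [children]) tree; children are str or nested tuples."""
    index: dict[str, int] = {}  # element name -> position in the name table
    body = bytearray()

    def walk(node) -> None:
        if isinstance(node, str):
            data = node.encode("utf-8")
            body.extend(struct.pack(">BI", TEXT, len(data)))  # opcode + 4-byte length
            body.extend(data)
        else:
            name, children = node
            idx = index.setdefault(name, len(index))  # intern each name once
            body.extend(struct.pack(">BH", START, idx))  # opcode + 2-byte name index
            for child in children:
                walk(child)
            body.append(END)

    walk(tree)
    # Header: name count, then each name as a 2-byte length plus UTF-8 bytes.
    header = bytearray(struct.pack(">H", len(index)))
    for name in index:  # dicts keep insertion order, which matches the indices
        raw = name.encode("utf-8")
        header.extend(struct.pack(">H", len(raw)))
        header.extend(raw)
    return bytes(header + body)


def decode(buf: bytes):
    """Rebuild the tree; checks token structure only, no validation."""
    (count,) = struct.unpack_from(">H", buf, 0)
    pos = 2
    names = []
    for _ in range(count):
        (n,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        names.append(buf[pos:pos + n].decode("utf-8"))
        pos += n

    stack = [("#root", [])]  # synthetic parent collects the document element
    while pos < len(buf):
        op = buf[pos]
        pos += 1
        if op == START:
            (idx,) = struct.unpack_from(">H", buf, pos)
            pos += 2
            node = (names[idx], [])
            stack[-1][1].append(node)
            stack.append(node)
        elif op == END:
            if len(stack) == 1:
                raise ValueError(f"END without START at offset {pos - 1}")
            stack.pop()
        elif op == TEXT:
            (n,) = struct.unpack_from(">I", buf, pos)
            pos += 4
            stack[-1][1].append(buf[pos:pos + n].decode("utf-8"))
            pos += n
        else:
            raise ValueError(f"unknown opcode {op:#x} at offset {pos - 1}")
    if len(stack) != 1 or len(stack[0][1]) != 1:
        raise ValueError("unbalanced or missing document element")
    return stack[0][1][0]


doc = ("order", [("item", ["widget"]), ("item", ["gadget"])])
assert decode(encode(doc)) == doc
```

Attributes, namespaces, and any kind of validation are deliberately left out to keep the sketch short.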

>There are disadvantages to this.
>This means that every product group this side of Pluto will author their
>own binary-xml parser, and many will be slightly non-conformant.  On the
>plus side, that means that every product group this side of Pluto will
>be using XML(-ish).  

More room to exOrXize and less stability in content maintenance.  Again,
for short-lifecycle messages, this may be a win.

>If the format is extensible, so that it is possible
>to stick application specific blocks of data inline, then it will be
>much easier for groups to move from a purely proprietary solution to an
>XML-centric solution.

Ok.  But you said XML-ish, and that gives me cause to shudder.  The pockets
out here aren't necessarily deep enough to withstand another "simplification
for the DePH/Programmer" if that quickly expands into a system rewrite.

It might be better to recover the costs of the last simplification first.
If this works only for a minority of cases, it might be preferable for
those cases to create application-language binaries and absorb the costs
of their own niches rather than creating a requirement for all
implementors to absorb a cost they don't need.  It becomes like that
really awful treaty being promoted that lets foreign nationals assert law
enforcement requirements on member nations at will.


Policy laundering by those who think they need something that ends up
being very expensive.  Just how deep into the infrastructure
does one go to satisfy the needs of the DePH, who, it turns out, is a myth?

>One worry I do have about a standard is that the format will bloat.  If
>a standardization group does form, it should be a hard limit that a
>parser for this new format could conform to the 10:2 ratio I mention
>above, or something close to it.  Feature creep is something which must
>be fought tooth and nail, or else there is no purpose to creating the
>new format.

A standardization group should not form.  That is policy laundering.
A spec should be created and offered, and those who have a functional
need can check the form and fit of the function offered.  IOW, no
treaty, no enforcement.  Offer a choice of means.  If the means
are acceptable to the majority, then on to the standards group, where
it can go quickly rather than dragging on for two or three years
while standards wonks design language instead of ratifying it.

>p.s. I am speaking in no official capacity when I say any of this,
>rather they are my personal opinions and should only be regarded as


Ekam sat.h, Vipraah bahudhaa vadanti.
(Truth is one; the wise call it by many names.)
Daamyata. Datta. Dayadhvam.h
(Be self-controlled. Give. Be compassionate.)