RE: [Xml-bin] RE: Another binary XML approach
- From: "Bullard, Claude L (Len)" <clbullar@ingr.com>
- To: Derek Denny-Brown <derekdb@microsoft.com>
- Date: Fri, 13 Apr 2001 09:26:57 -0500
From: Derek Denny-Brown [mailto:derekdb@microsoft.com]
>I'll finally chime in on this issue, since it seems to have come down
>to a question of how worthwhile binary/tokenized XML really is.
Thanks, Derek.
>>avoid reinventing the wheel
>Reinventing the wheel is always an issue, but there is a time and a
>place. Why XML at all, since you could just use SGML?
Good question. XML is SGML as practiced. I think it had something
to do with the W3C running the show instead of ISO. This thread seems
to be moving the show back: we start out with a technical requirement
and end up sucked into a big power play. Life is too short...
>To steal from Tim
>Bray's recent hit/miss presentation: one good reason to reinvent is to
>adjust an existing standard to better 'hit' the 80% that matters.
Maybe this binary thing is next year's mega hit. I don't know, but I
learned long ago to dismiss HitHype as DJ rant: good for the guy spinning
the record and for the artist the indie paid him to spin, noise to the
listeners.
>>embedded devices, high-volume transactions, efficiency, compression
>>ratio,
>I'll rephrase this in a form that includes more quantifiable items:
>- parser size/complexity
>- parse time
>- file-size
Good for the function. Form and fit are still missing. I think that
is what Rick Jelliffe is describing. The ubiquity of the system
depends on fitting the layers appropriately, given the requirements
of each layer from the perspective of both the medium and the use.
>In a prototype of a tokenized XML format, these results came out to
>approximately
>- parser size/complexity roughly 10:2
>- parse time roughly 10:1
>- file-size -10%
Ok. Smaller, faster. Readability? That goes away, right? Also,
without a global scale for those numbers, all we know is that one
function gets the 10:n advantage. The question then is how much
that matters given the resources available. In the itty-bitty
bikini device, I can see it. What about overall
system reliability? Better? Worse?
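(An aside for readers following along: "tokenized" presumably means the
usual scheme, where markup characters and repeated names are replaced by
byte codes and a string table. Derek doesn't give the prototype's actual
layout, so the C sketch below invents one, just to make the file-size
claim concrete. Every token code, name id, and length field here is my
assumption, not the prototype's format.)

/* Hypothetical tokenized encoding of <doc id="1">hi</doc>.
 * Assumed layout: one-byte token codes, one-byte name ids into a
 * string table, one-byte lengths. Illustration only. */
#include <stdio.h>

static const char text_form[] = "<doc id=\"1\">hi</doc>";

static const unsigned char token_form[] = {
    0x01, 0x00,              /* ELEM_OPEN, name[0] = "doc"        */
    0x03, 0x01, 0x01, '1',   /* ATTR, name[1] = "id", len 1, "1"  */
    0x04, 0x02, 'h', 'i',    /* TEXT, len 2, "hi"                 */
    0x02                     /* ELEM_CLOSE                        */
};

int main(void)
{
    printf("text form: %u bytes, tokenized: %u bytes\n",
           (unsigned)(sizeof text_form - 1), (unsigned)sizeof token_form);
    return 0;
}

On a toy like this the saving looks dramatic (11 bytes against 20); on a
real document it depends on how much of the file is element structure
versus character data, which is presumably why the measured figure above
is only -10%.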
>There was no compression in the new format. The original file was
>ASCII, and the 'binary' form was UTF-8, so these numbers are optimistic
>for non-Anglo-centric documents. I also had a version which used
>UTF-16, which was faster (15:1) but produced larger documents (+60%).
>These numbers are _very_ compelling, and I think are enough to warrant
>serious investigation, and possible standardization.
Serious investigation is key, absolutely. Spec first.
>One of the significant aspects was that I could write a non-validating
>parser in less than a day. Writing a fully conformant non-validating
>XML parser is a much harder task.
This is too much like the DePH (Desperate Perl Hacker) argument.
So, for the programmer, this is a win in your estimation. On the
other hand, others in this thread claim that performance is not much
improved outside a minority of cases.
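The one-day figure is believable, though. Given a token layout like the
sketch earlier (again, my invention, not the prototype's), the entire
event loop of a non-validating reader is a switch on one byte:

/* Minimal decoder for the hypothetical tokenized format above,
 * emitting SAX-ish events. */
#include <stdio.h>

enum { ELEM_OPEN = 1, ELEM_CLOSE, ATTR, TEXT };

static const char *names[] = { "doc", "id" };  /* stand-in string table */

int main(void)
{
    /* <doc id="1">hi</doc>, hand-encoded in the assumed layout */
    const unsigned char buf[] = {
        ELEM_OPEN, 0, ATTR, 1, 1, '1', TEXT, 2, 'h', 'i', ELEM_CLOSE
    };
    const unsigned char *p = buf, *end = buf + sizeof buf;

    while (p < end) {
        switch (*p++) {
        case ELEM_OPEN:
            printf("start element: %s\n", names[*p++]);
            break;
        case ELEM_CLOSE:
            printf("end element\n");
            break;
        case ATTR: {
            int name = *p++, len = *p++;
            printf("attribute %s=\"%.*s\"\n", names[name], len,
                   (const char *)p);
            p += len;
            break;
        }
        case TEXT: {
            int len = *p++;
            printf("text: \"%.*s\"\n", len, (const char *)p);
            p += len;
            break;
        }
        default:
            fprintf(stderr, "bad token\n");
            return 1;
        }
    }
    return 0;
}

Set that against what full XML conformance requires: entity expansion,
DTD handling, encoding detection, well-formedness checks on names and
nesting. The asymmetry in effort is real; the question remains whether
it is the asymmetry that matters.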
>There are disadvantages to this.
>This means that every product group this side of Pluto will author their
>own binary-xml parser, and many will be slightly non-conformant. On the
>plus side, that means that every product group this side of Pluto will
>be using XML(-ish).
More room to exOrXize and less stability in content maintenance.
Again, for short-lifecycle messages, this may be a win.
>If the format is extensible, so that it is possible
>to stick application specific blocks of data inline, then it will be
>much easier for groups to move from a purely proprietary solution to a
>XML-centric solution.
Ok. But you said XML-ish, and that gives me cause to shudder. The
pockets out here aren't necessarily deep enough to withstand another
"simplification for the DePH/Programmer" if that quickly expands into
a system rewrite. It might be better to recover the costs of the last
simplification first.
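To be fair, the mechanism itself is the easy part. If the application
blocks carry a length prefix, a parser that doesn't recognize one can
skip it wholesale; a sketch, with an OPAQUE token code and app-id field
I've made up for illustration:

/* Skipping an unrecognized application-specific block, assuming a
 * hypothetical OPAQUE token followed by a one-byte app id and a
 * big-endian 32-bit payload length. */
#include <stdio.h>

enum { OPAQUE = 0x10 };  /* invented token code for app-specific data */

int main(void)
{
    const unsigned char buf[] = {
        OPAQUE, 42,          /* app id 42: someone else's block     */
        0, 0, 0, 5,          /* payload length, big-endian 32 bits  */
        9, 9, 9, 9, 9        /* proprietary payload                 */
    };
    const unsigned char *p = buf;

    if (*p++ == OPAQUE) {
        int app = *p++;
        unsigned long len = ((unsigned long)p[0] << 24) |
                            ((unsigned long)p[1] << 16) |
                            ((unsigned long)p[2] << 8)  |
                             (unsigned long)p[3];
        p += 4 + len;        /* skip header and payload, keep parsing */
        printf("skipped %lu-byte block for unknown app id %d\n", len, app);
    }
    return 0;
}

The length prefix is what keeps "XML-ish" from meaning opaque to
strangers: unknown data stays structurally skippable even when it is
semantically private. The hard part is everything around that, which is
where I balk.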
If this works for the minority of cases, it might be preferable for
those cases to create application-language binaries and absorb the
costs of their own niches, rather than creating a requirement for all
implementors to absorb a cost they don't need. It becomes like that
really awful treaty being promoted that lets foreign nationals assert
law enforcement requirements on member nations at will:
http://www.law.com/cgi-bin/gx.cgi/AppLogic+FTContentServer?pagename=law/View&c=Article&cid=ZZZD3WRL5LC&live=true&cst=1&pc=0&pa=0&s=News&ExpIgnore=true&showsummary=0
Policy laundering by those who think they need something that ends up
being very expensive. Just how deep into the infrastructure does one
go to satisfy the needs of the DePH who, it turns out, is a myth?
>One worry I do have about a standard, is that the format will bloat. If
>a standardization group does form, it should be a hard limit that a
>parser for this new format could conform to the 10:2 ratio I mention
>above, or something close to it. Feature creep is something which must
>be fought tooth and nail, or else there is no purpose to creating the
>new format.
A standardization group should not form. That is policy laundering.
A spec should be created and offered, and those who have a functional
need can check the form and fit of the function offered. IOW, no
treaty, no enforcement. Offer a choice of means. If the means are
acceptable to the majority, then on to the standards group, where it
can go quickly rather than dragging on for two or three years while
standards wonks design language instead of ratifying it.
>p.s. I am speaking in no official capacity when I say any of this,
>rather they are my personal opinions and should only be regarded as
>such.
Understood.
Len
http://www.mp3.com/LenBullard
Ekam sat.h, Vipraah bahudhaa vadanti.
Daamyata. Datta. Dayadhvam.h