xml-dev - RE: [xml-dev] Microsoft FUD on binary XML...


To: "Michael Champion" <mc@xegesis.org>,<xml-dev@lists.xml.org>
Subject: RE: [xml-dev] Microsoft FUD on binary XML...
From: "Joshua Allen" <joshuaa@microsoft.com>
Date: Tue, 18 Nov 2003 16:49:56 -0800

> -----Original Message-----
> From: Michael Champion [mailto:mc@xegesis.org]
> Sent: Tuesday, November 18, 2003 4:15 PM
> To: <xml-dev@lists.xml.org> <xml-dev@lists.xml.org>
> Subject: Re: [xml-dev] Microsoft FUD on binary XML...
> 
> 
> On Nov 18, 2003, at 6:32 PM, Joshua Allen wrote:
> 
> > - people gripe about parse speed of XML as if it will be faster when
> > it's binary, and I think this is incorrect from two perspectives --
> > first is that we have shown XML-oriented protocols to be faster than
> > binary in many cases, and second because there is still tons of room
> > for improvement in text parsing speeds (the fact that gen 1 of XML
> > parsers is slow simply proves that they are gen 1 parsers, not that
> > text is inherently slower than binary).
> >
> Hmm.  At the binary XML workshop [yeah, yeah, "binary serialization of
> the XML Infoset"], lots of people were talking about XML being 10x
> slower than comparable binary technologies.  (Mind you, I personally
> think this is a very reasonable price to pay in most circumstances,
> but I would like to get the facts straight).
> 
> Can you point to anything public that shows XML-oriented protocols
> to be faster than binary?  Again I agree that XML parsing is seldom a
> bottleneck, so XML *applications* are often just as fast as binary
> ones, but I'm not so sure about "protocols."
> 

Well, as I understand the examples I've seen, one of the problems with
these binary formats is that they tend to evolve, and the new
information embedded in the binary stream is not always added in a way
that is amenable to efficient parsing.  So the idea of "fast binary"
made sense at V1 of the protocol, but things become a mess over time
and much of the supposed benefit of binary parsing is lost to all the
special-casing code.
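
To make the "special-casing" point concrete, here is a minimal sketch in
Python.  The record layout, the flag names, and the version numbers are
all invented for illustration; they do not describe any real protocol or
any of the examples referred to above.  It only shows how a V1 parser
that is a single fixed-layout read can pick up version checks and
per-flag branches once optional fields are bolted on later.

```python
import struct

# Hypothetical record format, invented for illustration only.
# V1: fixed 8-byte little-endian layout, read with a single unpack call.
V1 = struct.Struct("<BBHI")  # version, flags, record id, value


def parse_v1(buf):
    """Parse a V1 record: one fixed-layout read, no branching."""
    version, flags, rec_id, value = V1.unpack_from(buf, 0)
    return {"id": rec_id, "value": value}


# Assumed later revisions append optional fields after the V1 header,
# signalled by flag bits, so the parser has to branch per version and flag.
FLAG_HAS_TIMESTAMP = 0x01  # assumed V2 addition: 8-byte timestamp
FLAG_HAS_LABEL = 0x02      # assumed V3 addition: 1-byte length + UTF-8 bytes


def parse_evolved(buf):
    """Parse a V1, V2, or V3 record, special-casing each optional field."""
    version, flags, rec_id, value = V1.unpack_from(buf, 0)
    rec = {"id": rec_id, "value": value}
    offset = V1.size
    if version >= 2 and flags & FLAG_HAS_TIMESTAMP:
        (rec["timestamp"],) = struct.unpack_from("<Q", buf, offset)
        offset += 8
    if version >= 3 and flags & FLAG_HAS_LABEL:
        (length,) = struct.unpack_from("<B", buf, offset)
        offset += 1
        rec["label"] = buf[offset:offset + length].decode("utf-8")
        offset += length
    return rec


if __name__ == "__main__":
    v1_record = struct.pack("<BBHI", 1, 0, 7, 42)
    v3_record = (struct.pack("<BBHI", 3, 0x03, 7, 42)
                 + struct.pack("<Q", 1069202996)
                 + b"\x02hi")
    print(parse_v1(v1_record))
    print(parse_evolved(v3_record))
```

The sketch says nothing about absolute speed.  It only illustrates that
every field added this way costs another branch and another
variable-offset read, which is the kind of overhead I mean.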

> On the "gen 1-ness" of XML parsers, that was a very good point I
> learned about at the workshop. On the other hand, many of the
> optimizations to produce significant speedups depend on a shared
> schema.  I have a philosophical question:  If an XML distributed

I'm definitely not talking about optimizations that require a shared
schema.  We got pretty dramatic improvements in parse speed between V1
and V2 of our parsers (caveat, not shipping, so no official promises,
but...)  And this is pure text parsing; and as far as I have seen, our
V1 parser was pretty fast compared to most of the parsers out there that
people complain about being slow.

> just tend to see XML as *only* an interchange format between databases,
> objects and applications rather than something that would be natively
> stored, processed, displayed, etc. in an application-neutral or
> schema-neutral manner.  I suspect the party line will change when WinFS

Well, hold on -- you are aware that MSFT has been shipping a
parse-speed-optimized binary format for XML since SQL 2000 shipped, so
it is not as if MSFT is totally opposed to native binary encoding of
XML.  My point is simply that "native binary encoding" is not an interop
scenario.  It's a very valid issue for specific client scenarios, server
scenarios, and so on -- but it's totally counterproductive to interop.
To me the choice of native storage format and encoding is very specific
to an application scenario and will always be optimized regardless of
what standards anyone tries to invent anyway.  For example, I have seen
people recently asserting that one should design exclusively for REST
communications with a local storage engine, which IMO is just insane --
REST is great for interop, but on my local box I want tight coupling.
Same goes for XML-text; it is great for interop, great for hierarchical
access to data, semistructured/document data and so on -- but sometimes
you want something more tightly coupled.  Nothing wrong with that, but
just don't call it interop.
