On Mon, 07 Feb 2005 17:35:12 -0500, Elliotte Harold
> Binary file formats that call themselves XML, binary encodings of the
> XML infoset, and the like, are broken and actively damaging to the XML
Let me summarize what I took away from numerous presentations and
discussions of this subject at XML 2004.
- There is a community of people who wish to leverage much of what the
world thinks of as "XML", including SAX, DOM, XSLT, XSD, and the
software and documentation that support these things, but find that in
practice the XML syntax is too verbose (and/or resource intensive to
process) in their domain. The most obvious example is wireless.
There was a gentleman there from an Air Force contractor who made the
point very clearly that the USAF would love to use XML almost
everywhere for the enormous cost savings and quality improvements it
offers over the current chaos of binary formats, APIs, and expensive
hyper-specialists. BUT XML text is 10-100 times as verbose as these
(highly optimized for low-bandwidth communications) formats.
Reinventing all the supporting specs and tools for their domain would
be pointless, since SAX/XSLT/etc. do the job; what they want is simply
an optimized serialization of the XML Infoset that the other specs
- There is a school of thought that a binary format would fit into
"real" XML quite cleanly as a specialized encoding. I don't know
enough about the deeper philosophy of the XML spec to know if this is
a shameless exploitation of an ambiguity or a clever hack to do
something unanticipated but well within the spirit of the thing. I
personally don't see how a document
<?xml version="1.0" encoding="Shift-JIS"?>
[binary gibberish I have no software to process but others do]
is qualitatively less correct or interoperable than
<?xml version="1.0" encoding="W3CBinaryXML"?>
[binary gibberish I have no software to process but others will]
- Binary serializations of the XML infoset have already been created
that are capable of pretty decent compression or parsing
performance. See the citations in the XML 2004 papers that are
online. There are plenty of academic and quasi-academic papers on
this. The interesting question is whether any can get sufficiently
better compression AND performance (and a bunch of other attributes)
than XML text to make it worthwhile for a wide range of uses. The
Binary XML Characterization WG is defining the criteria by which this
might be determined.
- "Binary XML" is happening, whether that is an oxymoron or not.
There are well over a dozen format proposals that have been made
publicly available, and probably dozens more that have not. For
example, I recall Michael Rys saying at XML 2003 that SQL Server 2005
uses a proprietary binary encoding internally to store XML compactly
and in a way that is efficiently processed with XML APIs or serialized
into XML text. I suspect that many other XML DBs do something
similar. I believe that some XML hardware middleware vendors do as
well. Many of these are conceptually serializations of SAX event
streams, so they have a deep "XML" heritage and easy integration with
applications that work with SAX parsers.
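
To make the "serialization of a SAX event stream" idea concrete, here is a
minimal sketch in Python. It is not any of the proposed formats: the opcodes,
the record layout, and the names EventEncoder, encode, and decode are all
invented for illustration. The only point is that a SAX ContentHandler can
write each event as a small binary record, with element and attribute names
sent once and referred to by number afterwards.

```python
import io
import struct
import xml.sax

# Made-up opcodes for this sketch; no real binary XML format is implied.
DEF, START, END, TEXT = 0, 1, 2, 3


class EventEncoder(xml.sax.ContentHandler):
    """Writes SAX events as binary records (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.out = io.BytesIO()
        self.ids = {}  # element/attribute name -> small integer id

    def _put_str(self, s):
        # Length-prefixed UTF-8; the 16-bit length caps strings at 65535 bytes.
        data = s.encode("utf-8")
        self.out.write(struct.pack(">H", len(data)) + data)

    def _id(self, name):
        # The first use of a name emits a DEF record; later uses are 2 bytes.
        if name not in self.ids:
            self.ids[name] = len(self.ids)
            self.out.write(bytes([DEF]))
            self._put_str(name)
        return self.ids[name]

    def startElement(self, name, attrs):
        elem = self._id(name)
        # Resolve attribute names first so any DEF records precede START.
        pairs = [(self._id(k), v) for k, v in attrs.items()]
        self.out.write(struct.pack(">BHB", START, elem, len(pairs)))
        for key, value in pairs:
            self.out.write(struct.pack(">H", key))
            self._put_str(value)

    def endElement(self, name):
        self.out.write(bytes([END]))  # nesting makes the name redundant

    def characters(self, content):
        self.out.write(bytes([TEXT]))
        self._put_str(content)


def encode(xml_text):
    """Parse XML text with SAX and return the binary event stream."""
    handler = EventEncoder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.out.getvalue()


def decode(blob):
    """Replay the binary stream as a list of SAX-like event tuples."""
    buf, names, events = io.BytesIO(blob), [], []

    def get_str():
        (n,) = struct.unpack(">H", buf.read(2))
        return buf.read(n).decode("utf-8")

    while (op := buf.read(1)):
        code = op[0]
        if code == DEF:
            names.append(get_str())
        elif code == START:
            elem, count = struct.unpack(">HB", buf.read(3))
            attrs = {}
            for _ in range(count):
                (key,) = struct.unpack(">H", buf.read(2))
                attrs[names[key]] = get_str()
            events.append(("start", names[elem], attrs))
        elif code == END:
            events.append(("end",))
        else:
            events.append(("text", get_str()))
    return events
```

A decoder like this could just as easily call startElement/endElement on an
application's existing ContentHandler instead of building a list, which is the
"easy integration" argument: the application sees the same events whether they
came from XML text or from the binary stream. A real format would also have to
deal with namespaces, comments, processing instructions, and so on, none of
which this sketch attempts.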
- The really contentious issue is whether one or more of these formats
should be standardized, and who should do the standardization (e.g.
W3C or the wireless industry). Alternatively, they might be best
hidden within implementations of the existing standards, with XML 1.0
the norm when interoperability is needed, but all sorts of things will
be on the wire in more tightly coupled environments.
- Another point of contention is whether a binary XML encoding would
undermine or enhance XML's interoperability and ubiquity. Elliotte,
Uche, and others have vociferously made the "actively damaging to XML"
argument; the other side argues that XML is *not* ubiquitous in the
(rapidly growing) wireless domain and will not be until the efficiency
problem is addressed. The wireless industry wants a W3C standard so
that there is a single wireless Web rather than one fragmented across
vendors and sites that support one or another.