On Sun, Aug 10, 2003 at 03:09:13PM -0400, Elliotte Rusty Harold wrote:
> One of the goals of some of the developers pushing binary XML is to
> speed up parsing, to provide some sort of preparsed format that is
> quicker to parse than real XML. I am extremely skeptical that this
> can be achieved in a platform-independent fashion.
That, in a nutshell, is why we're having a workshop.
The question is not, "Can you get performance benefits or significant
bandwidth savings using a more compact transmission method?"
The question is, "Is there a transmission/storage method that gives
significant benefits but that is also interoperable across
differing platforms, architectures and implementations, and for
a wide cross-section of the entire XML industry?"
Formats that hard-wire integer sizes in terms of octets, for example,
obviously have problems.
> the ideas for writing length codes into the data might help, though I
> doubt they help that much, or are robust in the face of data that
> violates the length codes. Nonetheless this is at least plausible.
In the past I've used a compressed encoding for integers, rather like
UTF-8 -- e.g. set the top bit on each octet if more data follows in
the same number. That way low numbers (<= 127) use a single byte,
and larger numbers use more as needed; sometimes compression can
be applied usefully to the result, too. This does require decoding,
which I suspect some people want to avoid, but I think it'll be
necessary. When I measured performance using this method (not
for XML though) the I/O saving clearly beat out a text-based parse.
It would depend on the data, I suspect -- as indeed you say.
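A minimal sketch of the kind of integer encoding described above, in
Python. The text only says that the top bit of each octet is set when
more data follows; the choice of emitting the low-order 7 bits first,
the restriction to non-negative integers, and the names encode_uint
and decode_uint are assumptions made for illustration, not a
description of any particular format.

```python
def encode_uint(n):
    """Encode a non-negative integer using 7 data bits per octet.

    Assumption: low-order 7-bit group is written first.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # top bit set: more octets follow
        else:
            out.append(low7)         # top bit clear: this is the last octet
            return bytes(out)


def decode_uint(data, pos=0):
    """Decode one integer starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            # The continuation bit promised more octets than are present.
            raise ValueError("truncated integer")
        octet = data[pos]
        pos += 1
        value |= (octet & 0x7F) << shift
        if not octet & 0x80:
            return value, pos
        shift += 7


if __name__ == "__main__":
    for n in (0, 127, 128, 300):
        encoded = encode_uint(n)
        print(n, encoded.hex(), decode_uint(encoded))
```

Values up to 127 occupy one octet and 128 needs two, so no integer
width is hard-wired into the format. The decoder has to check for
truncated input, which is the decoding cost mentioned above.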
> I do not accept as an axiom that binary formats are naturally
> faster to parse than text formats.
Agreed - some are and some aren't.
Again, the primary goal of this W3C Workshop is to explore whether
we (W3C) should be doing standards work in this area, especially given
the fact that a number of organisations are already transmitting
binary representations of XML Information Items, in a way that has
I hope this helps make at least my position clearer. Furthermore,
as Robert Berjon has said, the proceedings will be made public.
We can't force people to release data they used for benchmarks in
their position papers, but we can ask that for any future work,
they use public data, or create public data with properties
similar to their private data.
A benchmark is only useful if it can be reproduced.
Liam Quin, W3C XML Activity Lead, email@example.com, http://www.w3.org/People/Quin/
Ankh's list of IRC clients: http://www.valinor.sorcery.net/clients/