i've had a bit of a read of the sun paper on "Fast" xml and can see some
merit - not in binary xml - i've had my say on that, but in a layered
approach.
it seems to me that most of the binary stuff will ultimately say "we
like the model, but not the readability requirement". ie the tags aren't
necessary, as tags exist mainly for human readability.
so perhaps what we could do is have the layers - like layer 0 is xml as
we know it.
layer 1 is a binary transmission representation that preserves structure
- elements, attributes, entities, perhaps dtd's etc - but not the tags
or other textual markup.
layer 2 is a storage representation. it might also include tag
dictionaries, record pointers etc.
the opening entry could then be something like '<?xml version="1.0"?>',
perhaps extended to indicate which layer the document is encoded in.
then we could build tools to convert between layers and write code that
targets whichever layer suits it.
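to make the layer 1 idea concrete, here's a very rough sketch of a
structure-preserving binary encoding built around a tag dictionary.
everything here - the names, the byte layout, the single-byte counts and
indexes (so it only handles small documents) - is made up for
illustration, not taken from any of the proposals:

```python
import struct
import xml.etree.ElementTree as ET

def encode(elem, tags):
    """Encode one element, replacing its tag and attribute names with
    small integer indexes into the shared dictionary `tags`.
    Uses single-byte ("B") counts/indexes, so this toy version is
    limited to 256 distinct names and 255 attributes/children."""
    if elem.tag not in tags:
        tags[elem.tag] = len(tags)
    out = struct.pack("BB", tags[elem.tag], len(elem.attrib))
    for name, value in sorted(elem.attrib.items()):
        if name not in tags:
            tags[name] = len(tags)
        v = value.encode("utf-8")
        out += struct.pack("BB", tags[name], len(v)) + v
    children = list(elem)
    out += struct.pack("B", len(children))
    for child in children:
        out += encode(child, tags)
    return out

doc = ET.fromstring('<order id="7"><item sku="a1"/><item sku="b2"/></order>')
tags = {}
blob = encode(doc, tags)
# the repeated tag "item" and attribute "sku" land in `tags` exactly once;
# the blob itself carries only indexes, counts and attribute values.
```

the structure (element nesting, attributes, text values) survives, but
none of the angle-bracket syntax does - which is the point of layer 1.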
now the only outstanding issue to me is that xml data doesn't have
domains, as such. ie the data is not intrinsically numeric, ascii,
floating point etc. a program interprets the data as such. adding this
will to some extent break the independence of xml.
when you look at things like asn.1 it has extra features like data
domains and data extents (widths if you like) all of which are foreign
to xml as it stands.
this would then mean that every xml document must have, rather than can
have, some sort of definition such as a dtd to be valid for binary
encoding.
not the end of the world, but it does add to complexity and might create
a "which document description is better" war.
which leaves us with gzip and friends.
ps for the benefit of the sun and asn.1 people. i don't understand why
xml has to be everything and you can't just publish an xml to asn.1
translation standard for those who want to use such a thing? in fact
isn't that exactly what the itu did?
On Wed, 2003-08-20 at 01:31, Liam Quin wrote:
> On Sun, Aug 10, 2003 at 03:09:13PM -0400, Elliotte Rusty Harold wrote:
> > One of the goals of some of the developers pushing binary XML is to
> > speed up parsing, to provide some sort of preparsed format that is
> > quicker to parse than real XML. I am extremely skeptical that this
> > can be achieved in a platform-independent fashion.
> That, in a nutshell, is why we're having a workshop.
> The question is not, "can you get performance benefits or significant
> bandwidth savings using a more compact transmission method".
> The question is, "Is there a transmission/storage method that gives
> significant benefits but that is also interoperable across
> differing platforms, architectures and implementations, and for
> a wide cross-section of the entire XML industry."
> Formats that hard-wire integer sizes in terms of octets, for example,
> obviously have problems.
> > the ideas for writing length codes into the data might help, though I
> > doubt they help that much, or are robust in the face of data that
> > violates the length codes. Nonetheless this is at least plausible.
> In the past I've used a compressed encoding for integers, rather like
> UTF-8 -- e.g. set the top bit on each octet if more data follows in
> the same number. That way low numbers (<= 127) use a single byte,
> and larger numbers use more as needed; sometimes compression can
> be applied usefully to the result, too. This does require decoding,
> which I suspect some people want to avoid, but I think it'll be
> necessary. When I measured performance using this method (not
> for XML though) the I/O saving clearly beat out a text-based parse.
> It would depend on the data, I suspect -- as indeed you say.
> > I do not accept as an axiom that binary formats are naturally
> > faster to parse than text formats.
> Agreed - some are and some aren't.
> Again, the primary goal of this W3C Workshop is to explore whether
> we (W3C) should be doing standards work in this area, especially given
> the fact that a number of organisations are already transmitting
> binary representations of XML Information Items, in a way that has
> zero interoperability.
> I hope this helps make at least my position clearer. Furthermore,
> as Robin Berjon has said, the proceedings will be made public.
> We can't force people to release data they used for benchmarks in
> their position papers, but we can ask that for any future work,
> they use public data, or create public data with properties
> similar to their private data.
> A benchmark is only useful if it can be reproduced.
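ps the utf-8-style integer encoding liam describes above can be sketched
roughly like this. he doesn't say which byte order he used, so this is
one common variant (low seven bits first, leb128-style), not necessarily
his exact scheme:

```python
def encode_uint(n):
    """Encode a non-negative integer with seven payload bits per octet;
    the top bit is set on every octet except the last, i.e. "more data
    follows in the same number"."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit set
        else:
            out.append(byte)          # final octet, top bit clear
            return bytes(out)

def decode_uint(data):
    """Inverse of encode_uint; returns (value, octets consumed)."""
    value = shift = i = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:           # top bit clear: last octet
            break
    return value, i + 1
```

as liam says, values <= 127 fit in one octet and larger values grow as
needed, at the cost of a decode step on the receiving side.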