xml-dev - Re: [xml-dev] Binary XML == "spawn of the devil" ?

To: Liam Quin <liam@w3.org>
Subject: Re: [xml-dev] Binary XML == "spawn of the devil" ?
From: Rick Marshall <rjm@zenucom.com>
Date: 20 Aug 2003 08:48:00 +1000
Cc: Elliotte Rusty Harold <elharo@metalab.unc.edu>, xml-dev@lists.xml.org
In-reply-to: <20030819153128.GB5730@w3.org>
Organization: Zenucom Pty Limited
References: <oprstl350fezizxn@localhost> <p04330101bb5c449e6faa@[192.168.254.4]> <20030819153128.GB5730@w3.org>
Reply-to: rjm@zenucom.com

i've had a bit of a read of the sun paper on "Fast" xml and can see some
merit - not in binary xml as such (i've had my say on that), but in a
layered approach

it seems to me that most of the binary stuff will ultimately say "we
like the model, but not the readability requirement". ie the tags aren't
necessary, as tags

so perhaps what we could do is have the layers - like layer 0 is xml
1.0.

layer 1 is a binary transmission representation that preserves structure
- elements, attributes, entities, perhaps dtd's etc, but not the tags or
utf-8

layer 2 is a storage representation. it might also include tag
dictionaries, record pointers etc

the opening entry could then be something like '<?xml version="1.0"
layer="1"?>'

then we could build tools to convert between layers and write code that
assumes layers.
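
A rough sketch of what a layer 0 to layer 1 converter could look like, in Python. Everything here is an assumption made for illustration: the names `to_layer1` and `to_layer0` are hypothetical, the "layer 1" form is just nested tuples with a name dictionary rather than a real wire format, and none of it is taken from the sun paper. Note also that `layer` is not a pseudo-attribute the XML 1.0 declaration defines, so the sketch does not try to parse one.

```python
import xml.etree.ElementTree as ET


def to_layer1(xml_text):
    """Turn XML 1.0 text (layer 0) into a tag-free structure (hypothetical layer 1).

    Returns (names, tree): names is a dictionary of element/attribute names,
    tree is nested tuples that refer to names by index only.
    """
    names = []
    index = {}

    def intern(name):
        # Assign each distinct name a small integer the first time it is seen.
        if name not in index:
            index[name] = len(names)
            names.append(name)
        return index[name]

    def walk(elem):
        tag_id = intern(elem.tag)
        attrs = [(intern(k), v) for k, v in elem.attrib.items()]
        children = [walk(child) for child in elem]
        return (tag_id, attrs, elem.text or "", children, elem.tail or "")

    return names, walk(ET.fromstring(xml_text))


def to_layer0(names, tree):
    """Rebuild XML 1.0 text from the (names, tree) pair produced by to_layer1."""

    def build(node):
        tag_id, attrs, text, children, tail = node
        elem = ET.Element(names[tag_id], {names[k]: v for k, v in attrs})
        elem.text = text or None
        elem.tail = tail or None
        for child in children:
            elem.append(build(child))
        return elem

    return ET.tostring(build(tree), encoding="unicode")


source = '<order id="7"><item>pen</item><item>ink</item></order>'
names, tree = to_layer1(source)
restored = to_layer0(names, tree)
```

The structure (elements, attributes, text) survives the round trip while the tag strings are stored once in `names`. Entities, comments, processing instructions and dtd's are ignored here, which a real layer definition would have to deal with.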

now the only outstanding issue to me is that xml data doesn't have
domains, as such. ie the data is not intrinsically numeric, ascii,
floating point etc. a program interprets the data as such. adding this
will to some extent break the independence of xml.

when you look at things like asn.1 it has extra features like data
domains and data extents (widths if you like) all of which are foreign
to xml as it stands.
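
To make the "no domains" point concrete, here is a small Python sketch. The element names and the sample value are made up for illustration. The parser only ever hands back text; it is the consuming program that decides whether that text is a string, an integer or a float.

```python
import xml.etree.ElementTree as ET

# A made-up document; nothing in it says what kind of value "0042" is.
doc = ET.fromstring("<reading><value>0042</value></reading>")

raw = doc.findtext("value")  # the parser always returns a str

# Three programs could legitimately read the same characters three ways.
as_text = raw            # keeps the leading zeros
as_int = int(raw)        # treats it as an integer
as_float = float(raw)    # treats it as a floating point number
```

A binary layer would have to pick one of these readings (and a width) up front, which is the information a dtd or other document description would need to supply.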

this would then mean that every xml document must have, rather than can
have, some sort of definition such as a dtd to be valid for binary
operation.

not the end of the world, but it does add to complexity and might create
a "which document description is better" war.

which leaves us with gzip and friends.
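
For comparison, the gzip route needs no new format at all. A minimal Python sketch, assuming utf-8 text and using a made-up, deliberately repetitive sample document (the helper names `gzip_xml` and `gunzip_xml` are hypothetical):

```python
import gzip


def gzip_xml(xml_text):
    """Compress XML text for transmission or storage."""
    return gzip.compress(xml_text.encode("utf-8"))


def gunzip_xml(blob):
    """Recover the original XML text from a gzip blob."""
    return gzip.decompress(blob).decode("utf-8")


# Made-up sample with many repeated tags, the case where gzip does well.
sample = "<orders>" + "<order><item>pen</item><qty>1</qty></order>" * 200 + "</orders>"
packed = gzip_xml(sample)
unpacked = gunzip_xml(packed)
```

The receiver still runs an ordinary xml 1.0 parser on the result, so this saves bandwidth but not parse time, and how much it saves depends on the data.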

cheers

rick

ps for the benefit of the sun and asn.1 people. i don't understand why
xml has to be everything and you can't just publish an xml to asn.1
translation standard for those who want to use such a thing? in fact
isn't that exactly what the itu did?

 On Wed, 2003-08-20 at 01:31, Liam Quin wrote:
> On Sun, Aug 10, 2003 at 03:09:13PM -0400, Elliotte Rusty Harold wrote:
> > One of the goals of some of the developers pushing binary XML is to 
> > speed up parsing, to provide some sort of preparsed format that is 
> > quicker to parse than real XML. I am extremely skeptical that this 
> > can be achieved in a platform-independent fashion.
> 
> That, in a nutshell, is why we're having a workshop.
> 
> The question is not, "can you get performance benefits or significant
> bandwidth savings using a more compact transmission method".
> 
> The question is, "Is there a transmission/storage method that gives
> significant benefits but that is also interoperable across
> differing platforms, architectures and implementations, and for
> a wide cross-section of the entire XML industry."
> 
> Formats that hard-wire integer sizes in terms of octets, for example,
> obviously have problems.
> 
> > the ideas for writing length codes into the data might help, though I 
> > doubt they help that much, or are robust in the face of data that 
> > violates the length codes.  Nonetheless this is at least plausible.
> 
> In the past I've used a compressed encoding for integers, rather like
> UTF-8 -- e.g. set the top bit on each octet if more data follows in
> the same number.  That way low numbers (<= 127) use a single byte,
> and larger numbers use more as needed; sometimes compression can
> be applied usefully to the result, too.  This does require decoding,
> which I suspect some people want to avoid, but I think it'll be
> necessary.  When I measured performance using this method (not
> for XML though) the I/O saving clearly beat out a text-based parse.
> It would depend on the data, I suspect -- as indeed you say.
> 
> > I do not accept as an axiom that binary formats are naturally
> > faster to parse than text formats.
> Agreed - some are and some aren't.
> 
> Again, the primary goal of this W3C Workshop is to explore whether
> we (W3C) should be doing standards work in this area, especially given
> the fact that a number of organisations are already transmitting
> binary representations of XML Information Items, in a way that has
> zero interoperability.
> 
> I hope this helps make at least my position clearer.  Furthermore,
> as Robert Berjon has said, the proceedings will be made public.
> We can't force people to release data they used for benchmarks in
> their position papers, but we can ask that for any future work,
> they use public data, or create public data with properties
> similar to their private data.
> 
> A benchmark is only useful if it can be reproduced.
> 
> Best,
> 
> Liam
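
The integer encoding Liam describes in the quoted text (top bit of each octet set when more of the same number follows) can be sketched in Python as below. The function names are hypothetical, and the octet order (least significant 7 bits first) is an assumption, since the message does not say which order was used. Only non-negative integers are handled.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 bits per octet, low bits first.

    The top bit of an octet is set when more octets of the same number follow.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more octets follow
        else:
            out.append(low7)         # last octet, top bit clear
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one integer starting at data[pos].

    Returns (value, next_pos) so several numbers can be read back to back.
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated integer")
        octet = data[pos]
        pos += 1
        value |= (octet & 0x7F) << shift
        if not octet & 0x80:
            return value, pos
        shift += 7


small = encode_varint(127)   # fits in one octet
larger = encode_varint(300)  # needs two octets
value, next_pos = decode_varint(larger)
```

Values up to 127 take a single octet and nothing in the format hard-wires an integer size, at the cost of a decode step for every number read.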
