> -----Original Message-----
> From: Rick Marshall [mailto:email@example.com]
> Sent: Tuesday, August 19, 2003 18:48
> To: Liam Quin
> Cc: Elliotte Rusty Harold; firstname.lastname@example.org
> Subject: Re: [xml-dev] Binary XML == "spawn of the devil" ?
> i've had a bit of a read of the sun paper on "Fast" xml and
> can see some merit - not in binary xml - i've had some say on
> that, but on a layered approach
> it seems to me that most of the binary stuff will ultimately
> say "we like the model, but not the readability requirement".
> ie the tags aren't necessary, as tags
> so perhaps what we could do is have the layers - like layer 0
> is xml 1.0.
> layer 1 is a binary transmission representation that
> preserves structure
> - elements, attributes, entities, perhaps dtd's etc, but not
> the tags or utf-8
> layer 2 is a storage representation. it might also include
> tag dictionaries, record pointers etc
> the opening entry could then be something like '<?xml
> version="1.0" layer="1"?>'
> then we could build tools to convert between layers and write
> code that assumes layers.
> now the only outstanding issue to me is that xml data doesn't
> have domains, as such. ie the data is not intrinsically
> numeric, ascii, floating point etc. a program interprets the
> data as such. adding this will to some extent break the
> independence of xml.
> when you look at things like asn.1 it has extra features like
> data domains and data extents (widths if you like) all of
> which are foreign to xml as it stands.
> this would then mean that every xml document must have,
> rather than can have, some sort of definition such as a dtd
> to be valid for binary operation.
> not the end of the world, but does add to complexity and
> might create "which document description is better" war.
> which leaves us with gzip and friends.
> ps for the benefit of the sun and asn.1 people. i don't
> understand why xml has to be everything and you can't just
> publish an xml to asn.1 translation standard for those who
> want to use such a thing? in fact isn't that exactly what the itu did?
Yes. The XML-Schema-to-ASN.1 translation standard you mention is X.694 (aka
ISO/IEC 8825-5), currently at the end of its FCD ballot period in ISO.
By applying X.694 to a schema, you get one or more ASN.1 modules that can
produce binary encodings. These binary encodings are usually very fast to
process and compact.
The translation from XML Schema to ASN.1 specified in X.694 is canonical
with respect to all standard ASN.1 encoding rules. This means that two
different implementations of X.694 will produce exactly the same binary
encodings, thus enabling interworking without a need to distribute the ASN.1
generated by the translation. In other words, the translation can be
performed independently at each node, and the nodes will interoperate in all
cases.
The end result is that binary data equivalent to a well-formed *and* valid
instance (according to a source schema) can be exchanged, provided the XML
Schema is available on both sides. This is one possible solution to the
"binary infoset" problem.
> On Wed, 2003-08-20 at 01:31, Liam Quin wrote:
> > On Sun, Aug 10, 2003 at 03:09:13PM -0400, Elliotte Rusty Harold wrote:
> > > One of the goals of some of the developers pushing binary XML is to
> > > speed up parsing, to provide some sort of preparsed format that is
> > > quicker to parse than real XML. I am extremely skeptical that this
> > > can be achieved in a platform-independent fashion.
> >
> > That, in a nutshell, is why we're having a workshop.
> >
> > The question is not, "can you get performance benefits or
> > bandwidth savings using a more compact transmission method".
> > The question is, "Is there a transmission/storage method that gives
> > significant benefits but that is also interoperable across
> > platforms, architectures and implementations, and for a wide
> > cross-section of the entire XML industry."
> >
> > Formats that hard-wire integer sizes in terms of octets, for example,
> > obviously have problems.
> >
> > > The ideas for writing length codes into the data might help, though I
> > > doubt they help that much, or are robust in the face of data that
> > > violates the length codes. Nonetheless this is at least
> >
> > In the past I've used a compressed encoding for integers, rather like
> > UTF-8 -- e.g. set the top bit on each octet if more data follows in
> > the same number. That way low numbers (<= 127) use a single byte, and
> > larger numbers use more as needed; sometimes compression can be
> > applied usefully to the result, too. This does require decoding,
> > which I suspect some people want to avoid, but I think it'll be
> > necessary. When I measured performance using this method (not for XML
> > though) the I/O saving clearly beat out a text-based parse. It would
> > depend on the data, I suspect -- as indeed you say.
> >
> > > I do not accept as an axiom that binary formats are naturally faster
> > > to parse than text formats.
> >
> > Agreed - some are and some aren't.
> >
> > Again, the primary goal of this W3C Workshop is to explore whether we
> > (W3C) should be doing standards work in this area, especially given the
> > fact that a number of organisations are already transmitting binary
> > representations of XML Information Items, in a way that has zero
> > interoperability.
> >
> > I hope this helps make at least my position clearer. Furthermore, as
> > Robert Berjon has said, the proceedings will be made public. We can't
> > force people to release data they used for benchmarks in their
> > position papers, but we can ask that for any future work, they use
> > public data, or create public data with properties similar to their
> > private data.
> >
> > A benchmark is only useful if it can be reproduced.
> >
> > Best,
> >
> > Liam
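For what it's worth, the continuation-bit integer encoding Liam describes above is essentially what is now commonly called a varint (as in LEB128 or Protocol Buffers). A minimal sketch in Python, assuming unsigned integers and little-endian 7-bit groups (Liam's description leaves the group order open; this is one choice):

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per octet,
    top bit set on every octet except the last."""
    if n < 0:
        raise ValueError("unsigned integers only")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more octets follow
        else:
            out.append(group)         # final octet: top bit clear
            return bytes(out)

def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, number of octets consumed)."""
    value, shift = 0, 0
    for i, octet in enumerate(data):
        value |= (octet & 0x7F) << shift
        shift += 7
        if not (octet & 0x80):
            return value, i + 1
    raise ValueError("truncated varint")

# Numbers <= 127 fit in one octet; larger ones grow as needed.
assert encode_varint(127) == b"\x7f"
assert encode_varint(300) == b"\xac\x02"
assert decode_varint(encode_varint(300)) == (300, 2)
```

As Liam notes, the decode step costs something, but the per-value I/O saving over decimal text plus delimiters is often what dominates.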
> The xml-dev list is sponsored by XML.org
> <http://www.xml.org>, an initiative of OASIS
The list archives are at http://lists.xml.org/archives/xml-dev/
To subscribe or unsubscribe from this list use the subscription