xml-dev - Re: [xml-dev] Binary XML == "spawn of the devil" ?


To: Alessandro Triglia <sandro@mclink.it>
Subject: Re: [xml-dev] Binary XML == "spawn of the devil" ?
From: "Stephen D. Williams" <sdw@lig.net>
Date: Fri, 22 Aug 2003 12:21:33 -0400
Cc: rjm@zenucom.com, xml-dev@lists.xml.org
In-reply-to: <000e01c366ac$f54bf5f0$7888a1ac@aldebaran>
References: <000e01c366ac$f54bf5f0$7888a1ac@aldebaran>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)

I have a constraint that bars this solution: data must be self-describing and must not require a specific predefined schema, because of the versioning/schema migration issues that such a requirement entails. There are many circumstances where requiring a pre-arranged schema and derived codec engine is not useful.

I favor a template-delta approach, somewhat similar in concept to SLIP/PPP "header compression".

sdw

Alessandro Triglia wrote:

-----Original Message-----
From: Rick Marshall [mailto:rjm@zenucom.com] 
Sent: Tuesday, August 19, 2003 18:48
To: Liam Quin
Cc: Elliotte Rusty Harold; xml-dev@lists.xml.org
Subject: Re: [xml-dev] Binary XML == "spawn of the devil" ?

i've had a bit of a read of the sun paper on "Fast" xml and 
can see some merit - not in binary xml - i've had some say on 
that, but on a layered approach

it seems to me that most of the binary stuff will ultimately 
say "we like the model, but not the readability requirement". 
ie the tags aren't necessary, as tags

so perhaps what we could do is have the layers - like layer 0 
is xml 1.0.

layer 1 is a binary transmission representation that preserves
structure - elements, attributes, entities, perhaps dtd's etc, but
not the tags or utf-8

layer 2 is a storage representation. it might also include 
tag dictionaries, record pointers etc

the opening entry could then be something like
'<?xml version="1.0" layer="1"?>'

then we could build tools to convert between layers and write 
code that assumes layers.
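
To make the layer idea concrete, here is a rough Python sketch of how a
reader might look at the opening entry and pick a layer. Everything in it
is an assumption for illustration: the name sniff_layer is made up, the
"layer" pseudo-attribute is only the proposal above (XML 1.0 allows just
version, encoding and standalone in the declaration, so a conforming
parser would reject it), and the sketch assumes the opening entry stays
plain ASCII even when the rest of the stream is binary.

```python
import re

# Hypothetical: the opening entry is ASCII text even for binary layers.
_XML_DECL = re.compile(rb'^<\?xml\s+([^?]*)\?>')
_PSEUDO_ATTR = re.compile(rb'(\w+)\s*=\s*(?:"([^"]*)"|\'([^\']*)\')')


def sniff_layer(head: bytes) -> int:
    """Return the proposed layer number, defaulting to 0 (plain XML 1.0)."""
    m = _XML_DECL.match(head)
    if m is None:
        return 0  # no declaration at all: treat as ordinary XML
    attrs = {}
    for name, dq, sq in _PSEUDO_ATTR.findall(m.group(1)):
        # Exactly one of dq/sq is non-empty, depending on the quote style.
        attrs[name.decode("ascii")] = (dq or sq).decode("ascii")
    return int(attrs.get("layer", "0"))


print(sniff_layer(b'<?xml version="1.0" layer="1"?>'))
print(sniff_layer(b'<?xml version="1.0"?><doc/>'))
```

A tool that converts between layers could call something like this on the
first few bytes and hand the rest of the stream to the matching decoder.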

now the only outstanding issue to me is that xml data doesn't 
have domains, as such. ie the data is not intrinsically 
numeric, ascii, floating point etc. a program interprets the 
data as such. adding this will to some extent break the 
independence of xml.

when you look at things like asn.1 it has extra features like 
data domains and data extents (widths if you like) all of 
which are foreign to xml as it stands.

this would then mean that every xml document must have, 
rather than can have, some sort of definition such as a dtd 
to be valid for binary operation.

not the end of the world, but does add to complexity and 
might create "which document description is better" war.

which leaves us with gzip and friends.
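
For comparison, the gzip route needs no new format at all. A minimal
Python sketch, using only the standard library; the helper names pack and
unpack and the sample document are made up for illustration, and the
actual saving depends entirely on the data.

```python
import gzip


def pack(xml_text: str) -> bytes:
    """Compress an XML document for transmission or storage."""
    return gzip.compress(xml_text.encode("utf-8"))


def unpack(blob: bytes) -> str:
    """Recover the original XML text, tags and all."""
    return gzip.decompress(blob).decode("utf-8")


# Hypothetical, highly repetitive sample; real documents will vary.
sample = "<orders>" + "".join(
    f'<order id="{i}"><qty>{i % 7}</qty></order>' for i in range(1000)
) + "</orders>"

packed = pack(sample)
print(len(sample.encode("utf-8")), "bytes as text,", len(packed), "bytes gzipped")
```

The receiver still has to run a full text parse after decompressing, so
this helps with bandwidth but not with parse time.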

cheers

rick

ps for the benefit of the sun and asn.1 people. i don't
understand why xml has to be everything and you can't just
publish an xml to asn.1 translation standard for those who
want to use such a thing? in fact isn't that exactly what the
itu did?



Yes.  The XML-Schema-to-ASN.1 translation standard you mention is X.694 (aka
ISO/IEC 8825-5), currently at the end of its FCD ballot period in ISO.  

By applying X.694 to a schema, you get one or more ASN.1 modules that can
produce binary encodings.  These binary encodings are usually very fast and
compact.

The translation from XML Schema to ASN.1 specified in X.694 is canonical
with respect to all standard ASN.1 encoding rules.  This means that two
different implementations of X.694 will produce exactly the same binary
encodings, thus enabling interworking without a need to distribute the ASN.1
generated by the translation.  In other words, the translation can be
independently performed at each node and the nodes will interoperate in all
cases.

The end result is that binary data equivalent to a well-formed *and* valid
instance (according to a source schema) can be exchanged, provided the XML
Schema is available on both sides.  This is one possible solution to the
"binary infoset" problem.

Alessandro Triglia
OSS Nokalva

On Wed, 2003-08-20 at 01:31, Liam Quin wrote:

On Sun, Aug 10, 2003 at 03:09:13PM -0400, Elliotte Rusty Harold wrote:

One of the goals of some of the developers pushing binary XML is to
speed up parsing, to provide some sort of preparsed format that is
quicker to parse than real XML. I am extremely skeptical that this
can be achieved in a platform-independent fashion.

That, in a nutshell, is why we're having a workshop.

The question is not, "can you get performance benefits or significant
bandwidth savings using a more compact transmission method".

The question is, "Is there a transmission/storage method that gives
significant benefits but that is also interoperable across differing
platforms, architectures and implementations, and for a wide
cross-section of the entire XML industry."

Formats that hard-wire integer sizes in terms of octets, for example,
obviously have problems.

The ideas for writing length codes into the data might help, though I
doubt they help that much, or are robust in the face of data that
violates the length codes.  Nonetheless this is at least plausible.

In the past I've used a compressed encoding for integers, rather like
UTF-8 -- e.g. set the top bit on each octet if more data follows in
the same number.  That way low numbers (<= 127) use a single byte, and
larger numbers use more as needed; sometimes compression can be
applied usefully to the result, too.  This does require decoding,
which I suspect some people want to avoid, but I think it'll be
necessary.  When I measured performance using this method (not for XML
though) the I/O saving clearly beat out a text-based parse.  It would
depend on the data, I suspect -- as indeed you say.
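
The integer encoding described above can be sketched in a few lines of
Python. This is only an illustration, not the code that was measured: the
function names are made up, and the choice to emit the least significant
7 bits first (as LEB128-style varints do) is an assumption, since the
description only says that the top bit of each octet marks "more data
follows".

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per octet."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # top bit set: more octets follow
        else:
            out.append(low7)         # top bit clear: last octet
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position); raise ValueError if input is truncated."""
    value = 0
    shift = 0
    while pos < len(data):
        octet = data[pos]
        pos += 1
        value |= (octet & 0x7F) << shift
        if not octet & 0x80:
            return value, pos
        shift += 7
    raise ValueError("truncated varint")


print(encode_varint(127).hex(), encode_varint(128).hex())
print(decode_varint(encode_varint(300)))
```

Values up to 127 take one octet and anything larger grows by one octet per
additional 7 bits, so there is no hard-wired integer width, at the cost of
a small decode loop.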

I do not accept as an axiom that binary formats are naturally faster
to parse than text formats.

Agreed - some are and some aren't.

Again, the primary goal of this W3C Workshop is to explore whether we
(W3C) should be doing standards work in this area, especially given the
fact that a number of organisations are already transmitting binary
representations of XML Information Items, in a way that has zero
interoperability.

I hope this helps make at least my position clearer.  Furthermore, as
Robin Berjon has said, the proceedings will be made public. We can't
force people to release data they used for benchmarks in their
position papers, but we can ask that for any future work, they use
public data, or create public data with properties similar to their
private data.

A benchmark is only useful if it can be reproduced.

Best,

Liam

-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>




-- 
swilliams@hpti.com http://www.hpti.com  Personal: sdw@lig.net http://sdw.st
Stephen D. Williams 43392 Wayside Cir,Ashburn,VA 20147-4622
703-724-0118W 703-995-0407Fax Oct2002

Follow-Ups:
- Re: [xml-dev] Binary XML == "spawn of the devil" ?
  - From: Robin Berjon <robin.berjon@expway.fr>

References:
- RE: [xml-dev] Binary XML == "spawn of the devil" ?
  - From: "Alessandro Triglia" <sandro@mclink.it>
