   Re: [xml-dev] Binary XML == "spawn of the devil" ?


I have a constraint that bars this solution: the data must be self-describing
and must not require a specific pre-defined schema, because of the versioning
and schema-migration issues that such a requirement creates.  There are many
circumstances where requiring a pre-arranged schema and a derived codec engine
is not useful.

I favor a template-delta approach, somewhat similar in concept to SLIP/PPP
"header compression".
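
As a rough illustration of what I mean (a hypothetical Python sketch, not
an actual protocol -- the field names and framing are invented): send one
full, self-describing template up front, then send only the fields that
changed.

    # Hypothetical sketch of a template-delta scheme, in the spirit of
    # SLIP/PPP header compression: the first message carries the full,
    # self-describing document; later messages carry only changed fields.

    def make_delta(template: dict, message: dict) -> dict:
        """Return only the fields of `message` that differ from `template`."""
        return {k: v for k, v in message.items() if template.get(k) != v}

    def apply_delta(template: dict, delta: dict) -> dict:
        """Rebuild the full message from the template plus a delta."""
        merged = dict(template)
        merged.update(delta)
        return merged

    template = {"version": "1.0", "sender": "nodeA", "seq": 0, "body": ""}
    msg      = {"version": "1.0", "sender": "nodeA", "seq": 1, "body": "hi"}
    delta    = make_delta(template, msg)        # {'seq': 1, 'body': 'hi'}
    assert apply_delta(template, delta) == msg  # round-trips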

sdw

Alessandro Triglia wrote:
  
-----Original Message-----
From: Rick Marshall [mailto:rjm@zenucom.com] 
Sent: Tuesday, August 19, 2003 18:48
To: Liam Quin
Cc: Elliotte Rusty Harold; xml-dev@lists.xml.org
Subject: Re: [xml-dev] Binary XML == "spawn of the devil" ?


i've had a bit of a read of the sun paper on "Fast" xml and
can see some merit - not in binary xml (i've had my say on
that), but in a layered approach

it seems to me that most of the binary stuff will ultimately
say "we like the model, but not the readability requirement" -
ie the tags, as textual tags, aren't necessary

so perhaps what we could do is have the layers - like layer 0 
is xml 1.0.

layer 1 is a binary transmission representation that 
preserves structure
- elements, attributes, entities, perhaps dtd's etc, but not 
the tags or utf-8

layer 2 is a storage representation. it might also include 
tag dictionaries, record pointers etc

the opening entry could then be something like '<?xml 
version="1.0" layer="1"?>'

then we could build tools to convert between layers and write 
code that assumes layers.
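
purely as a sketch of what "code that assumes layers" might look like
(python here; the layer pseudo-attribute and the decoders are invented,
with layers 1 and 2 left as stubs):

    # hypothetical sketch: read the prolog, dispatch on an invented
    # "layer" pseudo-attribute. layer 0 is assumed to be plain xml 1.0;
    # layers 1 and 2 are stubs for binary transmission/storage forms.
    import re

    def detect_layer(prolog: bytes) -> int:
        m = re.search(rb'layer="(\d+)"', prolog)
        return int(m.group(1)) if m else 0  # no marker: plain xml 1.0

    def decode(document: bytes) -> str:
        layer = detect_layer(document[:100])
        if layer == 0:
            return document.decode("utf-8")
        if layer == 1:
            raise NotImplementedError("binary transmission representation")
        if layer == 2:
            raise NotImplementedError("storage representation")
        raise ValueError("unknown layer %d" % layer)

    print(decode(b'<?xml version="1.0" layer="0"?><doc/>'))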

now the only outstanding issue to me is that xml data doesn't 
have domains, as such. ie the data is not intrinsically 
numeric, ascii, floating point etc. a program interprets the 
data as such. adding this will to some extent break the 
independence of xml.

when you look at things like asn.1 it has extra features like 
data domains and data extents (widths if you like) all of 
which are foreign to xml as it stands.

this would then mean that every xml document must have, 
rather than can have, some sort of definition such as a dtd 
to be valid for binary operation.

not the end of the world, but it does add complexity and
might create a "which document description is better" war.

which leaves us with gzip and friends.

cheers

rick

ps for the benefit of the sun and asn.1 people: i don't
understand why xml has to be everything and why you can't just
publish an xml to asn.1 translation standard for those who
want to use such a thing? in fact isn't that exactly what the itu did?
    


Yes.  The XML-Schema-to-ASN.1 translation standard you mention is X.694 (aka
ISO/IEC 8825-5), currently at the end of its FCD ballot period in ISO.  

By applying X.694 to a schema, you get one or more ASN.1 modules from which
binary encodings can be produced.  These binary encodings are usually very
compact and fast to process.

The translation from XML Schema to ASN.1 specified in X.694 is canonical
with respect to all standard ASN.1 encoding rules.  This means that two
different implementations of X.694 will produce exactly the same binary
encodings, thus enabling interworking without a need to distribute the ASN.1
generated by the translation.  In other words, the translation can be
independently performed at each node and the nodes will interoperate in all
cases.
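
As a toy illustration of what canonicality buys -- this is not X.694
itself, just a sketch of the property -- if the encoding is derived
deterministically from the schema alone, independently written encoders
agree byte for byte:

    # Toy illustration of canonicality (not X.694 itself): the encoding
    # is fully determined by the schema, so independent implementations
    # produce identical bytes without exchanging generated artifacts.
    import struct

    SCHEMA = [("id", "u32"), ("name", "str")]   # shared by both sides

    def encode(record: dict) -> bytes:
        out = bytearray()
        for field, ftype in SCHEMA:             # order fixed by schema
            if ftype == "u32":
                out += struct.pack(">I", record[field])
            elif ftype == "str":
                data = record[field].encode("utf-8")
                out += struct.pack(">I", len(data)) + data
        return bytes(out)

    # Field order comes from the schema, not from the input, so any two
    # conforming encoders emit the same bytes for the same instance:
    assert encode({"id": 7, "name": "x"}) == encode({"name": "x", "id": 7})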

The end result is that binary data equivalent to a well-formed *and* valid
instance (according to a source schema) can be exchanged, provided the XML
Schema is available on both sides.  This is one possible solution to the
"binary infoset" problem.

Alessandro Triglia
OSS Nokalva


  
On Wed, 2003-08-20 at 01:31, Liam Quin wrote:

> On Sun, Aug 10, 2003 at 03:09:13PM -0400, Elliotte Rusty Harold wrote:
>
> > One of the goals of some of the developers pushing binary XML is to
> > speed up parsing, to provide some sort of preparsed format that is
> > quicker to parse than real XML. I am extremely skeptical that this
> > can be achieved in a platform-independent fashion.
>
> That, in a nutshell, is why we're having a workshop.
>
> The question is not, "can you get performance benefits or significant
> bandwidth savings using a more compact transmission method".
>
> The question is, "Is there a transmission/storage method that gives
> significant benefits but that is also interoperable across differing
> platforms, architectures and implementations, and for a wide
> cross-section of the entire XML industry?"
>
> Formats that hard-wire integer sizes in terms of octets, for example,
> obviously have problems.
      
> > The ideas for writing length codes into the data might help, though
> > I doubt they help that much, or are robust in the face of data that
> > violates the length codes.  Nonetheless this is at least plausible.
>
> In the past I've used a compressed encoding for integers, rather like
> UTF-8 -- e.g. set the top bit on each octet if more data follows in
> the same number.  That way low numbers (<= 127) use a single byte,
> and larger numbers use more as needed; sometimes compression can be
> applied usefully to the result, too.  This does require decoding,
> which I suspect some people want to avoid, but I think it'll be
> necessary.  When I measured performance using this method (not for
> XML, though) the I/O saving clearly beat out a text-based parse.  It
> would depend on the data, I suspect -- as indeed you say.
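
For concreteness, the continuation-bit scheme described above might look
like this in Python (a sketch only; the byte order and other details are
assumptions, not necessarily what Liam implemented):

    # Sketch of the encoding described above: the top bit of each octet
    # is set while more octets follow; values <= 127 fit in one byte.
    # (Little-endian base-128 here; the byte order is an assumption.)

    def encode_uint(n: int) -> bytes:
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)   # more octets follow
            else:
                out.append(byte)          # final octet, top bit clear
                return bytes(out)

    def decode_uint(data: bytes) -> int:
        n = shift = 0
        for byte in data:
            n |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        return n

    assert encode_uint(127) == b"\x7f"             # single byte
    assert decode_uint(encode_uint(300)) == 300    # round-trips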

      
> > I do not accept as an axiom that binary formats are naturally
> > faster to parse than text formats.
>
> Agreed - some are and some aren't.
>
> Again, the primary goal of this W3C Workshop is to explore whether we
> (W3C) should be doing standards work in this area, especially given
> the fact that a number of organisations are already transmitting
> binary representations of XML Information Items, in a way that has
> zero interoperability.
>
> I hope this helps make at least my position clearer.  Furthermore, as
> Robert Berjon has said, the proceedings will be made public.  We can't
> force people to release data they used for benchmarks in their
> position papers, but we can ask that for any future work, they use
> public data, or create public data with properties similar to their
> private data.
>
> A benchmark is only useful if it can be reproduced.
>
> Best,
>
> Liam
      

-- 
swilliams@hpti.com http://www.hpti.com  Personal: sdw@lig.net http://sdw.st
Stephen D. Williams  43392 Wayside Cir, Ashburn, VA 20147-4622
703-724-0118 (W)  703-995-0407 (Fax)  Oct 2002




 
