I have a constraint that rules out this solution: data must be
self-describing and must not require a specific pre-defined schema,
because of the versioning and schema-migration issues that a fixed
schema creates. There are many circumstances where requiring a
pre-arranged schema and a derived codec engine is not useful.
I favor a template-delta approach, somewhat similar in concept to
SLIP/PPP "header compression".
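One possible reading of "template-delta", as a minimal sketch only: both
ends keep the previous record as a template, and only the fields that
differ from it are sent. The function names, the "set"/"del" keys and the
sample field names are hypothetical and are not taken from any existing
format.

```python
def make_delta(template: dict, record: dict) -> dict:
    """Describe `record` relative to `template`.

    "set" holds fields that are new or whose value changed;
    "del" lists fields present in the template but absent from the record.
    """
    changed = {
        key: value
        for key, value in record.items()
        if key not in template or template[key] != value
    }
    removed = [key for key in template if key not in record]
    return {"set": changed, "del": removed}


def apply_delta(template: dict, delta: dict) -> dict:
    """Rebuild the full record from the shared template and a delta."""
    record = {k: v for k, v in template.items() if k not in delta["del"]}
    record.update(delta["set"])
    return record


# The first record is sent in full and becomes the template on both sides.
first = {"type": "quote", "symbol": "XYZ", "price": "10.00"}
second = {"type": "quote", "symbol": "XYZ", "price": "10.05"}

delta = make_delta(first, second)
print(delta)
print(apply_delta(first, delta) == second)
```

Because the delta carries field names rather than positions, the data
stays self-describing and no schema has to be agreed in advance; the cost
is that both ends must hold the same template.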
sdw
Alessandro Triglia wrote:
-----Original Message-----
From: Rick Marshall [mailto:rjm@zenucom.com]
Sent: Tuesday, August 19, 2003 18:48
To: Liam Quin
Cc: Elliotte Rusty Harold; xml-dev@lists.xml.org
Subject: Re: [xml-dev] Binary XML == "spawn of the devil" ?
i've had a bit of a read of the sun paper on "Fast" xml and
can see some merit - not in binary xml - i've had some say on
that, but on a layered approach
it seems to me that most of the binary stuff will ultimately
say "we like the model, but not the readability requirement".
ie the tags aren't necessary, as tags
so perhaps what we could do is have the layers - like layer 0
is xml 1.0.
layer 1 is a binary transmission representation that preserves
structure - elements, attributes, entities, perhaps dtd's etc, but
not the tags or utf-8
layer 2 is a storage representation. it might also include
tag dictionaries, record pointers etc
the opening entry could then be something like '<?xml
version="1.0" layer="1"?>'
then we could build tools to convert between layers and write
code that assumes layers.
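A rough sketch of what a "layer 1" style conversion could look like,
assuming the idea is to keep the element/attribute structure while
replacing repeated names with small indices into a tag dictionary. The
event tuples and the function name are hypothetical; no byte-level
format is implied.

```python
import xml.etree.ElementTree as ET


def tokenize(xml_text: str):
    """Return (names, events): a tag dictionary plus structure events.

    Element and attribute names are replaced by their index in `names`,
    so each distinct name appears only once.
    """
    names = []   # index -> name
    index = {}   # name -> index
    events = []

    def name_id(name: str) -> int:
        if name not in index:
            index[name] = len(names)
            names.append(name)
        return index[name]

    def walk(elem) -> None:
        attrs = {name_id(k): v for k, v in elem.attrib.items()}
        events.append(("start", name_id(elem.tag), attrs))
        if elem.text:
            events.append(("text", elem.text))
        for child in elem:
            walk(child)
            if child.tail:
                events.append(("text", child.tail))
        events.append(("end",))

    walk(ET.fromstring(xml_text))
    return names, events


names, events = tokenize('<order id="7"><item>tea</item><item/></order>')
print(names)
for event in events:
    print(event)
```

A converter back to layer 0 would walk the events and look each index up
in `names`, which is why the dictionary has to travel with the data or be
rebuilt identically at the other end.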
now the only outstanding issue to me is that xml data doesn't
have domains, as such. ie the data is not intrinsically
numeric, ascii, floating point etc. a program interprets the
data as such. adding this will to some extent break the
independence of xml.
when you look at things like asn.1 it has extra features like
data domains and data extents (widths if you like) all of
which are foreign to xml as it stands.
this would then mean that every xml document must have,
rather than can have, some sort of definition such as a dtd
to be valid for binary operation.
not the end of the world, but it does add to complexity and
might create a "which document description is better" war.
which leaves us with gzip and friends.
cheers
rick
ps for the benefit of the sun and asn.1 people. i don't
understand why xml has to be everything and you can't just
publish an xml to asn.1 translation standard for those who
want to use such a thing? in fact isn't that exactly what the itu did?
Yes. The XML-Schema-to-ASN.1 translation standard you mention is X.694 (aka
ISO/IEC 8825-5), currently at the end of its FCD ballot period in ISO.
By applying X.694 to a schema, you get one or more ASN.1 modules that can
produce binary encodings. These binary encodings are usually very fast and
compact.
The translation from XML Schema to ASN.1 specified in X.694 is canonical
with respect to all standard ASN.1 encoding rules. This means that two
different implementations of X.694 will produce exactly the same binary
encodings, thus enabling interworking without a need to distribute the ASN.1
generated by the translation. In other words, the translation can be
independently performed at each node and the nodes will interoperate in all
cases.
The end result is that binary data equivalent to a well-formed *and* valid
instance (according to a source schema) can be exchanged, provided the same
XML Schema is available at both sides. This is one possible solution to the
"binary infoset" problem.
Alessandro Triglia
OSS Nokalva
On Wed, 2003-08-20 at 01:31, Liam Quin wrote:
On Sun, Aug 10, 2003 at 03:09:13PM -0400, Elliotte Rusty Harold wrote:
One of the goals of some of the developers pushing binary XML is to
speed up parsing, to provide some sort of preparsed format that is
quicker to parse than real XML. I am extremely skeptical that this
can be achieved in a platform-independent fashion.
That, in a nutshell, is why we're having a workshop.
The question is not, "can you get performance benefits or significant
bandwidth savings using a more compact transmission method".
The question is, "Is there a transmission/storage method that gives
significant benefits but that is also interoperable across differing
platforms, architectures and implementations, and for a wide
cross-section of the entire XML industry."
Formats that hard-wire integer sizes in terms of octets, for example,
obviously have problems.
The ideas for writing length codes into the data might help, though I
doubt they help that much, or are robust in the face of data that
violates the length codes. Nonetheless this is at least plausible.
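To make the robustness concern concrete, here is a minimal sketch of a
length-prefixed field reader that refuses data whose declared length does
not match what is actually present. The 4-octet big-endian prefix is an
assumption chosen only to keep the example short; it is not taken from
any of the proposals under discussion.

```python
import struct


def write_field(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-octet big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def read_field(buf: bytes, pos: int = 0):
    """Return (payload, next_position), validating the length code."""
    if len(buf) - pos < 4:
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    end = start + length
    if end > len(buf):
        # The length code claims more data than the buffer holds.
        raise ValueError("length code exceeds available data")
    return buf[start:end], end


good = write_field(b"abc")
print(read_field(good))

bad = struct.pack(">I", 10) + b"abc"  # declares 10 octets, carries 3
try:
    read_field(bad)
except ValueError as err:
    print("rejected:", err)
```

A reader that skipped the bounds check would either run off the end of
the buffer or silently misalign every field after the bad one, which is
the failure mode being questioned.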
In the past I've used a compressed encoding for integers, rather like
UTF-8 -- e.g. set the top bit on each octet if more data follows in
the same number. That way low numbers (<= 127) use a single byte, and
larger numbers use more as needed; sometimes compression can be
applied usefully to the result, too. This does require decoding,
which I suspect some people want to avoid, but I think it'll be
necessary. When I measured performance using this method (not for XML
though) the I/O saving clearly beat out a text-based parse. It would
depend on the data, I suspect -- as indeed you say.
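The integer encoding described above can be sketched roughly as follows.
This assumes non-negative integers and that the low-order 7-bit group is
written first; the message does not specify the group order, so that
choice and the function names are illustrative only.

```python
from __future__ import annotations


def encode_uint(n: int) -> bytes:
    """Encode n using 7 data bits per octet.

    The top bit of an octet is set when more octets of the same
    number follow, and clear on the final octet.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more octets follow
        else:
            out.append(group)         # final octet
            return bytes(out)


def decode_uint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one integer starting at pos; return (value, next_position)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated integer")
        octet = data[pos]
        pos += 1
        value |= (octet & 0x7F) << shift
        if not octet & 0x80:
            return value, pos
        shift += 7


for n in (0, 127, 128, 300):
    encoded = encode_uint(n)
    print(n, encoded.hex(), decode_uint(encoded))
```

Values up to 127 occupy one octet and 128 is the first value to need two,
so no integer width is hard-wired into the format; the price is the
per-octet decoding loop mentioned above.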
I do not accept as an axiom that binary formats are naturally faster
to parse than text formats.
Agreed - some are and some aren't.
Again, the primary goal of this W3C Workshop is to explore whether we
(W3C) should be doing standards work in this area, especially given the
fact that a number of organisations are already transmitting binary
representations of XML Information Items, in a way that has zero
interoperability.
I hope this helps make at least my position clearer. Furthermore, as
Robert Berjon has said, the proceedings will be made public. We can't
force people to release data they used for benchmarks in their
position papers, but we can ask that for any future work, they use
public data, or create public data with properties similar to their
private data.
A benchmark is only useful if it can be reproduced.
Best,
Liam
-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org
<http://www.xml.org>, an initiative of OASIS
<http://www.oasis-open.org>
The list archives are at http://lists.xml.org/archives/xml-dev/
To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>
--
swilliams@hpti.com http://www.hpti.com Personal: sdw@lig.net http://sdw.st
Stephen D. Williams 43392 Wayside Cir,Ashburn,VA 20147-4622
703-724-0118W 703-995-0407Fax Oct2002