OASIS Mailing List Archives

   RE: [xml-dev] Faster processing without schemas? (was Re: [xml-dev] Micr


Michael Champion wrote:
> Generic DBMS and middleware (ahem, the payers of my salary)
> can't in  general efficiently know the schema of everything
> flowing in and out, so requiring schema knowledge is a 
> showstopper for me.
	As I understand your position, you are willing to accept more
than one encoding if:
      1. there are a small number of widely supported 
         serialization standards 
      2. XML text is mandated as the fallback in content negotiation
      3. a priori schema knowledge is not required.
	The ASN.1 defined binary encodings do not conflict with the
first two requirements. The "issue" with ASN.1 defined encodings would
be around the question of schema knowledge (i.e. item 3). As is well
known, ASN.1 based systems typically do require that both sides of a
link share knowledge of a common schema. 
	However, this is more an attribute of the way that ASN.1 is
used than of the system itself. In the past, ASN.1 has usually
been used in situations where shared knowledge of schemas was not only
considered reasonable but often was considered desirable... However,
one can easily produce a single ASN.1 schema that is capable of
encoding any XML data in such a way that the original XML can be
reconstructed without reference to any other schema. 
	In other words, one can easily use ASN.1 to define an
equivalent of the encoding discussed in Dennis M. Sosnoski's
presentation to the Binary XML Workshop. Sosnoski's XBIS appears to be
a serialization of a SAX2 event stream coupled with a symbol table
that allows compression of strings used more than once (i.e. strings
are replaced by compact "handles", which are indexes into the symbol
table). The same can be described quite easily in ASN.1. In fact, I
believe that an ASN.1 based encoding would have additional benefits in
the case where the encoder (but *not* necessarily the decoder) had
access to a user generated schema since the ASN.1 encoder would then
be able to replace many text nodes with integer or other binary
representations that are more compact than text. Such compression of
text by substituting binary equivalents is not supported in Sosnoski's
XBIS.
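
	A minimal sketch of the handle idea, in Python, may make the
mechanism concrete. It is not the XBIS wire format and not an ASN.1
encoding: the (kind, text) event tuples and the "def"/"ref" tags are
assumptions made up for illustration. A string is sent in full the first
time it appears and as an integer handle afterwards, and the decoder
rebuilds the same table as it reads.

```python
# Illustrative only: events are (kind, text) pairs standing in for SAX2
# callbacks; the "def"/"ref" tags are hypothetical, not part of XBIS.

def encode(events):
    """Replace repeated strings with integer handles, defining each in-line."""
    table = {}  # string -> handle, assigned in order of first appearance
    out = []
    for kind, text in events:
        if text in table:
            out.append((kind, "ref", table[text]))  # seen before: handle only
        else:
            table[text] = len(table)
            out.append((kind, "def", text))         # first use: full string
    return out


def decode(encoded):
    """Rebuild the events; the decoder grows the same table as it reads."""
    table = []  # handle -> string
    events = []
    for kind, tag, value in encoded:
        if tag == "def":
            table.append(value)
            events.append((kind, value))
        else:
            events.append((kind, table[value]))
    return events


events = [
    ("start", "item"), ("text", "42"), ("end", "item"),
    ("start", "item"), ("text", "7"), ("end", "item"),
]
encoded = encode(events)
assert decode(encoded) == events
```

Because each definition travels in-line, the decoder can work in a
single pass and needs no knowledge beyond the event format itself.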
	The method of providing a symbol table or "directory" within
an encoding in order to achieve compression is something that has been
done in the past with ASN.1 schemas. For instance, I remember a word
processor at Digital that had very large encodings due to the fact
that "rulers" and other similar large structural elements needed to be
referenced frequently within a file. Rather than restating these large
objects whenever they were referred to, the solution was to list the
"rulers" in a "table" and just refer to them by their IDs in the
actual document. This is conceptually exactly what is done in XBIS and
other similar encodings. Nothing new.
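
	The "table of rulers" approach can be sketched as follows. This
is an assumed reconstruction of the general technique, not the format
that word processor actually used; the ruler strings and function names
are hypothetical. Each distinct large object is written once in a table,
and the document body carries only indexes into it.

```python
def build_table(document):
    """Collect each distinct object once; the body keeps only its index."""
    table = []     # index -> object, written once ahead of the body
    index_of = {}  # object -> index
    body = []
    for obj in document:
        if obj not in index_of:
            index_of[obj] = len(table)
            table.append(obj)
        body.append(index_of[obj])
    return table, body


def expand(table, body):
    """Restore the original sequence by looking each index up in the table."""
    return [table[i] for i in body]


# Hypothetical "ruler" settings, each standing in for a large structure.
ruler_a = "margins=1in;tabs=0.5in,1in,1.5in"
ruler_b = "margins=2in;tabs=1in"
document = [ruler_a, ruler_b, ruler_a, ruler_a]

table, body = build_table(document)
assert expand(table, body) == document
```

Unlike in-line definitions, a table gathered up front requires the
writer to see the whole document before emitting the body.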
	Some may object to the fact that there would still be a
requirement for one schema to be known by all readers and writers of
the "no-schema" encoding. However, I hope you can see that such a
schema, whether explicit or implicit, is required by any encoding
system. Even "no-schema" text XML has an implicit schema that defines
what is an element, what is an attribute, etc...
	Hopefully, you'll accept that ASN.1 can be just as useful in
the "no-schema" case as it is in the "schema-aware" case. Given that
we already have available to us a standardized, mature, widely used
method of binary encoding, I personally can't see the justification
for pursuing the definition of a new binary encoding. What we should
have is:

	No-Schema Encodings:
		Text:   XML
		Binary: ASN.1 ?ER with schema for XML
	Schema-Aware Encodings:
		Text:   XML + custom schema
		Binary: ASN.1 ?ER + custom schema

	i.e. four use cases with two encoding solutions

	The interesting discussion should be over what is the best way
to define the schema for the "no-schema" case. Should it be a simple
serialization of a SAX2 event stream? If so, would the "symbol
definitions" be done in-line to minimize the memory requirements
during one-pass reading? Or, would they be gathered into a table at
the top or bottom of the data? Should all data be passed as text? Or,
if a schema is available, should the encoder be permitted to
substitute primitive types like INTEGER when they are called for in
the schema? (Assuming that the decoder would output them as strings.)
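
	To illustrate the last question, here is a small Python sketch
of an encoder that substitutes an integer for a text node only when the
encoder's schema calls for one, while the decoder needs no schema and
always outputs strings. The "int"/"str" tags and the schema_type
parameter are assumptions for illustration, not part of any ASN.1
toolkit. The sketch also assumes that exact reconstruction of the
original text is required, so a value such as "007" stays as text.

```python
def encode_value(text, schema_type=None):
    """Return a (tag, value) pair; use an integer only if it round-trips."""
    if schema_type == "INTEGER":
        try:
            number = int(text)
        except ValueError:
            number = None
        # Substitute only when str(number) reproduces the text exactly,
        # so leading zeros, signs like "+5", or whitespace are preserved.
        if number is not None and str(number) == text:
            return ("int", number)
    return ("str", text)


def decode_value(pair):
    """Schema-free decoder: every value comes back out as a string."""
    tag, value = pair
    return str(value) if tag == "int" else value


assert decode_value(encode_value("42", "INTEGER")) == "42"
assert decode_value(encode_value("007", "INTEGER")) == "007"
```

Whether an encoder should be allowed to drop non-canonical forms such
as "007" in favor of the integer is exactly the kind of choice the
agreed schema would have to settle.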
	We don't need another binary encoding. At most, what we need
is agreement on what the ASN.1 schema for a "no-schema" binary
encoding would look like.

		bob wyman



Copyright 2001 XML.org. This site is hosted by OASIS