On Wednesday 26 February 2003 09:52, Tahir Hashmi wrote:
> # Interpreting involved binary constructs could be more difficult:
> Consider the variable length symbols that I have used in Xqueeze
> (as also Dennis Sosnoski in XMLS, IIRC). The symbols are easy to
> understand - unsigned integers serialized as octets in Big-endian
> order, with the least significant bit of each octet acting as a
> continuation flag. However, parsing them requires a loop that runs
> as many times as there are octets in the symbol to read one. Each
> iteration involves one comparison (check if LSb is 1),
> multiplication (promotion of the previous octet by 8 bits) and
> addition (value of the current octet). It's not difficult to see the
> computation involved in arriving at "Wed Jan 3rd 2003, 14:00 GMT"
> from a variable length integer that counts the number of seconds
> since the Epoch.
I'm not sure what you're trying to say here. Reading the variable length
integer from a file would be more efficient than reading the date string and
converting that to a number of seconds since the epoch, yes?
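To make the comparison concrete, here is a rough sketch of the decoding loop being described. I'm assuming a layout in which each octet carries seven payload bits in its high bits, the least significant bit set to 1 means "more octets follow", and the most significant group comes first. That layout is my reading of the description above, not the actual Xqueeze or XMLS wire format, and the function names are made up for illustration.

```python
import io
from datetime import datetime, timezone


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, most significant 7-bit group first.

    Assumed layout: payload in the high 7 bits of each octet, LSb = 1 when
    another octet follows, LSb = 0 on the final octet.
    """
    if value < 0:
        raise ValueError("only unsigned integers are supported")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()  # big-endian group order
    out = bytearray()
    for i, group in enumerate(groups):
        more = 1 if i < len(groups) - 1 else 0
        out.append((group << 1) | more)
    return bytes(out)


def decode_varint(stream) -> int:
    """Read one variable-length integer from a binary stream."""
    value = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a variable-length integer")
        octet = raw[0]
        # One shift, one OR and one flag test per octet.
        value = (value << 7) | (octet >> 1)
        if not (octet & 1):
            return value


def seconds_to_utc(seconds: int) -> datetime:
    """Turn a seconds-since-the-Epoch count into an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stamp = int(datetime(2003, 1, 3, 14, 0, tzinfo=timezone.utc).timestamp())
encoded = encode_varint(stamp)
decoded = decode_varint(io.BytesIO(encoded))
```

Under these assumptions the timestamp fits in five octets, so the decode loop runs five times. A textual date needs more input bytes than that before any calendar arithmetic even starts.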
> # Tight coupling between schema revisions:
> XML is quite resilient to changes in the schema as long as the
> changes are done smartly enough to allow old documents to pass
> validation through the new schema. This flexibility would be
> restricted the greater is the dependence of the binary encoding on
> the schema.
That's not a problem in practice, I think. Say we have a format that works by
storing a dictionary of element and attribute names at the beginning of the
document (or distributed through it, whenever the name is first encountered,
or whatever) and that stores element and attribute text content as a compact
binary representation of the type declared in the schema, including a few
bits of type declaration in the header for each value.
There is enough information in the binary file to recreate the original XML
document, modulo the PSVI-canonicalisation of 1.<!--hello-->2 becoming 1.2
and so on, so the binary reader will be unaffected by any schema changes by
definition; it doesn't need the schema to decode.
And in this scheme, the encoder is just using the schema as hints on what
information it can discard for efficiency. If the schema says that
something's an integer, it can drop all aspects of it apart from the integer
value by encoding it as a binary number. But if a schema revision widens that
integer field into an arbitrary string, the encoder can simply start
encoding the values as arbitrary strings.
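A minimal sketch of the kind of format I have in mind, flattened to a list of name/value pairs to keep it short. Everything here is hypothetical: the type tags, the one-octet counts and lengths, and the function names are invented for illustration and don't correspond to any existing binary XML proposal.

```python
import struct

# Hypothetical type tags, written in front of every value.
TYPE_INT = 0
TYPE_STR = 1


def encode_doc(fields):
    """Encode (name, value) pairs; values are int or str.

    Layout: name count, then each name as length + UTF-8 bytes, then each
    field as name index, type tag and value. Counts and name lengths are
    single octets, so this toy handles at most 255 names of 255 bytes.
    """
    names = []
    for name, _ in fields:
        if name not in names:
            names.append(name)
    out = bytearray([len(names)])
    for name in names:
        raw = name.encode("utf-8")
        out.append(len(raw))
        out += raw
    for name, value in fields:
        out.append(names.index(name))
        if isinstance(value, int):
            # The schema said "integer", so only the number is kept.
            out.append(TYPE_INT)
            out += struct.pack(">q", value)
        else:
            raw = str(value).encode("utf-8")
            out.append(TYPE_STR)
            out += struct.pack(">H", len(raw)) + raw
    return bytes(out)


def decode_doc(data):
    """Decode without any schema: the type tags say how to read each value."""
    pos = 0
    count = data[pos]
    pos += 1
    names = []
    for _ in range(count):
        n = data[pos]
        pos += 1
        names.append(data[pos:pos + n].decode("utf-8"))
        pos += n
    fields = []
    while pos < len(data):
        name = names[data[pos]]
        tag = data[pos + 1]
        pos += 2
        if tag == TYPE_INT:
            (value,) = struct.unpack_from(">q", data, pos)
            pos += 8
        elif tag == TYPE_STR:
            (n,) = struct.unpack_from(">H", data, pos)
            pos += 2
            value = data[pos:pos + n].decode("utf-8")
            pos += n
        else:
            raise ValueError(f"unknown type tag {tag}")
        fields.append((name, value))
    return fields


# Revision 1 of the schema declares quantity as an integer...
old_doc = encode_doc([("sku", "A-17"), ("quantity", 12)])
# ...revision 2 widens it to an arbitrary string.
new_doc = encode_doc([("sku", "A-17"), ("quantity", "a dozen")])
```

The same `decode_doc` reads both documents, because the decision the encoder took from the schema is recorded in the type tag rather than left implicit.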
And the ASN.1 encodings support extensibility in ways XML doesn't! When you
extend an ASN.1 type, you use an extension marker to denote where the
extension happened. The encodings use this information so that values of
older versions of the type still decode successfully in higher-version
readers, and lower-version readers can still read the parts of
higher-version types that they know about. In both cases the application
using the decoder can opt to be warned about the version mismatch if it
cares. Either way, it's not a fatal problem if the ASN.1 types change; it's
just reported to the application.
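The mechanism can be illustrated with a toy sketch. This is not BER or PER and uses no ASN.1 library; it only mimics the idea that everything added after the extension marker is length-prefixed, so a reader can step over additions it doesn't know. The layout and all names are assumptions made for the example.

```python
ROOT_FIELDS = 2  # hypothetical: every version of the type has two root fields


def _read_chunk(data, pos):
    """Read one length-prefixed chunk; return (chunk, next position)."""
    n = data[pos]
    return data[pos + 1:pos + 1 + n], pos + 1 + n


def encode_record(root, extensions=()):
    """Encode root fields plus any extension additions (all bytes values).

    Layout: extension count, then every field as a length octet + payload.
    """
    if len(root) != ROOT_FIELDS:
        raise ValueError("wrong number of root fields")
    out = bytearray([len(extensions)])
    for chunk in (*root, *extensions):
        out.append(len(chunk))
        out += chunk
    return bytes(out)


def decode_record(data, known_extensions):
    """Decode with a reader that understands `known_extensions` additions.

    Returns (root fields, understood extensions, number skipped) so the
    application can decide whether a version mismatch matters to it.
    """
    ext_count = data[0]
    pos = 1
    root = []
    for _ in range(ROOT_FIELDS):
        chunk, pos = _read_chunk(data, pos)
        root.append(chunk)
    understood = []
    skipped = 0
    for i in range(ext_count):
        chunk, pos = _read_chunk(data, pos)
        if i < known_extensions:
            understood.append(chunk)
        else:
            skipped += 1  # unknown addition: stepped over, not an error
    return root, understood, skipped


v1_value = encode_record([b"alice", b"42"])
v2_value = encode_record([b"alice", b"42"], [b"extra"])

old_reader_on_new = decode_record(v2_value, known_extensions=0)
new_reader_on_old = decode_record(v1_value, known_extensions=1)
```

Here the old reader gets the two root fields and a skipped count of 1, while the new reader simply finds no extension present in the old value.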
> With schema-based compaction done in all the aggressiveness
> possible, how much would be gained against a simple markup
> binarization scheme? Perhaps a compaction factor of, say, 5 over
> XML. Would this be really significant when compared to a factor of,
> say, 4 compaction achieved by markup binarization? This is an
> optimization issue - the smaller the binary scheme, the more
> computation required to extract information out of it. I'm not
> totally against a type-aware encoding but for a standard binary
> encoding to evolve, it would have to be in a "sweet spot" on the
> size vs. computation vs. generality plane.
Robin was quoting better numbers than these factors of 4 or 5... But even
then, I think a bandwidth-limited company would be happy to do a
near-zero-cost upgrade away from textual XML in order to get a fivefold increase
in capacity :-)
Oh, pilot of the storm who leaves no trace, Like thoughts inside a dream
Heed the path that led me to that place, Yellow desert screen