On Thu, 27 Feb 2003 08:53:47 +0000
Alaric Snell wrote:
> On Wednesday 26 February 2003 09:52, Tahir Hashmi wrote:
>
> > # Tight coupling between schema revisions:
> >
> > XML is quite resilient to changes in the schema as long as the
> > changes are done smartly enough to allow old documents to pass
> > validation against the new schema. The more the binary encoding
> > depends on the schema, the more this flexibility would be
> > restricted.
>
> That's not a problem in practice, I think. Say we have a format that works by
> storing a dictionary of element and attribute names at the beginning of the
> document (or distributed through it, whenever the name is first encountered,
> or whatever) and that stores element and attribute text content as a compact
> binary representation of the type declared in the schema, including a few
> bits of type declaration in the header for each value.
That's alright, but a per-document data dictionary wouldn't be
suitable for a server dishing out large numbers of very small
documents, because of the space overhead. Secondly, the
encoder/decoder would have to build a lookup table in memory for
every document. A long-running application loses the opportunity to
cache the lookup table in some high-speed memory and has to go
through building and tearing down lookup tables frequently. That's
why I prefer data dictionaries per _document_type_, since an
application instance often deals with a limited set of document
types.
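To illustrate what I mean (a rough Python sketch; the names and the
document type are purely made up), the dictionary for a document type
can be built once, cached for the lifetime of the application and
reused across any number of small documents, instead of being carried
in and rebuilt for each one:

    import struct

    class NameDictionary:
        """Maps element/attribute names of one document type to small codes."""
        def __init__(self, names):
            self.code = {name: i for i, name in enumerate(names)}

    # Built once per document type and cached, not once per document.
    DICTIONARIES = {
        "purchase-order": NameDictionary(["order", "item", "qty", "price"]),
    }

    def encode_element_name(doc_type, name):
        # One byte on the wire instead of the textual element name.
        return struct.pack("B", DICTIONARIES[doc_type].code[name])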
> And in this scheme, the encoder is just using the schema as hints on what
> information it can discard for efficiency. If the schema says that
> something's an integer, it can drop all aspects of it apart from the integer
> value by encoding it as a binary number. But if the schema's
> constraint widens that integer field into an arbitrary string, then
> it can start encoding it as an arbitrary string.
... and the decoder recognizes some fundamental data types which it
can read without referring to the schema - I like this approach :-)
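Roughly what I picture (again just a Python sketch, not a proposed
format): the encoder uses the schema to choose a compact
representation, but every value carries a small type tag, so the
decoder can read the fundamental types without the schema at hand:

    import struct

    TAG_INT, TAG_STRING = 0x01, 0x02

    def encode_value(value):
        if isinstance(value, int):
            # Schema says "integer": one tag byte plus a 4-byte binary number.
            return struct.pack(">Bi", TAG_INT, value)
        # Schema widened the field to a string: tag, length, then the bytes.
        data = str(value).encode("utf-8")
        return struct.pack(">BH", TAG_STRING, len(data)) + data

    def decode_value(buf):
        # No schema needed here; the tag alone says how to read the value.
        if buf[0] == TAG_INT:
            return struct.unpack(">i", buf[1:5])[0]
        (length,) = struct.unpack(">H", buf[1:3])
        return buf[3:3 + length].decode("utf-8")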
> > With schema-based compaction done as aggressively as possible,
> > how much would be gained over a simple markup binarization
> > scheme? Perhaps a compaction factor of, say, 5 over
> > XML. Would this be really significant when compared to a factor of,
> > say, 4 compaction achieved by markup binarization? This is an
> > optimization issue - the smaller the binary scheme, the more
> > computation required to extract information out of it. I'm not
> > totally against a type-aware encoding but for a standard binary
> > encoding to evolve, it would have to be in a "sweet spot" on the
> > size vs. computation vs. generality plane.
>
> Robin was quoting better numbers than these factors of 4 or 5... But even
> then, I think a bandwidth-limited company would be happy to do a relatively
> zero-cost upgrade away from textual XML in order to get a fivefold increase
> in capacity :-)
Exactly! That's what I want to emphasize. The specific numbers 4 and
5 are not what matters; what matters is how small the difference
between them is. I'd favour a slightly sub-optimal encoding that's
(ideally) as flexible as XML over one that becomes inflexible just to
improve a little more on what is already a significant improvement.
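Back-of-the-envelope, with a purely illustrative 100 KB document:

    xml_size = 100000                  # bytes of textual XML
    markup_binarized = xml_size / 4    # 25,000 bytes
    schema_aggressive = xml_size / 5   # 20,000 bytes
    # Markup binarization already saves 75,000 bytes; the extra schema
    # coupling buys only another 5,000.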
--
Tahir Hashmi (VSE, NCST)
http://staff.ncst.ernet.in/tahir
tahir AT ncst DOT ernet DOT in
We, the rest of humanity, wish GNU luck and Godspeed