Tahir Hashmi wrote:
> Robin Berjon wrote:
>>It would be horrible. Quite simply horrible. But then, it would never have taken
>>off so we wouldn't be discussing it.
>
> Let me modify Karl's assumption a little:
>
> Let's assume we /now have/ a binary XML specification [snip],
> everything basically the same, just binary streaming format, but
> same Infoset, same APIs /as/ for reporting XML content.
>
> And again ask these questions:
>
> What would be the difference? For the programmer? For the platforms?
(Note that your question is a bit flawed, as we already have standard
specifications for binary infosets.)
You basically have two groups of people:
- those that don't need it. For them, it'll make no difference. They wouldn't
use it. This is not the WXS type of technology that dribbles its way through
many others.
- those that do need it. These folks will be able to use XML where they
couldn't before. And when I say XML, I mean AngleBracketedUnicode. Conversion to
binary will only happen in the steps where it is needed so that most of what
those people will see will be actual XML.
> Extreme optimization based on the knowledge of Schema might be
> unattractive because:
>
> # Interpreting involved binary constructs could be more difficult:
>
> Consider the variable length symbols that I have used in Xqueeze[1]
> (as also Dennis Sosnoski in XMLS, IIRC). The symbols are easy to
> understand - unsigned integers serialized as octets in Big-endian
> order, with the least significant bit of each octet acting as a
> continuation flag. However, parsing them requires a loop that runs
> as many times as there are octets in the symbol to read one. Each
> iteration involves one comparison (check if LSb is 1),
> multiplication (promotion of the previous octet by 8 bits) and
> addition (value of the current octet). It's not difficult to see the
> computation involved in arriving at "Wed Jan 3rd 2003, 14:00 GMT"
> from a variable length integer that counts the number of seconds
> since the Epoch[2].
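(For concreteness, the decoding loop just described might look something like
this. This is only an illustrative sketch, not Xqueeze's actual code, and it
assumes each octet carries seven value bits with the low bit set while more
octets follow, most significant octet first; an `encode_varint` inverse is
included purely for round-trip checking:)

```python
def decode_varint(data, pos=0):
    """Decode one variable-length symbol starting at data[pos].

    Each octet carries 7 value bits; the least significant bit is 1
    while another octet follows. Octets arrive most-significant first.
    Returns (value, next_pos).
    """
    value = 0
    while True:
        octet = data[pos]
        pos += 1
        value = (value << 7) | (octet >> 1)  # promote previous bits, add current
        if not (octet & 1):                  # LSb is 0: last octet of the symbol
            return value, pos

def encode_varint(value):
    """Inverse of decode_varint, for round-trip checking."""
    groups = []
    while True:
        groups.append(value & 0x7F)  # peel off 7 value bits at a time
        value >>= 7
        if not value:
            break
    octets = []
    for i, g in enumerate(reversed(groups)):
        flag = 1 if i < len(groups) - 1 else 0  # set LSb on all but the last octet
        octets.append((g << 1) | flag)
    return bytes(octets)
```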
Errr... I really am not sure what you mean, notably by "involved binary
constructs". I think you can distinguish between two situations: a) the
application wants a date, in which case seconds since the Epoch or a time_t
struct might be exactly what it wants, it'll be cheaper than strptime(3) for
sure; b) the application wants a string containing a date in which case you're
free to store dates as strings in your binary format.
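To make case (a) concrete, here is a rough sketch of the two paths an
application might take to the same instant. The date string, format string
and epoch count are made-up illustration values, not anything from either
format under discussion:

```python
import time

# Case (a): the application wants a date value.
# Text form: every character must be scanned and converted.
text_date = "2003-01-03 14:00:00"
parsed = time.strptime(text_date, "%Y-%m-%d %H:%M:%S")

# Binary form: the seconds count *is* the value; a single library
# call turns it into broken-down time, with no text scanning at all.
seconds = 1041602400  # hypothetical seconds-since-Epoch for the same instant (GMT)
direct = time.gmtime(seconds)

# Case (b): the application only wants the string; a binary format
# is free to store the date as a string and hand it over unchanged.
```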
> # Forced validation:
>
> The above situation would be even more ironic if the application
> didn't care about the actual value of the date and was only
> interested in some string that looked like a date. With XML,
> validation of data types is optional; the above scheme enforces it
> as a requirement. Even where validation is required,
> how far can a parser validate? A value may be syntactically or
> semantically acceptable but contextually invalid (lame e.g. - a date
> of birth being in the future). My point: validation is and should
> remain an option.
This is completely orthogonal to the subject.
> # Tight coupling between schema revisions:
>
> XML is quite resilient to changes in the schema as long as the
> changes are done smartly enough to allow old documents to pass
> validation through the new schema. This flexibility shrinks as the
> binary encoding's dependence on the schema grows. (I still have to
> reach XML's level of compatibility in
> Xqueeze Associations (data dictionary). Fortunately, achieving that
> wouldn't require changes in the grammar of the encoding).
This is a solved problem in BinXML, multiple versions of the same schema can
co-exist.
> # What is gained in the end?
>
> With schema-based compaction done in all the aggressiveness
> possible, how much would be gained against a simple markup
> binarization scheme? Perhaps a compaction factor of, say, 5 over
> XML. Would this be really significant when compared to a factor of,
> say, 4 compaction achieved by markup binarization? This is an
> optimization issue - the smaller the binary scheme, the more
> computation required to extract information out of it. I'm not
> totally against a type-aware encoding but for a standard binary
> encoding to evolve, it would have to be in a "sweet spot" on the
> size vs. computation vs. generality plane.
I'm all for finding a sweet spot, but pulling random numbers out of a hat and
making broad assumptions about size vs computation won't contribute much
toward getting there. I am talking about factors of 10, 20 or 50 (or more, but
testing on SOAP is cheating ;) that have been empirically proven, tested,
retested, and put to work in a wide variety of situations.
As for your remark on the speed of decompaction, note that you may be right
for a naive implementation, but there's compsci literature out there on making
such tasks fast.
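One well-known trick from that literature (again a hypothetical sketch, not
any shipping decoder): since in practice most symbols fit a single octet, a
fast path can return immediately and skip the loop entirely, using the same
assumed wire format as before (seven value bits per octet, LSb set while more
octets follow):

```python
def decode_varint_fast(data, pos=0):
    """Variable-length decode with a single-octet fast path.

    Wire format assumed: 7 value bits per octet, LSb set while more
    octets follow, most significant octet first.
    Returns (value, next_pos).
    """
    octet = data[pos]
    if not (octet & 1):              # common case: one octet, no loop at all
        return octet >> 1, pos + 1
    value = octet >> 1               # slow path: fall back to the loop
    pos += 1
    while True:
        octet = data[pos]
        pos += 1
        value = (value << 7) | (octet >> 1)
        if not (octet & 1):
            return value, pos
```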
--
Robin Berjon <robin.berjon@expway.fr>
Research Engineer, Expway http://expway.fr/
7FC0 6F5F D864 EFB8 08CE 8E74 58E6 D5DB 4889 2488