OASIS Mailing List Archives





   Re: [xml-dev] Parsing efficiency? - why not 'compile'????


Tahir Hashmi wrote:
> Robin Berjon wrote:
> In the first group, there could be a subgroup that doesn't need binary
> markup but may use it simply because it can, without affecting the way
> its applications work. That's the group that doesn't need human
> read/write-ability for its XML docs - the group of WYSIWYG Office
> suites, XML-based instant messaging protocols and so on.

I would quite seriously oppose using binary infosets when you don't need them. 
It adds to the complexity of the system and removes a variety of features of 
XML. Office suites can (and in fact do) use zip (if only because it doubles as a 
packaging format, which is very convenient for attached files such as images). XML 
IM either needs binary infosets for performance reasons, or doesn't and 
shouldn't use it.

> Consider this: the application is only interested in strings for date
> but the schema designer specified a date type because it is the Right
> Thing(TM) for a date (so that the schema need not be changed if at some
> point of time the same application or another application does get
> interested in the value).
> In a binary representation, the processor will decode the variable
> length binary value to arrive at the number of seconds since epoch,
> then re-construct a string for the application. Note that the
> processor will be *synthesizing* a string that could be read straight
> off the document.
> This approach would be better only if the benefits of saved bandwidth
> are greater than the cost of synthesizing the date string. And we
> can't assume that limited bandwidth is *always* going to be the
> motivating factor for using binary markup.

That's why in BinXML you can specify how you encode your data. In the case you 
cite one would simply ask that the xs:fooDate type use the UTF-8 codec.
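
To make the trade-off concrete, here is a minimal sketch in Python of choosing 
a codec per schema type. It is not BinXML's actual API: the names EpochCodec, 
Utf8Codec, CODECS and codec_for are hypothetical, the variable-length integer 
layout (7 bits per byte, high bit as continuation flag) is an assumption, and 
xs:fooDate is just the placeholder type name used above.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed lexical form, UTC only


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("this sketch only handles dates at or after the epoch")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Inverse of encode_varint."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7
    raise ValueError("truncated variable-length integer")


class EpochCodec:
    """Compact payload; the decoder has to synthesize the date string."""

    def encode(self, text: str) -> bytes:
        dt = datetime.strptime(text, DATE_FORMAT).replace(tzinfo=timezone.utc)
        return encode_varint(int(dt.timestamp()))

    def decode(self, payload: bytes) -> str:
        seconds = decode_varint(payload)
        dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
        return dt.strftime(DATE_FORMAT)


class Utf8Codec:
    """Larger payload; the string is read straight off the document."""

    def encode(self, text: str) -> bytes:
        return text.encode("utf-8")

    def decode(self, payload: bytes) -> str:
        return payload.decode("utf-8")


# Hypothetical per-type configuration: the date type is left as plain UTF-8.
CODECS = {"xs:fooDate": Utf8Codec()}
DEFAULT_CODEC = Utf8Codec()


def codec_for(type_name: str):
    """Look up the codec configured for a schema type."""
    return CODECS.get(type_name, DEFAULT_CODEC)
```

With this configuration, codec_for("xs:fooDate") hands the application the 
original bytes as a string with no date arithmetic. Swapping in EpochCodec() 
for that type would shrink the payload at the cost of rebuilding the string on 
every read.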

> The particular example I gave is illustrative only and as stated
> earlier, I'm not against type-awareness. I'm simply being wary of how
> much flexibility might possibly be lost, and in some cases
> computation be wasted, in the quest of a super-optimized binary
> encoding.

Again, if you don't want something encoded just ask the application to not touch 
it :)

>>As for your remark on the speed of decompaction, note that you may be right for 
>>a naive implementation of the same thing but there's compsci literature out 
>>there on making such tasks fast.
> Well yes, naivete may lead to bad design. The point is that the more
> logic that goes into decoding a format, the higher the bar for small
> devices is raised. While one can have small non-validating SAX parsers
> for XML, the size of a binary format parser may go up since it would
> have to know about synthesizing dates from integers, deducing document
> structure from the schema etc, besides the indispensable passing of
> strings around. The encoding scheme should require least possible
> context information and minimal parsing logic to be accessible
> there. Hope I'm able to explain myself better this time!

It all depends on what you need. I totally agree that there is no 
one-size-fits-all but I do believe that it is very much possible to produce a 
flexible format that can be configured in a variety of ways, without it losing 
internal coherence. If you want a tiny and ultra fast decoder you can drop 
support for encoding of the more complex types, if you want a slightly larger 
decoder but the smallest possible payload you add codecs to encode the content 

Robin Berjon <robin.berjon@expway.fr>
Research Engineer, Expway        http://expway.fr/
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488



Copyright 2001 XML.org. This site is hosted by OASIS