xml-dev - Re: [xml-dev] Parsing efficiency?

Re: [xml-dev] Parsing efficiency? - why not 'compile'????


To: Tahir Hashmi <code_martial@softhome.net>
Subject: Re: [xml-dev] Parsing efficiency? - why not 'compile'????
From: Robin Berjon <robin.berjon@expway.fr>
Date: Fri, 28 Feb 2003 18:30:51 +0100
Cc: xml-dev@lists.xml.org
In-reply-to: <20030228220634.40936aca.code_martial@softhome.net>
Organization: Expway
References: <OF047F14BF.0A3C2919-ONCA256CD8.00031C17@facs.gov.au> <E18nbOK-0004gi-00@calvin.frontwire.com> <007301c2dcdc$1933dba0$9e539696@citkwaclaww2k> <3E5B84CF.4020703@expway.fr> <20030226152200.4c0b681b.code_martial@softhome.net> <3E5DE241.3050703@expway.fr> <20030228220634.40936aca.code_martial@softhome.net>
Reply-to: robin.berjon@expway.fr
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2) Gecko/20021126

Tahir Hashmi wrote:
> Robin Berjon wrote:
> In the first group, there could be a subgroup that doesn't need binary
> markup but may use it simply because it can, without affecting the way
> its applications work. That's the group that doesn't need human
> read/write-ability for its XML docs - the group of WYSIWYG Office
> suites, XML-based instant messaging protocols and so on.

I would quite seriously oppose using binary infosets when you don't need them. 
It adds to the complexity of the system and removes a variety of features of 
XML. Office suites can (and in fact do) use zip (if only because it doubles as a 
packaging format, which is very convenient for attached files such as images). XML 
IM either needs binary infosets for performance reasons, or it doesn't and 
shouldn't use them.
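A minimal sketch of the zip-as-packaging idea, using Python's standard zipfile module. The member names ("content.xml", "Pictures/logo.png") are illustrative assumptions, not any particular office suite's layout. The point is that the XML stays ordinary text inside the archive; zip only compresses it and carries the attachments alongside.

```python
import io
import zipfile


def package_document(xml_text: str, attachments: dict[str, bytes]) -> bytes:
    """Bundle an XML document and binary attachments into one zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The XML is stored as plain UTF-8 text; deflate just compresses it.
        zf.writestr("content.xml", xml_text.encode("utf-8"))
        # Images and other attached files travel in the same package.
        for name, data in attachments.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_content(archive: bytes) -> str:
    """Pull the XML back out; any ordinary XML parser can take it from here."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read("content.xml").decode("utf-8")
```

Nothing about the XML itself changes, so it remains readable with standard tools once unzipped.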

> Consider this: the application is only interested in strings for date
> but the schema designer specified a date type because it is the Right
> Thing(TM) for a date (so that the schema need not be changed if at some
> point in time the same application or another application does get
> interested in the value).
> 
> In a binary representation, the processor will decode the variable
> length binary value to arrive at the number of seconds since epoch,
> then re-construct a string for the application. Note that the
> processor will be *synthesizing* a string that could be read straight
> off the document.
> 
> This approach would be better only if the benefits of saved bandwidth
> are greater than the cost of synthesizing the date string. And we
> can't assume that limited bandwidth is *always* going to be the
> motivating factor for using binary markup.

That's why in BinXML you can specify how you encode your data. In the case you 
cite one would simply ask that the xs:fooDate type use the UTF-8 codec.
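To make the trade-off concrete, here is a hypothetical sketch of per-type codec selection. It is not BinXML's actual API; the codec names "epoch-varint" and "utf8" and the helper functions are invented for illustration. A date-time can be packed as a base-128 varint of seconds since the Unix epoch (compact, but the decoder must re-synthesize the string) or passed through as UTF-8 text (larger, but read straight off the wire). The sketch assumes lexical dates carry an explicit "+00:00" offset and drops fractional seconds.

```python
from datetime import datetime, timezone


def encode_varint(n: int) -> bytes:
    """Unsigned base-128 varint: 7 payload bits per byte, high bit means 'more follows'."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    n, shift = 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return n


def epoch_encode(text: str) -> bytes:
    # Parse the lexical date and keep only whole seconds since the epoch.
    return encode_varint(int(datetime.fromisoformat(text).timestamp()))


def epoch_decode(data: bytes) -> str:
    # The decoder has to synthesize a string the wire format never stored.
    seconds = decode_varint(data)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def utf8_encode(text: str) -> bytes:
    return text.encode("utf-8")


def utf8_decode(data: bytes) -> str:
    return data.decode("utf-8")


# Hypothetical codec registry: name -> (encoder, decoder).
CODECS = {
    "epoch-varint": (epoch_encode, epoch_decode),
    "utf8": (utf8_encode, utf8_decode),
}


def encode_value(type_name: str, text: str, codec_choice: dict) -> bytes:
    encoder, _ = CODECS[codec_choice.get(type_name, "utf8")]
    return encoder(text)


def decode_value(type_name: str, data: bytes, codec_choice: dict) -> str:
    _, decoder = CODECS[codec_choice.get(type_name, "utf8")]
    return decoder(data)
```

With `codec_choice = {"xs:dateTime": "epoch-varint"}` the value "2003-02-28T17:30:51+00:00" shrinks from 25 bytes to 5, at the cost of a parse and a reformat on each side; with `{"xs:dateTime": "utf8"}` the string is passed through untouched, which is the configuration suggested for an application that only wants strings.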

> The particular example I gave is illustrative only and as stated
> earlier, I'm not against type-awareness. I'm simply being wary of how
> much flexibility might possibly be lost, and in some cases
> computation wasted, in the quest for a super-optimized binary
> encoding.

Again, if you don't want something encoded just ask the application to not touch 
it :)

>>As for your remark on the speed of decompaction, note that you may be right for 
>>a naive implementation of the same thing but there's compsci literature out 
>>there on making such tasks fast.
> 
> Well yes, naivete may lead to bad design. The point is that the more
> logic goes into decoding a format, the higher the bar is raised for
> small devices. While one can have small non-validating SAX parsers
> for XML, the size of a binary format parser may go up since it would
> have to know about synthesizing dates from integers, deducing document
> structure from the schema, etc., besides the indispensable passing of
> strings around. The encoding scheme should require the least possible
> context information and minimal parsing logic to be accessible
> there. Hope I'm able to explain myself better this time!

It all depends on what you need. I totally agree that there is no 
one-size-fits-all solution, but I do believe that it is very much possible to 
produce a flexible format that can be configured in a variety of ways without 
losing internal coherence. If you want a tiny and ultra-fast decoder, you can 
drop support for encoding the more complex types; if you want a slightly larger 
decoder but the smallest possible payload, you add codecs that encode the 
content optimally.
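One way to picture such a configurable format is a set of decoder profiles, sketched below. The profile names ("tiny", "compact"), codec names, and the type-to-codec table are hypothetical, chosen only to illustrate the idea. The encoder picks each type's preferred codec only if the target decoder supports it, and otherwise falls back to plain UTF-8 text, so a tiny decoder needs nothing beyond string handling.

```python
import struct

# Hypothetical decoder profiles: which codecs a given decoder build understands.
PROFILES = {
    "tiny": frozenset({"utf8"}),
    "compact": frozenset({"utf8", "int32", "float64"}),
}

# Codec each schema type would use if the target decoder supports it.
PREFERRED = {"xs:int": "int32", "xs:double": "float64", "xs:string": "utf8"}


def plan_codecs(types, profile):
    """Pick a codec per type, falling back to UTF-8 text when the profile lacks it."""
    supported = PROFILES[profile]
    plan = {}
    for t in types:
        wanted = PREFERRED.get(t, "utf8")
        plan[t] = wanted if wanted in supported else "utf8"
    return plan


def encode(codec, text):
    if codec == "utf8":
        return text.encode("utf-8")
    if codec == "int32":
        return struct.pack(">i", int(text))      # 4 bytes, big-endian
    if codec == "float64":
        return struct.pack(">d", float(text))    # 8-byte IEEE 754 double
    raise ValueError(f"unknown codec {codec!r}")


def decode(codec, data, profile):
    # A decoder built for a smaller profile simply has no code for richer codecs.
    if codec not in PROFILES[profile]:
        raise ValueError(f"{profile!r} decoder was built without {codec!r}")
    if codec == "utf8":
        return data.decode("utf-8")
    if codec == "int32":
        return str(struct.unpack(">i", data)[0])
    return repr(struct.unpack(">d", data)[0])
```

Planning for "tiny" sends every value as text, while "compact" packs xs:int and xs:double into fixed-width binary; both plans come out of the same format definition, which is the kind of internal coherence argued for above.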

-- 
Robin Berjon <robin.berjon@expway.fr>
Research Engineer, Expway        http://expway.fr/
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488

