xml-dev - Re: [xml-dev] Parsing efficiency?



To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Parsing efficiency? - why not 'compile'????
From: Tahir Hashmi <code_martial@softhome.net>
Date: Fri, 28 Feb 2003 11:13:51 +0530
In-reply-to: <20030227073457.2C45F5542@calm.warhead.org.uk>
References: <OF047F14BF.0A3C2919-ONCA256CD8.00031C17@facs.gov.au><3E5B84CF.4020703@expway.fr><20030226152200.4c0b681b.code_martial@softhome.net><20030227073457.2C45F5542@calm.warhead.org.uk>

On Thu, 27 Feb 2003 08:53:47 +0000
Alaric Snell wrote:

> On Wednesday 26 February 2003 09:52, Tahir Hashmi wrote:
> 
> > # Tight coupling between schema revisions:
> >
> >   XML is quite resilient to changes in the schema as long as the
> >   changes are made carefully enough to let old documents still pass
> >   validation against the new schema. The more the binary encoding
> >   depends on the schema, the more this flexibility is restricted.
> 
> That's not a problem in practice, I think. Say we have a format that works by 
> storing a dictionary of element and attribute names at the beginning of the 
> document (or distributed through it, whenever the name is first encountered, 
> or whatever) and that stores element and attribute text content as a compact 
> binary representation of the type declared in the schema, including a few 
> bits of type declaration in the header for each value.

That's all right, but a per-document data dictionary wouldn't suit a
server dishing out large numbers of very small documents, because of
the space overhead. Secondly, the encoder/decoder would have to build
a lookup table in memory for every document. A long-running
application loses the opportunity to cache the lookup table in some
high-speed memory and has to build and tear down lookup tables
frequently. That's why I prefer data dictionaries per _document_type_:
an application instance often deals with only a limited set of
document types.
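
To make the overhead argument concrete, here is a minimal Python sketch, assuming a hypothetical encoding in which element and attribute names are replaced by varint indices into a name table. `VOCABULARIES` (a per-document-type name list that would in practice be derived from the schema) and all function names are illustrative assumptions, not part of any existing format; only names are encoded, not structure or content.

```python
from functools import lru_cache

# Hypothetical per-document-type vocabularies; in practice these would
# be derived from each document type's schema.
VOCABULARIES = {
    "order": ("order", "id", "item", "sku", "qty"),
}


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128-style variable-length integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_dictionary(names) -> bytes:
    """Serialize a name table: count, then length-prefixed UTF-8 names."""
    out = bytearray(encode_varint(len(names)))
    for name in names:
        raw = name.encode("utf-8")
        out += encode_varint(len(raw)) + raw
    return bytes(out)


def encode_names_per_document(names) -> bytes:
    """Per-document dictionary: a fresh lookup table is built for every
    document, and the table itself travels with the document."""
    table, order = {}, []
    for n in names:
        if n not in table:
            table[n] = len(order)
            order.append(n)
    body = b"".join(encode_varint(table[n]) for n in names)
    return encode_dictionary(order) + body


@lru_cache(maxsize=None)
def lookup_table(doctype: str) -> dict:
    """Build the name -> index table once per document type, then reuse it."""
    return {name: i for i, name in enumerate(VOCABULARIES[doctype])}


def encode_names_per_doctype(doctype: str, names) -> bytes:
    """Per-document-type dictionary: only indices are sent."""
    table = lookup_table(doctype)
    return b"".join(encode_varint(table[n]) for n in names)


names = ["order", "id", "item", "sku", "qty", "item", "sku", "qty"]
print(len(encode_names_per_document(names)), len(encode_names_per_doctype("order", names)))  # 31 8
```

For this tiny eight-name document the per-document table makes the output nearly four times larger, and `lru_cache` stands in for the long-lived, cached lookup table that a per-document-type dictionary allows.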

> And in this scheme, the encoder just uses the schema as hints about what 
> information it can discard for efficiency. If the schema says that 
> something's an integer, it can drop everything apart from the integer 
> value by encoding it as a binary number. But if the schema's constraint 
> widens that integer field into an arbitrary string, then it can start 
> encoding it as an arbitrary string.

... and the decoder recognizes some fundamental data types which it
can read without referring to the schema - I like this approach :-)
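
A minimal Python sketch of that idea, assuming hypothetical one-byte type tags (`TAG_INT`, `TAG_STR`) and illustrative function names that come from no published format: the encoder uses the schema type only as a hint, while the decoder reads each value from its tag alone, so a field later widened from integer to string still decodes without consulting the schema.

```python
# Hypothetical one-byte type tags; illustrative only.
TAG_INT = 0x01
TAG_STR = 0x02


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128-style variable-length integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag(n: int) -> int:
    """Map signed ints to unsigned so small negatives stay small."""
    return n * 2 if n >= 0 else -n * 2 - 1


def unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def encode_value(text: str, schema_type: str) -> bytes:
    """Use the schema type as a hint; fall back to a string when the
    compact form would lose information (e.g. "007" or " 42")."""
    if schema_type == "integer":
        try:
            n = int(text)
        except ValueError:
            pass
        else:
            if str(n) == text:  # lossless round trip only
                return bytes([TAG_INT]) + encode_varint(zigzag(n))
    raw = text.encode("utf-8")
    return bytes([TAG_STR]) + encode_varint(len(raw)) + raw


def decode_value(buf: bytes, pos: int = 0):
    """Decode one value from its type tag alone; no schema needed."""
    tag = buf[pos]
    pos += 1
    if tag == TAG_INT:
        z, pos = decode_varint(buf, pos)
        return unzigzag(z), pos
    if tag == TAG_STR:
        length, pos = decode_varint(buf, pos)
        end = pos + length
        return buf[pos:end].decode("utf-8"), end
    raise ValueError(f"unknown type tag {tag:#x}")
```

Encoding "42" under an integer type takes two bytes, while the same field encoded after the schema is widened to a string simply arrives with `TAG_STR`, and the same decoder handles both.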

> >   With schema-based compaction done as aggressively as
> >   possible, how much would be gained over a simple markup
> >   binarization scheme? Perhaps a compaction factor of, say, 5 over
> >   XML. Would this really be significant compared with a factor of,
> >   say, 4 achieved by markup binarization alone? This is an
> >   optimization issue: the smaller the binary encoding, the more
> >   computation is required to extract information from it. I'm not
> >   totally against a type-aware encoding, but for a standard binary
> >   encoding to evolve, it would have to sit in a "sweet spot" on the
> >   size vs. computation vs. generality plane.
> 
> Robin was quoting better numbers than these factors of 4 or 5... But even 
> then, I think a bandwidth-limited company would be happy to make an almost 
> zero-cost upgrade away from textual XML in order to get a fivefold increase 
> in capacity :-)

Exactly! That's what I want to emphasize. The numbers 4 and 5 themselves
are not significant; what's significant is the difference between
them. I'd favour a slightly sub-optimal encoding that's (ideally) as
flexible as XML over one that becomes inflexible just to improve a
little more on what's already a significant improvement.
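
The arithmetic behind that point, as a small Python sketch. The factors 4 and 5 are the illustrative figures from this thread, not measurements, and the 1000 KB document size is an arbitrary example.

```python
from fractions import Fraction


def saved_fraction(factor: int) -> Fraction:
    """Share of the original XML bytes removed by compacting by `factor`."""
    return 1 - Fraction(1, factor)


def compacted_kb(original_kb: int, factor: int) -> Fraction:
    """Size left after compacting `original_kb` by `factor`."""
    return Fraction(original_kb, factor)


# Illustrative compaction factors from the thread, not measurements.
MARKUP_ONLY = 4
SCHEMA_AWARE = 5

extra = saved_fraction(SCHEMA_AWARE) - saved_fraction(MARKUP_ONLY)

for factor in (MARKUP_ONLY, SCHEMA_AWARE):
    print(f"factor {factor}: 1000 KB -> {float(compacted_kb(1000, factor)):g} KB, "
          f"{float(saved_fraction(factor)):.0%} saved")
print(f"extra saving from the schema-aware step: {float(extra):.0%} of the original")
```

Under these assumptions markup binarization alone already removes 75% of the bytes; the schema-aware step removes a further 5% of the original size, and that residual gap is what has to be weighed against the loss of flexibility.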

--
Tahir Hashmi (VSE, NCST)
http://staff.ncst.ernet.in/tahir
tahir AT ncst DOT ernet DOT in

We, the rest of humanity, wish GNU luck and Godspeed

References:
- Parsing efficiency? - why not 'compile'????
  - From: Matthew.Bennett@facs.gov.au
- Re: [xml-dev] Parsing efficiency? - why not 'compile'????
  - From: Robin Berjon <robin.berjon@expway.fr>
- Re: [xml-dev] Parsing efficiency? - why not 'compile'????
  - From: Tahir Hashmi <code_martial@softhome.net>
- Re: [xml-dev] Parsing efficiency? - why not 'compile'????
  - From: Alaric Snell <alaric@alaric-snell.com>
