xml-dev - String interning (Was: [xml-dev] Binary XML == "spawn of the devil" ?)

To: xml-dev@lists.xml.org
Subject: String interning (Was: [xml-dev] Binary XML == "spawn of the devil" ?)
From: Tyler Close <tyler@waterken.com>
Date: Thu, 31 Jul 2003 17:46:32 -0400
In-reply-to: <3F268224.7090801@expway.fr>
References: <15725CF6AFE2F34DB8A5B4770B7334EE022DC6D0@hq1.pcmail.ingr.com> <3F268224.7090801@expway.fr>

As other people have already remarked, performance comparisons
between a binary and textual format should not be based on message
size alone.

In some applications, the actual operation to be performed is very
simple and fast. In such cases, the time required to extract the
input data from the input document dominates the total cost. A
binary format can reduce that extraction time.

A binary format can efficiently produce a data model in which all
identifiers are interned, so that each distinct identifier is stored
only once. This optimization speeds lookup operations, as it is much
faster to compare pointers than text strings.
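
A minimal sketch of the idea, in Python. The `Interner` class is a
hypothetical helper written for illustration and is not taken from any
format mentioned in this message. Each distinct identifier is stored
once, so equal names end up as the same object and can be compared by
identity rather than character by character.

```python
class Interner:
    """Maps equal strings to one shared object (hypothetical helper)."""

    def __init__(self):
        self._table = {}

    def intern(self, text):
        # Return the canonical object for this text, storing the
        # string itself the first time it is seen.
        return self._table.setdefault(text, text)


interner = Interner()

# Build two equal strings at run time, as a parser would when it
# reads the same element name twice from a text document.
a = "".join(["ti", "tle"])
b = "".join(["tit", "le"])

ia = interner.intern(a)
ib = interner.intern(b)

# After interning, equality of names can be tested with `is`,
# which compares object identity (a pointer comparison).
names_match = ia is ib
```

Note that the interning step itself still costs a hash lookup per
string when the input is text; the point made above is that a binary
format can avoid paying that cost repeatedly.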

For an example of a binary format that supports efficient string
interning, without a penalty to generality, see:

http://www.waterken.com/dev/Doc/code/

For one application, the E project <http://www.erights.org/>, this
optimization was a primary reason for choosing Waterken Doc code
over competing formats.

I think this technique could be valuable in an XML binary syntax.
At the very least, it's worth considering the potential
performance gains.
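
To make the technique concrete, here is a rough Python sketch of a
made-up encoding. It is not the Waterken Doc code format and not any
proposed XML binary syntax; the `DEFINE`/`REF` tags, the one-byte
length and index fields, and the function names are all assumptions
chosen for brevity. A name is written in full the first time it
appears and as a table index afterwards, so the decoder hands back
interned names without hashing or comparing any text.

```python
DEFINE = 0x00  # hypothetical tag: a new name follows (length byte + UTF-8)
REF = 0x01     # hypothetical tag: index of a previously defined name


def encode_names(names):
    """Encode a sequence of names, writing each distinct name once."""
    out = bytearray()
    index_of = {}
    for name in names:
        if name in index_of:
            out += bytes([REF, index_of[name]])
        else:
            data = name.encode("utf-8")
            index_of[name] = len(index_of)
            # One-byte length and index fields limit this sketch to
            # names of at most 255 bytes and 256 distinct names.
            out += bytes([DEFINE, len(data)]) + data
    return bytes(out)


def decode_names(blob):
    """Decode names; repeated names come back as the same object."""
    table = []   # position in this list is the name's index
    names = []
    pos = 0
    while pos < len(blob):
        tag = blob[pos]
        if tag == DEFINE:
            length = blob[pos + 1]
            text = blob[pos + 2:pos + 2 + length].decode("utf-8")
            table.append(text)
            names.append(text)
            pos += 2 + length
        elif tag == REF:
            # A plain list index: no string comparison, no hashing.
            names.append(table[blob[pos + 1]])
            pos += 2
        else:
            raise ValueError("unknown tag %d at offset %d" % (tag, pos))
    return names


original = ["item", "title", "item", "title", "item"]
blob = encode_names(original)
decoded = decode_names(blob)
```

In this sketch every occurrence of "item" in `decoded` is the same
string object, so a consumer can compare names with `is`.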

Tyler

Follow-Ups:
- Re: [xml-dev] String interning (Was: [xml-dev] Binary XML == "spawn of the devil" ?)
  - From: Mike Champion <mc@xegesis.org>

References:
- RE: [xml-dev] Binary XML == "spawn of the devil" ?
  - From: "Bullard, Claude L (Len)" <clbullar@ingr.com>
- Re: [xml-dev] Binary XML == "spawn of the devil" ?
  - From: Robin Berjon <robin.berjon@expway.fr>
