Essentially we have a large project that reuses (and reuses) various XML
fragments from many different sources in many different combinations
many times (controlled by some small number of parameters). Think of
this as a cache of parsed XML that can subsequently be consumed by XSLT
transforms. On occasion some portions of this cache will be invalidated
and replaced.
This sounds very much like a combination of several emerging standards,
the first being UN/CEFACT Core Components. If you're interested in
learning more, please also reference a recent presentation I gave to a
federal working group on "Core Components and ebXML Registry", which
also discusses the incorporation of Core Components into the ebXML
Registry architecture (a process which I am heading up - see slides
The second standard is the OASIS Content Assembly Mechanism (CAM)
specification, described on slide 41 of the presentation.
Booz | Allen | Hamilton
"Hunsberger, Peter" wrote:
> On Thursday, July 31, 2003 8:58 PM Tyler Close <firstname.lastname@example.org>
> > On Thursday 31 July 2003 19:36, Mike Champion wrote:
> > > On Thu, 31 Jul 2003 17:46:32 -0400, Tyler Close <email@example.com>
> > > wrote:
> > > > For an example of a binary format that supports efficient string
> > > > interning, without a penalty to generality, see:
> > > >
> > > > http://www.waterken.com/dev/Doc/code/
> > >
> > > Very interesting point/idea.
> > Thanks.
> > > AFAIK much of the overhead of XML text
> > > parsing that the binary infoset advocates complain about is in the
> > > Unicode encoding/decoding and raw string processing (e.g., looking at
> > > every character to see where an element ends rather than having a
> > > stored length).
> > The Waterken(TM) Doc code format uses a chunked
> > representation for encoding a string. This provides the
> > speed benefits of a length prefix without creating an
> > unlimited buffering requirement.
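
A minimal sketch of the chunked-string idea described above. This is not the actual Waterken Doc code wire format; the one-byte length prefix, the 255-byte chunk limit, the zero-length terminator, and the names write_chunked and read_chunked are all assumptions made for illustration. The writer never needs to know the total string length up front, and the reader never scans for a delimiter.

```python
import io

MAX_CHUNK = 255  # assumption: one-byte length prefix, so a chunk holds at most 255 bytes


def write_chunked(out, source) -> None:
    """Copy bytes from `source` to `out` as length-prefixed chunks.

    Only one chunk is held in memory at a time, so the total length
    does not have to be known (or buffered) before writing starts.
    """
    while True:
        chunk = source.read(MAX_CHUNK)
        if not chunk:
            break
        out.write(bytes([len(chunk)]))  # length prefix
        out.write(chunk)                # payload
    out.write(b"\x00")                  # zero-length chunk marks the end


def read_chunked(inp) -> bytes:
    """Read one chunked string; each chunk is read by length, not by scanning."""
    parts = []
    while True:
        header = inp.read(1)
        if not header:
            raise EOFError("stream ended before the terminating chunk")
        size = header[0]
        if size == 0:
            return b"".join(parts)
        chunk = inp.read(size)
        if len(chunk) != size:
            raise EOFError("truncated chunk")
        parts.append(chunk)
```

A 600-byte string would be written here as three chunks (255, 255, and 90 bytes) plus the terminator.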
> > > Likewise, a number of alternative infoset serializations use the
> > > "stream of SAX events" metaphor, that sounds a bit like what that
> > > document describes.
> > Same basic idea.
> > > But that doesn't sound like "string interning" to me (and "interning"
> > > is not mentioned in that document).
> > Notice that all the meta data (i.e., the string identifiers)
> > are stored in a set of string registers. Subsequent uses of a
> > string specify the index of the string to use. This results
> > in each string identifier being instantiated just once. The
> > singleton instance is the interned instance.
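
The register scheme described above can be sketched roughly as follows. The token shapes ("def", string) and ("ref", index) and the class names RegisterEncoder and RegisterDecoder are hypothetical; they illustrate the idea of sending each identifier once and referring to it by index afterwards, not the real Doc code encoding.

```python
class RegisterEncoder:
    """Emit each distinct string once; later uses become register indexes."""

    def __init__(self):
        self._index = {}  # string -> register number

    def encode(self, name):
        if name in self._index:
            return ("ref", self._index[name])
        self._index[name] = len(self._index)
        return ("def", name)


class RegisterDecoder:
    """Rebuild strings from tokens, keeping one instance per register."""

    def __init__(self):
        self._registers = []  # register number -> the single stored instance

    def decode(self, token):
        kind, value = token
        if kind == "def":
            self._registers.append(value)
            return value
        return self._registers[value]
```

Because the decoder hands back the stored register entry on every reference, all uses of one identifier share a single object on the receiving side.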
> > > I thought "interning" was more of a
> > > technique for keeping compiled code small by referencing redundant
> > > strings via their hash values.
> > It's more to do with fast lookup than memory savings. The
> > hash only gets computed once and equality checks are just
> > pointer comparisons. Same thinking is at work in the Doc code format.
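
As a rough illustration of the lookup argument, here is how interning behaves in Python, using the standard sys.intern function. The helper name same_identifier is made up for this sketch, and Python is only a stand-in here; the thread itself does not say how the Doc code implementation does this.

```python
import sys

# Two equal strings built at run time; they may well be distinct objects.
a = "".join(["ele", "ment"])
b = "".join(["elem", "ent"])

# Interning maps every equal string to one canonical instance.
ia = sys.intern(a)
ib = sys.intern(b)


def same_identifier(x, y):
    """Compare two interned strings by identity, without looking at characters."""
    return x is y
```

Once both sides are interned, same_identifier(ia, ib) is an identity check rather than a character-by-character comparison.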
> One of my vague long term projects is to look at ways of building and
> utilizing a sort of PSVI database. (Binary XML that never leaves the
> building...) Essentially we have a large project that reuses (and
> reuses) various XML fragments from many different sources in many
> different combinations many times (controlled by some small number of
> parameters). Think of this as a cache of parsed XML that can
> subsequently be consumed by XSLT transforms. On occasion some portions
> of this cache will be invalidated and replaced.
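
A small sketch of the caching pattern described above: fragments are parsed once, reused, and dropped when invalidated. The class name ParsedFragmentCache, the load_source callable, and the parse_count counter are assumptions for illustration. The sketch uses Python's xml.etree.ElementTree as a stand-in for whatever parsed form would actually be stored, and it leaves out the XSLT stage entirely.

```python
import xml.etree.ElementTree as ET


class ParsedFragmentCache:
    """Cache parsed XML fragments by key; re-parse only after invalidation."""

    def __init__(self, load_source):
        self._load_source = load_source  # hypothetical callable: key -> XML text
        self._trees = {}                 # key -> parsed root element
        self.parse_count = 0             # how many times a parse was needed

    def get(self, key):
        """Return the parsed fragment, parsing it on first use only."""
        if key not in self._trees:
            self._trees[key] = ET.fromstring(self._load_source(key))
            self.parse_count += 1
        return self._trees[key]

    def invalidate(self, key):
        """Drop a cached fragment so the next get() re-parses the source."""
        self._trees.pop(key, None)
```

Repeated get() calls for the same key return the same parsed tree, so a chain of consumers pays the parsing cost once per fragment until that fragment is invalidated.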
> So the question becomes: do you think any of this work could form a
> basis for such a database? Would it be efficient to parse XML to this
> format, then feed (multiple chained) XSLT transforms from this format?
> I'd spend some time examining the code, but we're in the middle of a
> release and more than swamped at the moment... (For the Cocoon-dev
> lurkers on this list, yes, this is related to the discussion on long
> term caching models.)
Joseph M. Chiusano
Booz | Allen | Hamilton, IT Digital Strategies Team
8283 Greensboro Drive, McLean, VA 22012