xml-dev - Re: [xml-dev] Re: Hostility to "binary XML"

To: Richard Tobin <richard@inf.ed.ac.uk>
Subject: Re: [xml-dev] Re: Hostility to "binary XML"
From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Tue, 23 Nov 2004 17:38:33 -0500
Cc: xml-dev@lists.xml.org
In-reply-to: <20041123221556.5A1A81AC7E3@macintosh.inf.ed.ac.uk>
References: <20041123221556.5A1A81AC7E3@macintosh.inf.ed.ac.uk>
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.3) Gecko/20040910

Richard Tobin wrote:

> I think that if you really wanted to, you could get 99% of this speed
> up anyway.  Don't check the characters, check the name against the
> DTD, and then only if it isn't declared check the characters and
> then fake a declaration so it will be quick next time.

I actually implemented something very much like this just this morning 
in XOM, only for namespace URIs rather than element and attribute names. 
I just store the four most recently seen namespace URIs in a cache, and 
search the cache before verifying that a string is a correct namespace 
URI. It basically dropped the time XOM spends verifying namespace URIs 
to zero.
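
For anyone who wants to try the same trick, here is a minimal sketch of a small cache of recently verified strings. It is not the XOM code: the class name VerifiedStringCache, the fullChecks counter, and the choice to overwrite the oldest slot on a miss are all assumptions made for illustration, and the expensive check is passed in as a predicate because the real URI verification is not shown here.

```java
import java.util.function.Predicate;

/** Hypothetical sketch: remembers the last few strings that passed a costly check. */
final class VerifiedStringCache {
    private final String[] recent;            // recently verified strings, used as a ring
    private int next = 0;                     // slot that the next miss will overwrite
    private final Predicate<String> verifier; // the expensive check (assumed, supplied by caller)
    int fullChecks = 0;                       // how often the expensive check actually ran

    VerifiedStringCache(int size, Predicate<String> verifier) {
        this.recent = new String[size];
        this.verifier = verifier;
    }

    /** Returns true if s is valid; runs the full check only on a cache miss. */
    boolean isValid(String s) {
        for (String cached : recent) {
            if (s.equals(cached)) {
                return true;                  // verified earlier, skip the full check
            }
        }
        fullChecks++;
        if (!verifier.test(s)) {
            return false;                     // invalid strings are never cached
        }
        recent[next] = s;
        next = (next + 1) % recent.length;
        return true;
    }
}
```

With a size of four, a linear scan over the array is probably cheaper than hashing, which is presumably why such a tiny cache can pay off for namespace URIs.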

I wonder if a similar scheme would help with verifying element names? 
Namespace names repeat a lot more commonly than element/attribute names, 
and there are fewer of them to search through. Still, in most documents 
names do repeat fairly frequently. Even if caching element names proved 
troublesome, attribute names are more commonly repeated, and namespace 
prefixes are very commonly repeated. You could cache these to avoid 
reverification.

I'm curious. Have any parser implementers built a dynamic cache of 
preverified names? Did it help any? Even if in the general case it 
proves to be no faster than repeatedly checking the same names, it might 
still be useful to preload a cache of especially common names before 
parsing a lot of documents. For instance, if you know you're going to be 
parsing SOAP, then you could load up all the common SOAP element names.
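
A rough sketch of the preloading idea follows. Everything in it is hypothetical: the class PreloadedNameCache, the fullChecks counter, and especially looksLikeName, which is a deliberately simplified stand-in for the real XML Name production and not a conforming check. The set is also unbounded, which a real parser would probably want to cap.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: a set of names already known to be legal. */
final class PreloadedNameCache {
    private final Set<String> verified = new HashSet<>();
    int fullChecks = 0;   // how often the character-by-character check ran after preloading

    /** Verifies each name once up front, before any document is parsed. */
    void preload(String... names) {
        for (String name : names) {
            if (looksLikeName(name)) {
                verified.add(name);
            }
        }
    }

    /** Returns true if name is legal; scans the characters only on a miss. */
    boolean isName(String name) {
        if (verified.contains(name)) {
            return true;
        }
        fullChecks++;
        boolean ok = looksLikeName(name);
        if (ok) {
            verified.add(name);   // quick next time
        }
        return ok;
    }

    /** Simplified stand-in for the XML Name check (assumption, not conforming). */
    static boolean looksLikeName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        char first = s.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.')) {
                return false;
            }
        }
        return true;
    }
}
```

For SOAP you might call preload("Envelope", "Header", "Body", "Fault") once and then share the cache across every document you parse.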

Another possible optimization: you don't need to verify the name in an 
end-tag; just check that it matches the name in the start-tag, which you 
have to do anyway. I'm almost certain some, perhaps most or all, parsers 
are doing this already.
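
To make the end-tag point concrete, here is a small sketch under the assumption that the parser keeps a stack of open element names. The class TagStack and its method names are made up for illustration. Since every name on the stack was already verified when its start-tag was read, an end-tag name that equals the top of the stack must be legal too, so only a string comparison is needed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: tracks open elements so end-tags need only a comparison. */
final class TagStack {
    private final Deque<String> open = new ArrayDeque<>();

    /** Call with a name that has already passed the full name check. */
    void startTag(String verifiedName) {
        open.push(verifiedName);
    }

    /** Returns true if name closes the innermost open element; no character check. */
    boolean endTag(String name) {
        if (open.isEmpty() || !open.peek().equals(name)) {
            return false;   // well-formedness error: mismatched or stray end-tag
        }
        open.pop();
        return true;
    }

    /** Number of elements still open. */
    int depth() {
        return open.size();
    }
}
```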

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
