Richard Tobin wrote:
> I think that if you really wanted to, you could get 99% of this speed
> up anyway. Don't check the characters, check the name against the
> DTD, and then only if it isn't declared check the characters and
> then fake a declaration so it will be quick next time.
I actually implemented something very much like this just this morning
in XOM, only for namespace URIs rather than element and attribute names.
I just store the four most recently seen namespace URIs in a cache, and
search the cache before verifying that a string is a correct namespace
URI. It basically dropped the time XOM spends verifying namespace URIs
to zero.
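
To make that concrete, here is a rough sketch of the kind of cache I mean. It is not XOM's actual code: the class name NamespaceUriCache, the fullChecks counter, and the use of java.net.URI as the "slow" check are all assumptions for illustration, and the round-robin replacement only approximates "most recently seen".

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch: a four-entry cache in front of a namespace URI check. */
final class NamespaceUriCache {
    private static final int SIZE = 4;
    private final String[] recent = new String[SIZE];
    private int next = 0;   // slot to overwrite next (round-robin, not true LRU)
    int fullChecks = 0;     // how many times the slow path actually ran

    boolean isValid(String uri) {
        if (uri == null) return false;
        // Fast path: a linear scan of four slots is cheaper than reparsing.
        for (String cached : recent) {
            if (uri.equals(cached)) return true;
        }
        // Slow path: verify, and remember the URI only if it passed.
        if (!slowCheck(uri)) return false;
        recent[next] = uri;
        next = (next + 1) % SIZE;
        return true;
    }

    // Stand-in for the real verification; java.net.URI is an assumption here.
    private boolean slowCheck(String uri) {
        fullChecks++;
        try {
            new URI(uri);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Only URIs that passed the full check go into the cache, so a cache hit is always safe to trust. Invalid URIs are never cached and get rechecked each time, which seems fine since they should be rare.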
I wonder if a similar scheme would help with verifying element names?
Namespace names repeat a lot more commonly than element/attribute names,
and there are fewer of them to search through. Still, in most documents
names do repeat fairly frequently. Even if caching element names proved
troublesome, attribute names are more commonly repeated, and namespace
prefixes are very commonly repeated. You could cache these to avoid
reverification.
I'm curious. Have any parser implementers built a dynamic cache of
preverified names? Did it help any? Even if in the general case it
proves to be no faster than repeatedly checking the same names, it might
still be useful to preload a cache of especially common names before
parsing a lot of documents. For instance if you know you're going to be
parsing SOAP, then you could load up all the common SOAP element names.
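
A preloaded cache might look something like the sketch below. Everything here is hypothetical: VerifiedNameCache and its methods are made-up names, and slowCheck is a simplified approximation of a name check, not the real XML 1.0 Name production.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: a set of already-verified names that can be preloaded. */
final class VerifiedNameCache {
    private final Set<String> verified = new HashSet<>();
    int fullChecks = 0;   // how many times the slow path actually ran

    /** Verify a batch of expected names once, before any document is parsed. */
    void preload(String... names) {
        for (String name : names) {
            if (slowCheck(name)) verified.add(name);
        }
    }

    boolean isValidName(String name) {
        if (verified.contains(name)) return true;   // fast path: one hash lookup
        if (!slowCheck(name)) return false;
        verified.add(name);                          // dynamic part of the cache
        return true;
    }

    // Simplified stand-in for a character-by-character name check (assumption).
    private boolean slowCheck(String name) {
        fullChecks++;
        if (name == null || name.isEmpty()) return false;
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_' || first == ':')) return false;
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_' || c == ':'
                    || c == '-' || c == '.')) return false;
        }
        return true;
    }
}
```

For SOAP you would call something like preload("Envelope", "Header", "Body", "Fault") once at startup. Note that this set grows without bound, so a real parser would probably want to cap its size.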
Another possible optimization: you don't need to verify end-tag names.
Just check that each end-tag matches its start-tag, which you have to do
anyway. I'm almost certain some, perhaps most or all, parsers are doing
this already.
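
In outline, the end-tag shortcut amounts to something like this. Again this is only a sketch under my own assumptions: TagMatcher is a made-up class, and looksLikeName is a placeholder for whatever real name verification the parser does.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: names are verified on start-tags only. */
final class TagMatcher {
    private final Deque<String> open = new ArrayDeque<>();
    int nameChecks = 0;   // how many times a name was verified character by character

    void startTag(String name) {
        nameChecks++;
        if (!looksLikeName(name)) {
            throw new IllegalArgumentException("Illegal name: " + name);
        }
        open.push(name);
    }

    void endTag(String name) {
        // No character check here: equality with an already-verified
        // start-tag name implies the end-tag name is legal too.
        if (open.isEmpty() || !open.peek().equals(name)) {
            throw new IllegalStateException("Mismatched end-tag: " + name);
        }
        open.pop();
    }

    // Placeholder check (assumption), not the XML 1.0 Name production.
    private static boolean looksLikeName(String name) {
        if (name == null || name.isEmpty()) return false;
        char first = name.charAt(0);
        return Character.isLetter(first) || first == '_' || first == ':';
    }
}
```

The well-formedness comparison is doing double duty here, so the end-tag costs one string comparison and nothing more.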
--
Elliotte Rusty Harold elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim