OASIS Mailing List Archives

 



 


 

   Re: [xml-dev] Re: Hostility to "binary XML"


Richard Tobin wrote:


> I think that if you really wanted to, you could get 99% of this speed
> up anyway.  Don't check the characters, check the name against the
> DTD, and then only if it isn't declared check the characters and
> then fake a declaration so it will be quick next time.

I actually implemented something very much like this just this morning 
in XOM, only for namespace URIs rather than element and attribute names. 
I just store the four most recently seen namespace URIs in a cache, and 
search the cache before verifying that a string is a correct namespace 
URI. It basically dropped the time XOM spends verifying namespace URIs 
to zero.
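
For anyone who wants to try the idea, here is a minimal sketch of such a cache. It is not the actual XOM code: the class name, the Predicate parameter, and the round-robin replacement are all assumptions made for illustration. Round-robin keeps the four most recently *added* strings, which is a simplification of "most recently seen".

```java
import java.util.function.Predicate;

/** Hypothetical small cache of already-verified strings; not XOM's actual code. */
final class RecentlyVerifiedCache {
    private final String[] slots;    // previously verified values (null = empty slot)
    private int next = 0;            // slot to overwrite on the next insertion
    private int expensiveChecks = 0; // how often the slow path actually ran

    RecentlyVerifiedCache(int size) {
        slots = new String[size];
    }

    /** Returns true if value is valid; runs expensiveCheck only on a cache miss. */
    boolean isValid(String value, Predicate<String> expensiveCheck) {
        // Linear search is fine for a handful of entries.
        for (String cached : slots) {
            if (value.equals(cached)) {
                return true;
            }
        }
        expensiveChecks++;
        if (!expensiveCheck.test(value)) {
            return false; // invalid values are never cached
        }
        slots[next] = value;
        next = (next + 1) % slots.length;
        return true;
    }

    int expensiveChecks() {
        return expensiveChecks;
    }
}
```

A caller would create it once with new RecentlyVerifiedCache(4) and pass in whatever full URI check it already has. With only four slots, the linear search costs a few string comparisons, which is the whole point of keeping the cache tiny.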

I wonder if a similar scheme would help with verifying element names? 
Namespace names repeat a lot more commonly than element/attribute names, 
and there are fewer of them to search through. Still, in most documents 
names do repeat fairly frequently. Even if caching element names proved 
troublesome, attribute names are more commonly repeated, and namespace 
prefixes are very commonly repeated. You could cache these to avoid 
reverification.

I'm curious. Have any parser implementers built a dynamic cache of 
preverified names? Did it help any? Even if in the general case it 
proves to be no faster than repeatedly checking the same names, it might 
still be useful to preload a cache of especially common names before 
parsing a lot of documents. For instance, if you know you're going to be 
parsing SOAP, then you could load up all the common SOAP element names.
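
To make the preloading idea concrete, here is a rough sketch. Everything in it is hypothetical: the class and method names are invented, and slowCheck is an ASCII-only approximation of the XML Name production, standing in for a parser's real name check.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical set of preverified names that can be preloaded before parsing. */
final class PreverifiedNames {
    private final Set<String> known = new HashSet<>();
    private int slowChecks = 0; // how often the character-by-character check ran

    /** Verifies each preload name once, up front, and remembers the valid ones. */
    PreverifiedNames(Iterable<String> preload) {
        for (String name : preload) {
            isValidName(name);
        }
    }

    /** Hash lookup first; the slow check runs only for names not seen before. */
    boolean isValidName(String name) {
        if (known.contains(name)) {
            return true;
        }
        slowChecks++;
        if (!slowCheck(name)) {
            return false;
        }
        known.add(name);
        return true;
    }

    int slowChecks() {
        return slowChecks;
    }

    /** Simplified, ASCII-only stand-in for the XML Name production (an assumption). */
    private static boolean slowCheck(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean startChar = letter || c == '_' || c == ':';
            boolean laterChar = startChar || (c >= '0' && c <= '9') || c == '-' || c == '.';
            if (i == 0 ? !startChar : !laterChar) {
                return false;
            }
        }
        return true;
    }
}
```

For SOAP you might construct it with java.util.List.of("Envelope", "Header", "Body", "Fault"). Note that this set grows without bound as new names arrive, so a real parser would probably want to cap its size.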

Another possible optimization: you don't need to verify end-tag names; just 
check that each one matches its start-tag, which you have to do anyway. I'm 
almost certain some, perhaps most or all, parsers are doing this already.
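
The reasoning is that an end-tag name equal to an already verified start-tag name must itself be a legal name. A minimal sketch of that check, with a made-up class name and no claim that any particular parser is written this way:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical open-element stack; end-tag names are compared, never re-verified. */
final class TagStack {
    private final Deque<String> open = new ArrayDeque<>();

    /** The caller is assumed to have verified the name when it read the start-tag. */
    void startTag(String verifiedName) {
        open.push(verifiedName);
    }

    /** One string comparison both checks nesting and implies the name is legal. */
    void endTag(String name) {
        String expected = open.peek();
        if (expected == null) {
            throw new IllegalStateException("end-tag </" + name + "> with no open element");
        }
        if (!expected.equals(name)) {
            throw new IllegalStateException(
                "end-tag </" + name + "> does not match start-tag <" + expected + ">");
        }
        open.pop();
    }

    int depth() {
        return open.size();
    }
}
```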

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim




 


Copyright 2001 XML.org. This site is hosted by OASIS