OASIS Mailing List Archives





   Re: [xml-dev] JITTs and DOM


Hi Patrick,

> Thought there might be some interest in a very short note I have
> posted to the JITTs page on the impact of the JITTs paradigm on DOM
> performance.
> Jump from the JITTs page, http://www.sbl-site2.org/Extreme2002 or go
> there directly at:
> http://www.sbl-site2.org/Extreme2002/JITTs_and_DOM.html.
> Running on a laptop, gains of 26 to 39 times were observed. (As file
> sizes increase, so do the performance increases.) Note I do not
> characterize these as "tests," merely observations. Further work is
> needed.

(Very glad to see the well-formed XML example files :)

I'd be *very* careful about drawing any conclusions about speed-up
from these observations. What you've done for these observations is
replace markup-significant characters (e.g. '<') with
markup-insignificant characters (i.e. '@'), effectively turning whole
regions of the document into plain text.

I'm certainly no parser writer, but having hacked around the internals
of Aelfred in order to create a parser for LMNL [1], I can see why
this would cause a speed-up -- all a parser has to do to process these
sections is scan through the text until it comes to a '<'; that
scanning process is very fast.
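
As a rough illustration of the point above, the sketch below parses a tiny
document twice with Python's standard xml.etree.ElementTree: once as
written, and once with the inner '<' characters replaced by '@'. The sample
strings and the helper names element_count and first_p_text are invented for
this illustration and are not taken from the JITTs example files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, not one of the files from the observations.
original = "<doc><p>one <b>two</b> three</p></doc>"
# The same document with the inner tags' '<' replaced by '@'.
# A bare '>' is legal in character data, so this is still well-formed.
masked = "<doc><p>one @b>two@/b> three</p></doc>"


def element_count(xml_text):
    """Count every element the parser reports, including the root."""
    return sum(1 for _ in ET.fromstring(xml_text).iter())


def first_p_text(xml_text):
    """Return the text that directly follows the <p> start tag."""
    return ET.fromstring(xml_text).find("p").text


print(element_count(original), element_count(masked))
print(repr(first_p_text(masked)))
```

In the masked version the parser never sees a "b" element at all; the whole
region comes back as a single run of character data, which is the cheap case
described above.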

My understanding of JITTs is that you'd need a lot more sophistication
in the parser. It wouldn't be enough to just ignore all the tags that
the parser came across (which is what you've done in effect). Instead,
the parser would have to read the tag, look at the name of the tag,
check that against a list (from a DTD or schema) in order to work out
what to do, and then either generate a "start/endElement" event or
generate a "characters" event (to report the tag as a string)
depending on the tag's status. If anything, I imagine that this will
*add* time to the parsing of the document.
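
The per-tag decision described above might be sketched as follows. This is
only a toy tokenizer under stated assumptions, not how Aelfred or any JITTs
implementation works: the function name jitts_events, the "recognized" set
(standing in for the list taken from a DTD or schema) and the event tuples
are all invented for the example, and attributes, empty-element tags,
comments and CDATA sections are not handled properly.

```python
import re

# Matches a start or end tag and captures the optional '/' and the tag name.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*>")


def jitts_events(text, recognized):
    """Return SAX-like events, reporting unrecognized tags as characters."""
    events = []
    pos = 0

    def chars(s):
        # Merge adjacent character data into a single "characters" event.
        if not s:
            return
        if events and events[-1][0] == "characters":
            events[-1] = ("characters", events[-1][1] + s)
        else:
            events.append(("characters", s))

    for m in TAG.finditer(text):
        chars(text[pos:m.start()])
        closing, name = m.group(1), m.group(2)
        if name in recognized:
            # The extra work: the name must be looked up for every tag.
            events.append(("endElement" if closing else "startElement", name))
        else:
            # Not in the list, so the tag itself is reported as a string.
            chars(m.group(0))
        pos = m.end()
    chars(text[pos:])
    return events


print(jitts_events("<a><b>hi</b></a>", {"a"}))
```

Note that every tag still has to be read and its name looked up before it
can be passed through as text, which is the additional cost compared with
simply scanning for the next '<'.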

[I guess an alternative approach would be to run the character stream
through a pre-processor that changed particular '<' characters into
'@' characters, though unless you could be sure that *all* elements
called "foo" could be effectively ignored (i.e. that you didn't have
any local declarations of elements, whereby some should be ignored and
some taken into account), it's probably worthwhile taking the approach

On the positive side, filtering a document will lead to fewer events
being generated and fewer objects being created in the DOM, which
should make the DOM smaller and should make the process *after*
parsing that much quicker.

In other words, it's certainly the case that JITTs has promise in
terms of reducing DOM size and speeding up processing, but I don't
think that these observations are really an accurate representation of
what the effect will be. I don't know what you're planning for your
next step, but a good demonstration, in my opinion, would be to show
how the same XSLT transformation, say, is sped up by working on a
JITTs DOM compared to a normal DOM.
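
One way such a comparison could be set up is sketched below. It is a
hypothetical harness, not a measurement: Python's standard library has no
XSLT processor, so a simple node-counting traversal stands in for the
transformation, the input is synthetic, and the names measure and
count_nodes are invented here. The point is only that parse time and
post-parse processing time are recorded separately for the two DOMs.

```python
import time
from xml.dom import minidom


def count_nodes(node):
    """Stand-in for real post-parse work: visit every node in the DOM."""
    return 1 + sum(count_nodes(child) for child in node.childNodes)


def measure(xml_text, work):
    """Time parsing and post-parse processing separately."""
    t0 = time.perf_counter()
    dom = minidom.parseString(xml_text)
    t1 = time.perf_counter()
    result = work(dom)
    t2 = time.perf_counter()
    return {"parse_s": t1 - t0, "process_s": t2 - t1, "result": result}


# Synthetic input; these are not the files from the original observations.
body = "".join("<p>text <i>%d</i> more</p>" % i for i in range(2000))
full = "<doc>" + body + "</doc>"
# "Filtered" variant: the <i> tags are turned into plain text with '@'.
masked = "<doc>" + body.replace("<i>", "@i>").replace("</i>", "@/i>") + "</doc>"

full_run = measure(full, count_nodes)
masked_run = measure(masked, count_nodes)

for label, run in (("full", full_run), ("masked", masked_run)):
    print(label, run["result"], "nodes",
          "parse %.4fs" % run["parse_s"],
          "process %.4fs" % run["process_s"])
```

Timings from a harness like this will vary by machine and run, so they would
need repeating over realistic documents before anything could be concluded.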



[1] http://www.lmnl.org/projects/LMNOP

Jeni Tennison


Copyright 2001 XML.org. This site is hosted by OASIS