xml-dev - Re: [xml-dev] JITTs and DOM

To: Patrick Durusau <pdurusau@emory.edu>
Subject: Re: [xml-dev] JITTs and DOM
From: Jeni Tennison <jeni@jenitennison.com>
Date: Thu, 10 Oct 2002 15:14:32 +0100
Cc: John Cowan <jcowan@reutershealth.com>, "W. E. Perry" <wperry@fiduciary.com>, LMNL-DEV <lmnl-dev@lmnl.org>, Matthew Brook O'Donnell <mtrout@nycap.rr.com>, <xml-dev@lists.xml.org>
In-reply-to: <3DA58333.7080001@emory.edu>
Organization: Jeni Tennison Consulting Ltd
References: <200210101305.JAA29379@mail2.reutershealth.com><19514849372.20021010140724@jenitennison.com> <3DA58333.7080001@emory.edu>
Reply-to: Jeni Tennison <jeni@jenitennison.com>

Hi Patrick,

> Thought there might be some interest in a very short note I have
> posted to the JITTs page on the impact of the JITTs paradigm on DOM
> performance.
>
> Jump from the JITTs page, http://www.sbl-site2.org/Extreme2002 or go
> there directly at:
> http://www.sbl-site2.org/Extreme2002/JITTs_and_DOM.html.
>
> Running on a laptop, gains of 26 to 39 times were observed. (As file
> sizes increase, so do the performance gains.) Note I do not
> characterize these as "tests," merely observations. Further work is
> needed.

(Very glad to see the well-formed XML example files :)

I'd be *very* careful about drawing any conclusions about speed up
from these observations. What you've done for these observations is
replace markup-significant characters (e.g. '<') with
markup-insignificant characters (i.e. '@'), effectively turning whole
regions of the document into plain text.
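
A minimal Python sketch of the substitution described above. The `<w>` fragment is a made-up example, not one of the files on the JITTs page. It shows that once '<' becomes '@', the whole body of `<p>` parses as a single text node.

```python
from xml.dom import minidom

# Hypothetical fragment (not taken from the JITTs page): <w> marks words.
original = "<p><w>In</w> <w>the</w> <w>beginning</w></p>"

# Keep the outer <p> tags, but replace every markup-significant '<'
# inside them with the markup-insignificant '@'.
body = original[len("<p>"):-len("</p>")]
flattened = "<p>" + body.replace("<", "@") + "</p>"

doc = minidom.parseString(flattened)
p = doc.documentElement
print(flattened)                             # <p>@w>In@/w> @w>the@/w> @w>beginning@/w></p>
print(len(p.childNodes))                     # 1
print(p.firstChild.nodeType == p.TEXT_NODE)  # True
```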

I'm certainly no parser writer, but having hacked around the internals
of Aelfred in order to create a parser for LMNL [1], I can see why
this would cause a speed-up -- all a parser has to do to process these
sections is scan through the text until it comes to a '<'; that
scanning process is very fast.
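
A toy tokenizer makes the point concrete. This is an illustrative sketch, not Aelfred's actual code: a text run is consumed by a single `str.find` for the next '<', while every tag needs its own branch of work. With the tags turned into '@' sequences, almost everything goes down the cheap text path.

```python
def tokenize(buf):
    """Toy tokenizer: split buf into ('tag', ...) and ('text', ...) tokens."""
    tokens = []
    pos = 0
    while pos < len(buf):
        if buf[pos] == "<":
            # Markup: the tag has to be located and examined.
            end = buf.index(">", pos)
            tokens.append(("tag", buf[pos:end + 1]))
            pos = end + 1
        else:
            # Character data: one fast scan to the next '<'.
            end = buf.find("<", pos)
            if end == -1:
                end = len(buf)
            tokens.append(("text", buf[pos:end]))
            pos = end
    return tokens

marked = "<p><w>In</w> <w>the</w></p>"
plain = "<p>@w>In@/w> @w>the@/w></p>"
print(len(tokenize(marked)))  # 9 tokens: 6 tags, 3 text runs
print(len(tokenize(plain)))   # 3 tokens: 2 tags, 1 text run
```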

My understanding of JITTs is that you'd need a lot more sophistication
in the parser. It wouldn't be enough to just ignore all the tags that
the parser came across (which is what you've done in effect). Instead,
the parser would have to read the tag, look at the name of the tag,
check that against a list (from a DTD or schema) in order to work out
what to do, and then either generate a "start/endElement" event or
generate a "characters" event (to report the tag as a string)
depending on the tag's status. If anything, I imagine that this will
*add* time to the parsing of the document.
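
The tag-by-tag decision could be approximated with Python's `xml.sax`, as in the sketch below. The `JittsHandler` class and its `ignored` set (standing in for a list derived from a DTD or schema) are hypothetical. Note that this runs *after* a standard parser has already parsed each tag in full, so it adds work rather than saving any, which is consistent with the concern above.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import quoteattr

class JittsHandler(ContentHandler):
    """Hypothetical: for each tag, check its name against `ignored` and
    either record a start/endElement event or report the tag as text."""

    def __init__(self, ignored):
        super().__init__()
        self.ignored = set(ignored)
        self.events = []

    def startElement(self, name, attrs):
        if name in self.ignored:
            # Rebuild the start tag and report it as character data.
            attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
            self.events.append(("characters", f"<{name}{attr_text}>"))
        else:
            self.events.append(("startElement", name))

    def endElement(self, name):
        if name in self.ignored:
            self.events.append(("characters", f"</{name}>"))
        else:
            self.events.append(("endElement", name))

    def characters(self, content):
        self.events.append(("characters", content))

handler = JittsHandler(ignored={"w"})
xml.sax.parseString(b"<p><w>In</w> <w>the</w></p>", handler)
for event in handler.events:
    print(event)
```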

[I guess an alternative approach would be to run the character stream
through a pre-processor that changed particular '<' characters into
'@' characters, though unless you could be sure that *all* elements
called "foo" could be effectively ignored (i.e. that you didn't have
any local declarations of elements, whereby some should be ignored and
some taken into account), it's probably worthwhile taking the approach
above.]
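
A sketch of such a pre-processor, assuming a plain regular-expression pass; `flatten_elements` is a hypothetical helper. Because it is purely lexical, it treats every element with the given name alike, which is exactly the all-or-nothing limitation noted above. It also does not skip comments, CDATA sections, or processing instructions, and it ignores namespace prefixes.

```python
import re

def flatten_elements(xml_text, name):
    """Hypothetical pre-processor: turn every start, end, or empty tag
    called `name` into plain text by replacing its '<' with '@'.
    Lexical only: every element with this name is treated the same."""
    # Match '<name' or '</name' followed by whitespace, '/' or '>',
    # so that e.g. <food> is not mistaken for <foo>.
    pattern = re.compile(r"<(/?)" + re.escape(name) + r"(?=[\s/>])")
    return pattern.sub(lambda m: "@" + m.group(1) + name, xml_text)

print(flatten_elements("<doc><foo a='1'>x</foo><food/></doc>", "foo"))
# <doc>@foo a='1'>x@/foo><food/></doc>
```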

On the positive side, filtering a document will lead to fewer events
being generated and fewer objects being created in the DOM, which
should make the DOM smaller and should make the process *after*
parsing that much quicker.
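
The reduction in DOM size can be checked by counting nodes. A minimal sketch, again assuming a made-up `<w>` fragment:

```python
from xml.dom import minidom

def count_nodes(node):
    """Count a node plus all of its descendants (document, elements, text)."""
    return 1 + sum(count_nodes(child) for child in node.childNodes)

# Hypothetical fragments: fully marked up vs. filtered (<w> tags as text).
full = "<p><w>In</w> <w>the</w> <w>beginning</w></p>"
filtered = "<p>@w>In@/w> @w>the@/w> @w>beginning@/w></p>"

print(count_nodes(minidom.parseString(full)))      # 10
print(count_nodes(minidom.parseString(filtered)))  # 3
```

Both counts include the Document node itself.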

In other words, it's certainly the case that JITTs has promise in
terms of reducing DOM size and speeding up processing, but I don't
think that these observations are really an accurate representation of
what the effect will be. I don't know what you're planning for your
next step, but a good demonstration, in my opinion, would be to show
how the same XSLT transformation, say, is speeded up by working on a
JITTs DOM compared to a normal DOM.
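
One way such a comparison might be set up in Python, as a rough sketch only. The standard library has no XSLT processor, so serializing the whole tree with `toxml()` stands in for the transformation, and the synthetic 2,000-word document is an assumption. Parsing is kept outside the timed region so that only post-parse processing is measured; timings are machine-dependent, so no figures are implied.

```python
import timeit
from xml.dom import minidom

# Synthetic input (hypothetical): 2,000 words, each marked up as <w>.
words = [f"word{i}" for i in range(2000)]
full_xml = "<p>" + " ".join(f"<w>{w}</w>" for w in words) + "</p>"
# "JITTs" variant: the <w> markup survives only as text.
jitts_xml = "<p>" + " ".join(f"@w>{w}@/w>" for w in words) + "</p>"

# Parse once, outside the timed region.
full_dom = minidom.parseString(full_xml)
jitts_dom = minidom.parseString(jitts_xml)

def transform(dom):
    """Stand-in for an XSLT transformation: serialize the whole tree."""
    return dom.toxml()

t_full = timeit.timeit(lambda: transform(full_dom), number=20)
t_jitts = timeit.timeit(lambda: transform(jitts_dom), number=20)
print(f"normal DOM: {t_full:.4f}s  JITTs DOM: {t_jitts:.4f}s")
```

A real test would run the same stylesheet through an XSLT processor against both trees.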

Cheers,

Jeni

[1] http://www.lmnl.org/projects/LMNOP

---
Jeni Tennison
http://www.jenitennison.com/
