xml-dev - Re: [xml-dev] JITTs and DOM

To: Jeni Tennison <jeni@jenitennison.com>
Subject: Re: [xml-dev] JITTs and DOM
From: Patrick Durusau <pdurusau@emory.edu>
Date: Fri, 11 Oct 2002 05:51:53 -0400
Cc: John Cowan <jcowan@reutershealth.com>, "W. E. Perry" <wperry@fiduciary.com>, LMNL-DEV <lmnl-dev@lmnl.org>, Matthew Brook O'Donnell <mtrout@nycap.rr.com>, xml-dev@lists.xml.org
References: <200210101305.JAA29379@mail2.reutershealth.com> <19514849372.20021010140724@jenitennison.com> <3DA58333.7080001@emory.edu> <19218876953.20021010151432@jenitennison.com>
Reply-to: pdurusau@emory.edu
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4) Gecko/20011128 Netscape6/6.2.1

Jeni,

Jeni Tennison wrote:

>Hi Patrick,
>
<snip>

>
>
>I'd be *very* careful about drawing any conclusions about speed up
>from these observations. What you've done for these observations is
>replace markup-significant characters (e.g. '<') with
>markup-insignificant characters (i.e. '@'), effectively turning whole
>regions of the document into plain text.
>
I said in my post that these were observations that suggest further
investigation. The replacement was noted on the webpage as simulating
the result of a JITTs parser. Yes, a JITTs parser would treat regions
of the document as plain text. Sorry if that was not explicit in our
earlier treatments of JITTs parsing.

>
>I'm certainly no parser writer, but having hacked around the internals
>of Aelfred in order to create a parser for LMNL [1], I can see why
>this would cause a speed-up -- all a parser has to do to process these
>sections is scan through the text until it comes to a '<'; that
>scanning process is very fast.
>
>My understanding of JITTs is that you'd need a lot more sophistication
>in the parser. 
>
No.

>It wouldn't be enough to just ignore all the tags that
>the parser came across (which is what you've done in effect). Instead,
>the parser would have to read the tag, look at the name of the tag,
>check that against a list (from a DTD or schema) in order to work out
>what to do, and then either generate a "start/endElement" event or
>generate a "characters" event (to report the tag as a string)
>depending on the tag's status. If anything, I imagine that this will
>*add* time to the parsing of the document.
>
Parsers already build a tree from the DTD or schema in order to
"recognize" the markup they encounter in the document. All JITTs would
require is a change in the lookup step, where a parser now looks for the
token in the tree: upon failure, the parser simply resumes reading input.
(That assumes you are using the suggested ignore option; with delete, it
would drop the token from the input string and continue reading input.)
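
A minimal sketch of that lookup step, assuming a flat set of recognized
element names stands in for the tree built from the DTD or schema. The
function name `jitts_events`, the `on_unknown` parameter and the event
names are hypothetical and come from no existing JITTs implementation;
attributes, comments and entities are left out to keep the idea visible.

```python
import re

# Matches a simple start or end tag such as <w> or </w>. Attributes,
# comments, CDATA and entities are deliberately out of scope (assumption).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>")

def jitts_events(text, recognized, on_unknown="ignore"):
    """Yield (kind, value) events, checking each tag name against `recognized`.

    on_unknown="ignore": an unrecognized tag is reported as character data.
    on_unknown="delete": an unrecognized tag is dropped from the input.
    """
    if on_unknown not in ("ignore", "delete"):
        raise ValueError("on_unknown must be 'ignore' or 'delete'")
    buffer = []  # pending character data, flushed before each element event
    pos = 0
    for match in TAG.finditer(text):
        buffer.append(text[pos:match.start()])
        pos = match.end()
        closing, name = match.groups()
        if name in recognized:
            # Lookup succeeded: flush text, then report the element event.
            chars = "".join(buffer)
            if chars:
                yield ("characters", chars)
            buffer = []
            yield ("endElement" if closing else "startElement", name)
        elif on_unknown == "ignore":
            # Lookup failed: keep the raw tag as text and keep reading.
            buffer.append(match.group(0))
        # Lookup failed with "delete": the tag is simply not recorded.
    buffer.append(text[pos:])
    chars = "".join(buffer)
    if chars:
        yield ("characters", chars)

sample = "<s>In <w>the</w> beginning</s>"
ignored = list(jitts_events(sample, {"s"}, "ignore"))
deleted = list(jitts_events(sample, {"s"}, "delete"))
```

With only `s` recognized, the ignore option reports `<w>the</w>` inside
the character data, while the delete option reports the text without
those tags. In both cases the failed lookup costs one set membership
test per tag; whether that is faster or slower in a real parser is
exactly what the planned demonstrations would have to show.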

<snip>

>
>In other words, it's certainly the case that JITTs has promise in
>terms of reducing DOM size and speeding up processing, but I don't
>think that these observations are really an accurate representation of
>what the effect will be. I don't know what you're planning for your
>next step, but a good demonstration, in my opinion, would be to show
>how the same XSLT transformation, say, is speeded up by working on a
>JITTs DOM compared to a normal DOM.
>
There are a number of demonstrations in the planning stage, and XSLT
will probably be one of the earlier ones.

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu
