xml-dev - Re: [xml-dev] JITTs and DOM

To: gtn@rbii.com
Subject: Re: [xml-dev] JITTs and DOM
From: Patrick Durusau <pdurusau@emory.edu>
Date: Sat, 12 Oct 2002 07:56:18 -0400
Cc: LMNL-DEV <lmnl-dev@lmnl.org>, xml-dev@lists.xml.org
References: <200210101305.JAA29379@mail2.reutershealth.com> <3DA69F39.3040302@emory.edu> <6196993018.20021011125630@jenitennison.com> <E1800rd-0002Td-00@server2000.ebizhostingsolutions.com>
Reply-to: pdurusau@emory.edu
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4) Gecko/20011128 Netscape6/6.2.1

Gavin,

Thanks for the great analysis while I was away at the TEI meeting!

One point of difference though:

Gavin Thomas Nicol wrote:

>Part of the value of ARA is that it was explicitly designed to support parallel 
>parsing of documents. I'm not sure that JITT can be used in quite the same 
>way... or at least it'd be more complex because the implicit assumption 
>is that you are operating in the context of a tree.
>
I am not sure what measure you are using for "complex," but let me 
describe how JITTs would operate for parallel processing as compared to 
my understanding of ARA. (Despite Gavin and I disagreeing on some 
points, both major and minor, ARA is a very interesting bit of work and 
I am looking forward to his next paper on this topic.)

JITTs and parallel parsing:

The usual case is a user seeking one view of a document, so I don't see 
JITTs being more complex than usual document parsing in that regard.

Performance gains are seen when the user wants to build partial DOM 
trees and fully parse fragments of that tree for presentation or other uses.

In our investigations of overlapping texts, it appears that most overlap 
is what we characterized as "localized" and hence, one need only parse a 
fragment in the alternate hierarchy to compare the alternative 
hierarchies. (Localized is a term we need to define more precisely, but 
the idea is that a phrase may be a member of a sentence element and a 
line element that cross (in traditional terminology) and yet share the 
same text/div/p hierarchy further up the tree. The notion is not 
original with us but is found in Earley's treatment of this topic in the 
early 70's. It would be helpful if a measure of localized overlap could 
be developed to assist in assessment of parallel document parsing.)
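
To make "localized" a little more concrete, here is a minimal Python 
sketch. Everything in it is an assumption made for illustration: the two 
sample encodings, the function names, and the crude "number of 
non-shared levels" count, which is not a proposed measure, only a 
placeholder for one. It compares ancestor paths by tag name only.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same text: one by sentence, one by line.
SENTENCE_VIEW = ("<text><div><p><s>That's all folks, he said.</s> "
                 "<s>And left.</s></p></div></text>")
LINE_VIEW = ("<text><div><p><l>That's all folks,</l> "
             "<l>he said. And left.</l></p></div></text>")


def path_to_phrase(root, phrase):
    """Return the tag path from root to the first element whose own
    leading text contains phrase, or None if no element does."""
    def walk(el, path):
        path = path + [el.tag]
        if el.text and phrase in el.text:
            return path
        for child in el:
            found = walk(child, path)
            if found:
                return found
        return None
    return walk(root, [])


def shared_prefix(a, b):
    """Return the leading run of tags that two paths have in common."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out


sentence_path = path_to_phrase(ET.fromstring(SENTENCE_VIEW), "all folks")
line_path = path_to_phrase(ET.fromstring(LINE_VIEW), "all folks")
shared = shared_prefix(sentence_path, line_path)

# Crude, hypothetical indicator: how many levels are NOT shared.
non_shared_levels = len(sentence_path) - len(shared)
print(sentence_path, line_path, shared, non_shared_levels)
```

In this toy case the two paths differ only at the last level (s versus 
l) and share text/div/p above it, which is the situation described 
above.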

Thus, assuming that I am using a string search for "That's all folks" 
and find it in the first hierarchy, all I need do is search for the same 
phrase in the alternative hierarchy and obtain the node and its parent 
node where the string occurs.
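
A minimal sketch of that lookup, using Python's standard 
xml.etree.ElementTree. The sample markup and the helper name 
find_with_parent are assumptions for illustration, and the sketch only 
handles the simple case where the phrase sits inside a single element's 
text in each hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical first and alternative hierarchies over the same text.
FIRST = ("<text><div><p><s>That's all folks, he said.</s> "
         "<s>And left.</s></p></div></text>")
ALTERNATIVE = ("<text><div><p><l>That's all folks,</l> "
               "<l>he said. And left.</l></p></div></text>")


def find_with_parent(root, phrase):
    """Return (node, parent) for the first element whose own leading
    text contains phrase, or (None, None) if it is not found."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter():
        if el.text and phrase in el.text:
            return el, parents.get(el)
    return None, None


phrase = "That's all folks"
node_a, parent_a = find_with_parent(ET.fromstring(FIRST), phrase)
node_b, parent_b = find_with_parent(ET.fromstring(ALTERNATIVE), phrase)
print(node_a.tag, parent_a.tag)
print(node_b.tag, parent_b.tag)
```

A phrase that straddles an element boundary in the alternative 
hierarchy would not be found by this simple per-element search and 
would need matching against the concatenated text instead.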

The situation becomes more complex if the string appears in a different 
document ordering, such as versioning where a <p> element has changed 
its physical location, but I am working on some thoughts about how to 
handle that in the JITTs model.

ARA and parallel parsing:

(These are largely assumptions based upon my reading of Gavin's paper 
from Extreme so possibly inaccurate or incorrect.)

ARA parallel parses the entire document in order to build its internal 
representation of the ranges in the document.

In some sense that is not complex, but it certainly imposes some 
overhead on using the ARA approach. Once the entire document has been 
processed, I would expect querying of the ranges to be quite fast. That 
would not be a drawback with largely static documents and versions of 
documents, but could pose problems with documents and sets of documents 
that are not fairly stable. This is not a complexity issue per se, but 
one of system overhead. (Note that HyTime proposes the same solution of 
building multiple parses of a single document to implement traditional 
CONCUR. See Note 533 of ISO 10744. Not sure how ARA differs from that 
approach other than in the data format for the information.)
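
The trade-off I have in mind (pay once to index every range, then query 
cheaply) can be sketched generically in Python. This is not ARA's actual 
data model, which I am not in a position to reproduce; the Range and 
RangeIndex names, the sample text, and the offsets are all assumptions 
for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass

TEXT = "That's all folks, he said. And left."


@dataclass(frozen=True)
class Range:
    name: str    # hypothetical label: "s" for sentence, "l" for line
    start: int   # inclusive character offset
    end: int     # exclusive character offset


class RangeIndex:
    def __init__(self, ranges):
        # Up-front cost: all ranges from all hierarchies are collected
        # and sorted before any query can be answered.
        self._ranges = sorted(ranges, key=lambda r: (r.start, r.end))
        self._starts = [r.start for r in self._ranges]

    def covering(self, start, end):
        """Return the ranges that fully contain [start, end)."""
        # Only ranges starting at or before `start` can contain the span.
        hi = bisect_right(self._starts, start)
        return [r for r in self._ranges[:hi] if r.end >= end]


index = RangeIndex([
    Range("s", 0, 26), Range("s", 27, 36),   # sentence hierarchy
    Range("l", 0, 17), Range("l", 18, 36),   # line hierarchy
])

phrase = "he said"
pos = TEXT.find(phrase)
hits = index.covering(pos, pos + len(phrase))
print([(r.name, TEXT[r.start:r.end]) for r in hits])
```

If the document changes, the offsets shift and the index has to be 
rebuilt, which is the overhead concern for documents that are not 
fairly stable.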

Hope the weekend is going well!

More meetings today! ;-) Well, they are TEI meetings and so tend to be 
more lively than your typical academic department meeting although less 
lively than a beach party. Somewhere in between. ;-)

Patrick 

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu

Follow-Ups:
- Re: [xml-dev] JITTs and DOM
  - From: Gavin Thomas Nicol <gtn@rbii.com>

References:
- Re: [xml-dev] JITTs and DOM
  - From: Patrick Durusau <pdurusau@emory.edu>
- Re: [xml-dev] JITTs and DOM
  - From: Jeni Tennison <jeni@jenitennison.com>
- Re: [xml-dev] JITTs and DOM
  - From: Gavin Thomas Nicol <gtn@rbii.com>
