Gavin,
Thanks for the great analysis while I was away at the TEI meeting!
One point of difference though:
Gavin Thomas Nicol wrote:
>Part of the value of ARA is that it was explicitly designed to support parallel
>parsing of documents. I'm not sure that JITT can be used in quite the
>same way... or at least it'd be more complex because the implicit assumption
>is that you are operating in the context of a tree.
>
I am not sure what measure you are using for "complex", but let me
describe how JITTs would operate for parallel processing as compared to
my understanding of ARA. (Despite Gavin and I disagreeing on some
points (both major and minor), ARA is a very interesting bit of work and
I am looking forward to his next paper on this topic.)
JITTs and parallel parsing:
The usual case is a user seeking one view of a document, so I don't see
JITTs being more complex than usual document parsing in that regard.
Performance gains are seen when the user wants to build partial DOM
trees and fully parse fragments of that tree for presentation or other uses.
In our investigations of overlapping texts, it appears that most overlap
is what we characterized as "localized" and hence, one need only parse a
fragment in the alternate hierarchy to compare the alternative
hierarchies. (Localized is a term we need to define more precisely, but
the idea is that a phrase may be a member of a sentence element and a
line element that cross (in traditional terminology) and yet share the
same text/div/p hierarchy further up the tree. The notion is not
original with us but is found in Earley's treatment of this topic in the
early 70's. It would be helpful if a measure of localized overlap could
be developed to assist in assessment of parallel document parsing.)
Thus, assuming that I am using a string search for "That's all folks"
and find it in the first hierarchy, all I need do is search for the same
phrase in the alternative hierarchy and obtain the node and its parent
node where the string occurs.
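
The lookup just described can be sketched roughly as follows. This is
only an illustration and not JITTs itself: it fully parses two tiny,
made-up views of the same text (one marked up by sentence, one by line)
with Python's standard ElementTree, whereas JITTs would parse only the
relevant fragment. The sample markup, the names SENTENCE_VIEW, LINE_VIEW
and PHRASE, and the helper locate() are all assumptions made for the
example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical views of the same text: one by sentence, one by line.
SENTENCE_VIEW = ("<text><div><p><s>Well, I guess.</s> "
                 "<s>That's all folks!</s></p></div></text>")
LINE_VIEW = ("<text><div><p><l>Well, I guess. That's</l> "
             "<l>all folks!</l></p></div></text>")
PHRASE = "That's all folks"


def locate(xml_source, phrase):
    """Return (tag, parent tag) of the deepest element whose text
    content contains the phrase, or None if the phrase is absent.
    If the phrase occurs more than once, the first occurrence wins."""
    node = ET.fromstring(xml_source)
    if phrase not in "".join(node.itertext()):
        return None
    parent = None
    while True:
        # Descend into the first child that still contains the phrase.
        for child in node:
            if phrase in "".join(child.itertext()):
                parent, node = node, child
                break
        else:
            # No child holds the whole phrase, so this node is the deepest.
            return node.tag, (parent.tag if parent is not None else None)


print(locate(SENTENCE_VIEW, PHRASE))
print(locate(LINE_VIEW, PHRASE))
```

In the sentence view the phrase sits inside a single sentence element;
in the line view it crosses a line boundary, so the deepest node holding
all of it is the shared <p>. Both views agree on text/div/p above that
point, which is the sense of "localized" overlap given above.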
The situation becomes more complex if the string appears in a different
document ordering, such as versioning where a <p> element has changed
its physical location, but I am working on some thoughts about how to
handle that in the JITTs model.
ARA and parallel parsing:
(These are largely assumptions based upon my reading of Gavin's paper
from Extreme, so they are possibly inaccurate or incorrect.)
ARA parallel parses the entire document in order to build its internal
representation of the ranges in the document.
In some sense that is not complex, but it does impose some
overhead on using the ARA approach. Once the entire document has been
processed, I would expect querying of the ranges to be quite fast. That
would not be a drawback with largely static documents and versions of
documents, but could pose problems with documents and sets of documents
that are not fairly stable. This is not a complexity issue per se, but
one of system overhead. (Note that HyTime proposes the same solution,
building multiple parses of a single document, to implement traditional
CONCUR; see Note 533 of ISO 10744. I am not sure how ARA differs from
that approach other than in the data format for the information.)
Hope the weekend is going well!
More meetings today! ;-) Well, they are TEI meetings and so tend to be
more lively than your typical academic department meeting although less
lively than a beach party. Somewhere in between. ;-)
Patrick
--
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu