On Friday 11 October 2002 07:56 am, Jeni Tennison wrote:
> I understood that the *output* would be plain text, but I thought that
> the *input* would be marked-up text. This wasn't the case in the
> samples that you were using for your observations. I did see that you
> characterised them as "observations" and said that you would do more
> investigation, I just didn't want you or anyone else to get too
> hopeful about 30x speedup on the basis of these particular
The samples are a demonstration only in that they create smallish DOM trees
from potentially largish ones (i.e. certain markup is suppressed). This would
be like putting a SAX filter inline with structural stop lists (if you will).
As I noted, the speedup is most likely because the cost of building a DOM is
usually in object construction, rather than in parsing.
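
As a rough sketch of the stop-list idea (this is not the code behind the
samples; the names StopListTreeBuilder and build_filtered_tree are made up
for illustration, and a plain dict stands in for a DOM node), a SAX handler
can parse the whole document but construct nodes only for elements that are
not on the stop list:

```python
import xml.sax


class StopListTreeBuilder(xml.sax.ContentHandler):
    """Build a small tree, skipping every element named in stop_list
    together with its whole subtree. Hypothetical illustration only."""

    def __init__(self, stop_list):
        super().__init__()
        self.stop_list = set(stop_list)
        self.skip_depth = 0    # > 0 while inside a suppressed subtree
        self.root = None
        self.stack = []        # open elements that were actually built
        self.nodes_built = 0   # number of node objects constructed

    def startElement(self, name, attrs):
        if self.skip_depth or name in self.stop_list:
            # Still parsed by SAX, but no node object is constructed.
            self.skip_depth += 1
            return
        node = {"name": name, "attrs": dict(attrs.items()),
                "text": "", "children": []}
        self.nodes_built += 1
        if self.stack:
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def endElement(self, name):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.stack.pop()

    def characters(self, content):
        # Keep text only for elements that were built.
        if not self.skip_depth and self.stack:
            self.stack[-1]["text"] += content


def build_filtered_tree(xml_text, stop_list):
    """Parse xml_text and return the handler holding the reduced tree."""
    handler = StopListTreeBuilder(stop_list)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler
```

Calling build_filtered_tree(doc, ["index"]) still parses every byte of doc,
but anything under an <index> element never becomes an object. If object
construction is where the cost lies, that is where the savings would come
from.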
> The other thing that I think is promising about the JITTs approach is
> the ability to parse just the bits of the document that you're
> interested in, on the fly, during processing. A DOM implementation
> that did this behind the scenes could be very effective. (I'm sure
> that native XML databases / content management systems do this kind of
> thing all the time; I don't know if any in-memory DOM implementations
> do, or if it's been tried and for some reason rejected?)
This has been done (lazy evaluation) in the past. From what I saw, the
performance gain wasn't worth the complexity.
Part of the value of ARA is that it was explicitly designed to support parallel
parsing of documents. I'm not sure that JITT can be used in quite the same
way... or at least it'd be more complex because the implicit assumption
is that you are operating in the context of a tree.