Jeni Tennison wrote:
>I'd be *very* careful about drawing any conclusions about speed up
>from these observations. What you've done for these observations is
>replace markup-significant characters (e.g. '<') with
>markup-insignificant characters (i.e. '@'), effectively turning whole
>regions of the document into plain text.
I said in my post that these were observations that suggest further
investigation. The webpage noted that the replacement simulates the
result of a JITTs parser. Yes, the operation of a JITTs parser would
be to treat regions of the document as plain text. Sorry if that was
not explicit in our earlier treatments of JITTs parsing.
>I'm certainly no parser writer, but having hacked around the internals
>of Aelfred in order to create a parser for LMNL, I can see why
>this would cause a speed-up -- all a parser has to do to process these
>sections is scan through the text until it comes to a '<'; that
>scanning process is very fast.
>My understanding of JITTs is that you'd need a lot more sophistication
>in the parser.
>It wouldn't be enough to just ignore all the tags that
>the parser came across (which is what you've done in effect). Instead,
>the parser would have to read the tag, look at the name of the tag,
>check that against a list (from a DTD or schema) in order to work out
>what to do, and then either generate a "start/endElement" event or
>generate a "characters" event (to report the tag as a string)
>depending on the tag's status. If anything, I imagine that this will
>*add* time to the parsing of the document.
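To make the speed-up point concrete, here is a toy tokenizer (a sketch for illustration, not Aelfred's actual code). Character data is consumed with a single search for the next '<', so replacing '<' with '@' collapses a marked-up region into one text run. The function name `scan` and the simplified tag model (no attributes, comments, CDATA or processing instructions) are assumptions.

```python
def scan(doc):
    """Split doc into ("text", ...) and ("tag", ...) tokens.

    Deliberately naive: tags are whatever lies between '<' and the next
    '>'; no attributes, comments, CDATA or processing instructions.
    """
    tokens = []
    pos = 0
    while pos < len(doc):
        lt = doc.find("<", pos)          # fast path: skip text up to next '<'
        if lt == -1:
            tokens.append(("text", doc[pos:]))
            break
        if lt > pos:
            tokens.append(("text", doc[pos:lt]))
        gt = doc.find(">", lt)
        if gt == -1:
            raise ValueError(f"unterminated tag at offset {lt}")
        tokens.append(("tag", doc[lt + 1:gt]))
        pos = gt + 1
    return tokens


marked = "<p>a <w>b</w> c</p>"
masked = "<p>a @w>b@/w> c</p>"   # '<' replaced by '@' inside <p>
print(scan(marked))
print(scan(masked))
```

The marked-up version yields seven tokens, while the masked version yields only three (the `<p>` tags and one text run), which is why the masked documents parse faster.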
Parsers already build a tree from the DTD or schema in order to
"recognize" the markup they encounter in the document. The only change
JITTs would require is in the lookup step: where a parser now looks for
the token in that tree, upon failure it would simply resume reading input.
(That assumes you are using the suggested ignore option; with delete, it
would drop the token from the input string and continue reading input.)
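A minimal sketch of that lookup step follows, assuming the DTD or schema lookup can be reduced to a set of "active" element names. The names `jitts_events`, `active` and `mode` are hypothetical, and the tuple events are only loosely modeled on SAX's startElement/endElement/characters callbacks, not any existing parser's API. Tags are again limited to attribute-free start and end tags.

```python
import re

# Toy tag pattern: start or end tags only, no attributes, comments,
# processing instructions or CDATA sections.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def jitts_events(doc, active, mode="ignore"):
    """Return SAX-like (event, data) tuples for doc.

    `active` stands in for the DTD/schema lookup: only tags whose names
    are in it become startElement/endElement events. Any other tag is
    reported as character data ("ignore") or dropped ("delete").
    """
    if mode not in ("ignore", "delete"):
        raise ValueError(f"unknown mode: {mode!r}")
    events = []

    def characters(text):
        # Merge adjacent character runs, as a buffering parser might.
        if not text:
            return
        if events and events[-1][0] == "characters":
            events[-1] = ("characters", events[-1][1] + text)
        else:
            events.append(("characters", text))

    pos = 0
    for m in TAG.finditer(doc):
        characters(doc[pos:m.start()])          # text before the tag
        is_end, name = m.group(1) == "/", m.group(2)
        if name in active:                      # lookup succeeded
            events.append(("endElement" if is_end else "startElement", name))
        elif mode == "ignore":                  # lookup failed: keep tag as text
            characters(m.group(0))
        # mode == "delete": lookup failed, tag is dropped
        pos = m.end()
    characters(doc[pos:])                       # trailing text
    return events


doc = "<p>In <w>the</w> <w>beginning</w></p>"
print(jitts_events(doc, {"p"}))
print(jitts_events(doc, {"p"}, mode="delete"))
print(jitts_events(doc, {"p", "w"}))
```

With only `p` active, the `<w>` markup comes back inside a single characters event (ignore) or disappears from it (delete); adding `w` to the active set restores the full element events.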
>In other words, it's certainly the case that JITTs has promise in
>terms of reducing DOM size and speeding up processing, but I don't
>think that these observations are really an accurate representation of
>what the effect will be. I don't know what you're planning for your
>next step, but a good demonstration, in my opinion, would be to show
>how the same XSLT transformation, say, is speeded up by working on a
>JITTs DOM compared to a normal DOM.
There are a number of demonstrations in the planning stages, and XSLT
will probably be one of the earlier ones.
Director of Research and Development
Society of Biblical Literature