> Sounds a nice idea independent of any speed benefits. I toyed with the idea
> of parsing XSLT match patterns backwards at one stage, but didn't pursue
> it. My brain is wired for left-to-right reading...
>
> As a matter of interest, is there any problem with decoding UTF-8 when
> reading backwards?
>
> Shame that the spec doesn't require ">" to be escaped, I imagine this causes
> a fair problem with backtracking, for example how do you cope with
>
> ...... <a><b/><c/></a> -->
>
> which might or might not be part of a comment?
>
> IIRC we were able to read files backwards on ICL VME. That's history, but
> many of its features have reappeared in Windows 25 years later ...
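
On the UTF-8 question quoted above: UTF-8 is self-synchronizing, since
every continuation byte has the form 10xxxxxx, so a backwards reader can
step back until it hits a byte that is not a continuation byte and decode
from there. A minimal Python sketch of that idea (the function name
decode_backwards is made up for illustration, and it assumes the whole
input is already in memory):

```python
def decode_backwards(data: bytes):
    """Yield the characters of UTF-8 encoded `data` from last to first."""
    end = len(data)
    while end > 0:
        start = end - 1
        # Continuation bytes look like 0b10xxxxxx; a character has at most
        # three of them, so step back over at most three.
        while start > 0 and end - start < 4 and (data[start] & 0xC0) == 0x80:
            start -= 1
        # decode() raises UnicodeDecodeError if the slice is not exactly
        # one valid character (for example a stray continuation byte).
        yield data[start:end].decode("utf-8")
        end = start
```

For example, "".join(decode_backwards("abc".encode("utf-8"))) gives "cba".
This only covers UTF-8; other encodings would need their own treatment.
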
I wrote a reverse parser in Pascal once as part of an editor project. It
was designed to determine the current context the user was working in.
In general, starting from a fixed position somewhere in the middle of a
document, the amount of branching and caching you had to do was
excessive, and in virtually all cases it led to scans back to the start
of the document. In Sean's case I imagine the parser would still have to
make a lot of assumptions until it reached the crossover position
between the two threads, at which point it might issue a fatal error (if
the assumptions turned out to be wrong).

I wonder if you could instead work on a sequential multi-threaded
approach: one thread to handle decoding and chunking of characters,
another to handle parsing of lexical structures, and a third (possibly
at a driver level) to handle external well-formedness (WF) checks, such
as checking character classes, name checks, and duplicate attribute
checks. Decoupling these pieces would allow you to very easily turn off
the WF checks if you knew that the document was WF via an out-of-band
mechanism.
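
To make the three-stage idea concrete, here is a rough Python sketch, not
a real XML parser. Everything in it is an assumption for illustration: the
names decode_stage, lex_stage, check_stage and parse are invented, the
lexer only splits tags from text (no comments, CDATA, processing
instructions, or ">" inside attribute values), and the name pattern is a
simplification of the XML Name production.

```python
import codecs
import queue
import re
import threading

DONE = object()  # sentinel marking the end of a stream
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")  # simplified name rule
ATTR = re.compile(r"([^\s=]+)\s*=\s*(?:\"[^\"]*\"|'[^']*')")

def decode_stage(chunks, out):
    """Stage 1: decode byte chunks, coping with characters split across chunks."""
    try:
        decoder = codecs.getincrementaldecoder("utf-8")()
        for chunk in chunks:
            text = decoder.decode(chunk)
            if text:
                out.put(text)
        tail = decoder.decode(b"", final=True)
        if tail:
            out.put(tail)
    finally:
        out.put(DONE)

def lex_stage(inp, out):
    """Stage 2: split decoded text into tag tokens and text tokens."""
    buf = ""
    try:
        while (text := inp.get()) is not DONE:
            buf += text
            while buf:
                # A tag ends at ">", a text run ends at the next "<".
                end = buf.find(">") + 1 if buf[0] == "<" else buf.find("<")
                if end <= 0:
                    break  # token may continue in the next chunk
                out.put(buf[:end])
                buf = buf[end:]
        if buf:
            out.put(buf)  # trailing text (or an unterminated tag)
    finally:
        out.put(DONE)

def check_stage(inp, out, errors):
    """Stage 3 (optional): name and duplicate-attribute checks on tags."""
    try:
        while (tok := inp.get()) is not DONE:
            if tok.startswith("<") and tok.endswith(">"):
                parts = tok[1:-1].strip("/ ").split(None, 1)
                name = parts[0] if parts else ""
                rest = parts[1] if len(parts) > 1 else ""
                if not NAME.fullmatch(name):
                    errors.append(f"bad name in {tok!r}")
                attrs = [m.group(1) for m in ATTR.finditer(rest)]
                if len(attrs) != len(set(attrs)):
                    errors.append(f"duplicate attribute in {tok!r}")
            out.put(tok)
    finally:
        out.put(DONE)

def parse(chunks, check_wf=True):
    """Run the stages on separate threads; returns (tokens, errors)."""
    decoded, lexed = queue.Queue(), queue.Queue()
    errors = []
    stages = [
        threading.Thread(target=decode_stage, args=(chunks, decoded)),
        threading.Thread(target=lex_stage, args=(decoded, lexed)),
    ]
    final = lexed
    if check_wf:  # leave this stage out when the input is known to be WF
        final = queue.Queue()
        stages.append(
            threading.Thread(target=check_stage, args=(lexed, final, errors))
        )
    for t in stages:
        t.start()
    tokens = []
    while (tok := final.get()) is not DONE:
        tokens.append(tok)
    for t in stages:
        t.join()
    return tokens, errors
```

Calling parse(chunks) runs all three stages, while parse(chunks,
check_wf=False) simply leaves the checking thread out of the chain, which
is the kind of switch the decoupling is meant to allow. Whether separate
threads actually buy any speed would depend on the language and runtime.
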
Cheers,
Jeff Rafter