From: "Michael Kay" <michael.h.kay@ntlworld.com>
> I would be much more interested in any innovation that allowed a parser
> to report more than one well-formedness error in a single run.
Does XML 1.0 anywhere actually state that parsing happens starting
from the beginning? :-)
A madman could parse starting from the end. Interestingly, it would
give you a different result from most XML parsers with:
<x>
<!--
xxx
-->
yyy
--->
</x>
Now you could parse starting from the end, but still get the same result
as parsing from the start, by being a little more clever. For example, you
find a -->, then you search backwards for a <!--. If you pass an
intervening --> on the way, you parse (backwards or forwards)
the intervening text. A progressive cha-cha.
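To make the difference concrete, here is a rough Python sketch of the three strategies applied to the example above. The function names are made up for illustration, and it only looks at comment delimiters: it assumes no CDATA sections, no processing instructions, and no stray <!-- inside a comment, so it is nowhere near a real parser.

```python
# Sketch only: looks at comment delimiters and nothing else.
EXAMPLE = "<x>\n<!--\nxxx\n-->\nyyy\n--->\n</x>\n"

def forward_comments(text):
    """Comment spans (start, end) as a normal start-to-end scan sees them."""
    spans, pos = [], 0
    while True:
        start = text.find("<!--", pos)
        if start == -1:
            break
        end = text.find("-->", start + 4)
        if end == -1:
            break
        spans.append((start, end + 3))
        pos = end + 3
    return spans

def naive_backward_comments(text):
    """The madman: pair each --> with the nearest <!-- to its left."""
    spans, pos = [], len(text)
    while True:
        end = text.rfind("-->", 0, pos)
        if end == -1:
            break
        start = text.rfind("<!--", 0, end)
        if start == -1:
            break
        spans.append((start, end + 3))
        pos = start
    return spans[::-1]

def careful_backward_comments(text):
    """The cha-cha: step forward again if a --> intervenes."""
    spans, pos = [], len(text)
    while True:
        end = text.rfind("-->", 0, pos)
        if end == -1:
            break
        start = text.rfind("<!--", 0, end)
        if start == -1:
            break
        # An earlier --> between opener and closer is the real close;
        # whatever follows it is ordinary text, not comment.
        inner = text.find("-->", start + 4, end)
        if inner != -1:
            end = inner
        spans.append((start, end + 3))
        pos = start
    return spans[::-1]
```

On EXAMPLE the naive backwards scan swallows yyy into the comment, while the careful one agrees with the forward scan.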
Actually, my editor has a small backwards parser in it: because we cannot
guarantee that we are working on a well-formed tree, when you want to
close the current element, we parse backwards to find what the context is.
This is quite a useful technique for avoiding building a DOM (or if you
are working with pre-WF documents); however, it has a worst-case
performance penalty if you try to simulate a forward parse too
faithfully.
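As an illustration of the idea (not the editor's actual code), a backwards scan for the element to close might look roughly like this in Python. The name enclosing_element and the tag regex are assumptions for the sketch; it skips anything that does not look like a start or end tag, and it does not cope with > inside attribute values, CDATA sections, or markup inside comments.

```python
import re

# Rough tag shape: optional "/", a name, then anything up to ">".
TAG = re.compile(r"<(/?)([A-Za-z_:][\w:.\-]*)([^<>]*)>")

def enclosing_element(text, offset):
    """Name of the innermost element still open at offset, or None.

    Walks backwards tag by tag, counting end tags that still need a
    matching start tag, so no tree is ever built.
    """
    pending_closes = 0
    pos = offset
    while True:
        gt = text.rfind(">", 0, pos)
        if gt == -1:
            return None
        lt = text.rfind("<", 0, gt)
        if lt == -1:
            return None
        pos = lt
        m = TAG.fullmatch(text, lt, gt + 1)
        if m is None:
            continue                    # comment, PI, or junk: skip it
        is_end, name, rest = m.group(1), m.group(2), m.group(3)
        if is_end:
            pending_closes += 1         # need a start tag further left
        elif rest.endswith("/"):
            continue                    # empty-element tag, nothing open
        elif pending_closes:
            pending_closes -= 1         # start tag for an end tag seen
        else:
            return name                 # first unmatched start tag
```

For instance, with the cursor at the end of "<a><b>text</b><c>more" the element to close is c, and the scan stops after looking at a single tag.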
So you can have errors from a forward parse and combine them
with errors from a backwards parser. For example,
say we had the text "XYZ" and try to WF check it: a forwards
parser might say "X not allowed here" and a backwards parser
might say "Y not allowed here".
In any case, there are many WF errors that a forwards parser can recover
from: for example a missing entity reference close delimiter
or a strange character in a name. One of the differences between
writing a streaming parser and a checkpointing incremental parser
(such as an editor uses) is that, in the checkpointing parser, almost
every legitimate state requires a corresponding error state
and/or recovery state: not only do you have to parse, but you
also have to cope with containing errors to just around where
they occur.
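A small Python sketch of that kind of recovery, for the missing reference delimiter case only. The function name and the error wording are invented here, and the name pattern is a loose approximation of XML names; the point is just that each error is contained where it occurs and the scan carries on, so one run can report several of them.

```python
import re

# "&", then optionally a character reference or a name, then optionally ";".
REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_:][\w:.\-]*)?(;)?")

def check_references(text):
    """Return (offset, message) for every malformed reference in text."""
    errors = []
    for m in REF.finditer(text):
        body, close = m.group(1), m.group(2)
        if body is None:
            # Error state: "&" with no reference after it.
            errors.append((m.start(), "bare '&'"))
        elif close is None:
            # Recovery state: act as if the ';' were there and carry on.
            errors.append((m.start(), "reference &%s is missing ';'" % body))
    return errors
```

For example, check_references("a &amp b &lt; c &#60 d & e") reports three problems (two missing delimiters and one bare ampersand) instead of stopping at the first.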
Cheers
Rick Jelliffe