xml-dev - Re: [xml-dev] Partial documents in tree-based APIs

To: <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Partial documents in tree-based APIs
From: "Rick Jelliffe" <ricko@allette.com.au>
Date: Tue, 8 Apr 2003 21:35:01 +1000
References: <000701c2fbae$7dc9e5a0$6401a8c0@pcukmka>

From: "Michael Kay" <michael.h.kay@ntlworld.com>


> I would be much more interested in any innovation that allowed a parser
> to report more than one well-formedness error in a single run.

Does XML 1.0 anywhere actually state that parsing happens starting
from the beginning?   :-)

A madman could parse starting from the end. Interestingly, that would
give a different result from most XML parsers on:

<x>
<!--
xxx
-->
yyy
--->
</x>

Now you could parse starting from the end and still get the same result
as parsing from the start, by being a little more clever. For example,
when you find a -->, you search backwards for the matching <!--; if you
find an intervening --> on the way, you parse (backwards or forwards)
the intervening text instead.  A progressive cha-cha.
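
To make the cha-cha concrete, here is a minimal sketch in Python. The
function names and the EXAMPLE constant are made up for illustration,
and it handles only comments (no CDATA sections, PIs or tags). The
naive version pairs the last --> with the nearest <!-- before it; the
careful version then looks forward from that <!-- for an intervening
-->, which is where a forward parser would have ended the comment.

```python
OPEN, CLOSE = "<!--", "-->"

# The example document from this message.
EXAMPLE = "<x>\n<!--\nxxx\n-->\nyyy\n--->\n</x>\n"


def naive_backward_comment(text):
    """Pair the last '-->' with the nearest '<!--' before it.

    Returns (start, end) offsets of the supposed comment, or None.
    """
    close = text.rfind(CLOSE)
    if close == -1:
        return None
    # The whole opener must lie before the closer.
    start = text.rfind(OPEN, 0, close)
    if start == -1:
        return None
    return (start, close + len(CLOSE))


def careful_backward_comment(text):
    """Same backward search, then a short forward step.

    If another '-->' sits between the opener and the closer we started
    from, the comment ends at that first '-->' instead.
    """
    span = naive_backward_comment(text)
    if span is None:
        return None
    start = span[0]
    first_close = text.find(CLOSE, start + len(OPEN))
    return (start, first_close + len(CLOSE))


if __name__ == "__main__":
    for scan in (naive_backward_comment, careful_backward_comment):
        start, end = scan(EXAMPLE)
        print(scan.__name__, repr(EXAMPLE[start:end]))
```

On the example, the naive scan takes everything from <!-- to the final
---> as one comment, while the careful scan stops at the first -->,
which is what a forward parse gives.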

Actually, my editor has a small backwards parser in it: because we cannot
guarantee that we are working on a well-formed tree, when you want to
close the current element, we parse backwards to find what the context is.
This is quite a useful technique for avoiding building a DOM (or for
working with pre-WF documents); however, it has a worst-case
performance penalty if you try to simulate forward parsing too
faithfully.
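
A rough sketch of that kind of backward scan, in Python. This is not
the code from my editor; the names element_to_close and TAG are
invented here, and the tag pattern is a simplification. It walks back
from the caret, counts end tags it passes, and returns the first start
tag that is not balanced by one of them. It deliberately ignores
comments, CDATA sections, PIs and '>' inside attribute values; handling
those faithfully is exactly where the cost starts to climb.

```python
import re

# Simplified tag pattern: optional '/', a name, anything up to '>',
# with an optional '/' just before the '>' for empty-element tags.
TAG = re.compile(r"<(/?)([A-Za-z_:][\w:.-]*)[^<>]*?(/?)>")


def element_to_close(text, caret):
    """Return the name of the innermost element still open at `caret`.

    Scans backwards from the caret without building a tree. Returns
    None if every start tag before the caret is already closed.
    """
    pending_end_tags = 0
    pos = caret
    while True:
        lt = text.rfind("<", 0, pos)
        if lt == -1:
            return None
        match = TAG.match(text, lt, caret)
        pos = lt
        if match is None:
            continue  # not a tag we recognise: skip it
        is_end, name, is_empty = match.groups()
        if is_empty:
            continue  # <br/> opens nothing
        if is_end:
            pending_end_tags += 1  # its start tag is further back
        elif pending_end_tags:
            pending_end_tags -= 1  # start tag matched by a later end tag
        else:
            return name


if __name__ == "__main__":
    doc = "<doc><p>one</p><p>two <b>bold</b> and <br/> more"
    print(element_to_close(doc, len(doc)))
```

Note that the scan never compares names, so it still gives an answer
on text that is not yet well-formed.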

So you can have errors from a forward parse and combine them
with errors from a backwards parser. For example,
say we had the text  "XYZ" and try to WF check it: a forwards
parser might say "X not allowed here" and a backwards parser
might say "Y not allowed here".   

In any case, there are many WF errors that a forwards parser can recover
from: for example, a missing close delimiter on an entity reference
or a strange character in a name.  One of the differences between
writing a streaming parser and writing a checkpointing incremental parser
(such as an editor uses) is that in the checkpointing parser almost
every legitimate state requires a corresponding error state
and/or recovery state: not only do you have to parse, but you
also have to contain each error to the area just around where
it occurs.
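
As a small illustration of recovery, here is a sketch in Python of a
forward scan that reports every bad reference in one run instead of
stopping at the first. The names check_references and REF_BODY are
hypothetical, and the name pattern is only an approximation of the XML
Name production. When the closing ';' is missing, it records the error
and resumes right after the name, so the damage stays local.

```python
import re

# Character reference (hex or decimal) or an approximate entity name.
REF_BODY = re.compile(r"#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_:][\w:.-]*")


def check_references(text):
    """Return a list of (offset, message) for every malformed reference."""
    errors = []
    pos = 0
    while True:
        amp = text.find("&", pos)
        if amp == -1:
            return errors
        body = REF_BODY.match(text, amp + 1)
        if body is None:
            # Error state: bare '&'. Recover by skipping just the '&'.
            errors.append((amp, "'&' is not followed by a reference name"))
            pos = amp + 1
        elif text.startswith(";", body.end()):
            pos = body.end() + 1  # well-formed reference
        else:
            # Error state: missing ';'. Recover by resuming after the name.
            errors.append(
                (body.end(), "reference '&%s' has no closing ';'" % body.group())
            )
            pos = body.end()


if __name__ == "__main__":
    for offset, message in check_references("a &amp; b &lt c & d &#x41;"):
        print(offset, message)
```

One limitation of resuming after the name: in text like "&ltfoo" the
scan takes "ltfoo" as the name, so the reported name can be longer than
the one the author meant.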


Cheers
Rick Jelliffe

References:
- RE: [xml-dev] Partial documents in tree-based APIs
  - From: "Michael Kay" <michael.h.kay@ntlworld.com>
