
   Re: [xml-dev] Partial documents in tree-based APIs


From: "Michael Kay" <michael.h.kay@ntlworld.com>


> I would be much more interested in any innovation that allowed a parser
> to report more than one well-formedness error in a single run.

Does XML 1.0 anywhere actually state that parsing happens starting
from the beginning?   :-)

A madman could parse starting from the end. Interestingly, it would 
give you a different result from most XML parsers with:

<x>
<!--
xxx
-->
yyy
--->
</x>

Now you could parse starting from the end and still get the same result
as parsing from the start, by being a little more clever. For example, when
you find a -->, you search backwards for the matching <!--; if you meet an
intervening --> on the way, you parse (backwards or forwards)
the intervening text.  A progressive cha-cha.
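
To make the difference concrete, here is a minimal sketch in Python. It only looks at comment delimiters and ignores everything else in XML; the function names are made up for illustration, and the "careful" variant is just one possible reading of the trick described above, not a complete backwards parser (it still gets a <!-- nested inside a comment wrong, for instance).

```python
# The example document from above, as a string.
doc = "<x>\n<!--\nxxx\n-->\nyyy\n--->\n</x>\n"


def forward_comments(text):
    """Left to right: each '<!--' is closed by the next '-->'."""
    spans, pos = [], 0
    while True:
        start = text.find("<!--", pos)
        if start == -1:
            return spans
        end = text.find("-->", start + 4)
        if end == -1:
            return spans  # unterminated comment; a real parser would report it
        spans.append(text[start + 4:end])
        pos = end + 3


def naive_backward_comments(text):
    """Right to left: each '-->' is opened by the nearest preceding '<!--'."""
    spans, pos = [], len(text)
    while True:
        end = text.rfind("-->", 0, pos)
        if end == -1:
            return spans
        start = text.rfind("<!--", 0, end)
        if start == -1:
            return spans
        spans.append(text[start + 4:end])
        pos = start


def careful_backward_comments(text):
    """Right to left, but re-scan forwards from '<!--' to catch an earlier '-->'."""
    spans, pos = [], len(text)
    while True:
        end = text.rfind("-->", 0, pos)
        if end == -1:
            return spans
        start = text.rfind("<!--", 0, end)
        if start == -1:
            return spans
        # The first '-->' after the opener is where a forward parser would stop;
        # at worst this finds the same '-->' we started from.
        real_end = text.find("-->", start + 4, end + 3)
        spans.append(text[start + 4:real_end])
        pos = start
```

On the example, the forward scan sees a comment containing only xxx, the naive backward scan sees a comment that swallows the first --> and yyy, and the careful backward scan agrees with the forward one.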

Actually, my editor has a small backwards parser in it: because we cannot
guarantee that we are working on a well-formed tree, when you want to
close the current element, we parse backwards to find out what the context is.
This is quite a useful technique for avoiding building a DOM (or when you
are working with pre-WF documents); however, it has a worst-case
performance penalty if you try to simulate a forward parse too
faithfully.
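
A rough sketch of that kind of backwards scan, in Python. This is not the editor's actual code: the function name is hypothetical, and it assumes that comments, CDATA sections and attribute values contain no < or > characters, which a real implementation could not assume.

```python
def enclosing_element(text, cursor):
    """Scan backwards from `cursor` and return the name of the innermost
    element that is still open there, or None if there is none."""
    pending_closes = 0  # end tags seen so far that still need their start tag
    pos = cursor
    while True:
        gt = text.rfind(">", 0, pos)
        if gt == -1:
            return None
        lt = text.rfind("<", 0, gt)
        if lt == -1:
            return None
        tag = text[lt + 1:gt]
        pos = lt
        if not tag.strip() or tag.startswith(("!", "?")) or tag.endswith("/"):
            continue  # comment, declaration, PI, or empty-element tag
        if tag.startswith("/"):
            pending_closes += 1      # an end tag: its start tag lies further back
        elif pending_closes:
            pending_closes -= 1      # start tag matching an end tag already seen
        else:
            return tag.split()[0]    # unmatched start tag: this is the context
```

For example, with the cursor at the end of `<doc><p>one</p><p>two <b>bold</b> and <br/> more` the scan skips `<br/>`, pairs `</b>` with `<b>`, and stops at the second `<p>`, so the editor would insert `</p>`. No tree is built, and nothing before that `<p>` is ever read.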

So you can take the errors from a forward parse and combine them
with the errors from a backwards parse. For example,
say we had the text "XYZ" and tried to WF-check it: a forwards
parser might say "X not allowed here" and a backwards parser
might say "Y not allowed here".

In any case, there are many WF errors that a forwards parser can recover
from: for example, a missing entity reference close delimiter
or a strange character in a name.  One of the differences between
writing a streaming parser and writing a checkpointing incremental parser
(such as an editor uses) is that in the checkpointing parser almost
every legitimate state requires a corresponding error state
and/or recovery state: not only do you have to parse, but you
also have to contain each error to just around where
it occurs.
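
As a small illustration of recovering and carrying on, here is a Python sketch that checks only entity and character references and reports every problem it finds in one pass. The function name and the message wording are invented for the example; the recovery rule (treat the reference as ending where its name ends) is one simple choice among several.

```python
import re

# '&', then optionally a reference body, then optionally the ';' delimiter.
REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_:][\w:.\-]*)?(;)?")


def check_references(text):
    """Return a list of (offset, message) pairs, one per bad reference."""
    errors = []
    for m in REF.finditer(text):
        body, semi = m.group(1), m.group(2)
        if body is None:
            errors.append((m.start(), "'&' not followed by a reference"))
        elif semi is None:
            # Recover by pretending the ';' was there and resuming after the name.
            errors.append((m.start(), f"missing ';' after '&{body}'"))
    return errors


for offset, message in check_references("a &amp; b &lt c &gt; d & e &#65 f"):
    print(offset, message)
```

Each error is confined to the reference where it occurs, so the scan resumes immediately afterwards instead of stopping at the first problem.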


Cheers
Rick Jelliffe




 


Copyright 2001 XML.org. This site is hosted by OASIS