Re: [xml-dev] Error and Fatal Error


From: Peter Flynn <peter@silmaril.ie>
To: xml-dev@lists.xml.org
Date: Sun, 17 Jul 2011 23:32:13 +0100

On 16/07/11 18:17, Stephen D Green wrote:
> Absolutely.
>
> I hunted around the .Net framework hoping to find such
> a parser which allowed me to repair the XML but I couldn't
> find one.

I suspect it's more the case that such a program is really a "tool", not 
a "parser" per se. As such it would of course *contain* a parser, and be 
capable of parsing an XML document, and could therefore quite correctly 
be described as "containing a conformant XML parser".

It's what it would do (or let you do) with the (possibly mangled) 
*results* of the parse that would differentiate it from the traditional 
parser/validator.

IMHE only relatively trivial (usually single-character) errors can be 
corrected on-the-fly, such as

  * mistyped element type names, attribute names, or token-list
    values;

  * missing or extra attribute quotes, ampersands, or pointy brackets;

  * bogus or garbled characters resulting in or from a character-
    encoding error.

These tend to happen because the document has been hand-corrected 
without using a conformant editor -- still a frighteningly common 
occurrence. In these cases there is often more than one such error 
present, except in very short documents, and it is thus often better to 
handle the document in a suitable editor with good error-reporting and 
the robustness to cope with partially-marked or invalid documents.

Any error more complex than these, such as those where entire subtrees 
of the structure are misplaced in the markup framework, or where a 
persistent disruption of the syntax causes a cascade of errors, can 
really only be dealt with by opening the broken document in an editor 
and fixing it (or by regenerating it, as appropriate).
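
To make the distinction concrete, here is a minimal sketch in Python (my choice of language, not the .NET of the original question) of a tool that wraps a conformant parser and attempts exactly one of the trivial repairs listed above, the unescaped ampersand. The names `BARE_AMP` and `parse_with_amp_repair` are hypothetical, and the regex is an assumption about what counts as a bare ampersand; it does not know about CDATA sections or comments. Anything it cannot fix is passed straight back as the parser's own error, which is the point at which a human and an editor take over.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an ampersand is "bare" if it does not begin something shaped
# like an entity reference (&name;) or a character reference (&#10; / &#x0A;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_with_amp_repair(text):
    """Parse text as XML, escaping bare ampersands once if the parse fails.

    Returns (root_element, repaired_flag). Raises ET.ParseError when the
    document is broken in any way this single repair does not cure.
    """
    try:
        # A well-formed document is never touched.
        return ET.fromstring(text), False
    except ET.ParseError:
        fixed = BARE_AMP.sub("&amp;", text)
        if fixed == text:
            # No bare ampersands to blame: re-raise the original error.
            raise
        # A second failure here propagates to the caller unchanged.
        return ET.fromstring(fixed), True


root, repaired = parse_with_amp_repair("<p>Fish & chips</p>")
print(repaired, root.text)
```

With the input shown, the first parse fails, the ampersand is escaped, and the second parse succeeds. A misnested document such as `<a><b></a></b>` still raises `ET.ParseError`, because no single-character substitution will cure it.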

> I think we need a parser which understands the
> slightly erroneous XML and can find any errors in it:
> In short we need a parser which has an API which
> can allow the web developer (in this case with .NET)
> to repair XML.

I'm not entirely convinced that a parser-with-editorial-cleanup would be 
significantly more use for this purpose than the standard 
editor-with-builtin-parser model. But I can well understand the 
attraction of wanting to cope inline with the kind of garbage most users 
blithely paste into text fields in web-based applications, fondly 
imagining that the Elves will automagically fix their crud into XML.

(In a few circumstances I am in the very fortunate position of being able 
to send it back to them and tell them to fix it, because we have very 
strict rules about this, and the penalty for disobedience is that their 
web page or document simply won't be published until they send us good 
data. But that is a luxury that I can justify by having cut the cost of 
cleanup and error to virtually zero by dint of a lot of user training in 
how to avoid creating crud in the first place. Unfortunately that's a 
long-term strategy that most companies won't even consider; not because 
of the up-front cost, which is high but not unaffordable; but because it 
shows up their internal quality controls to be useless to the point of 
non-existence, and that embarrasses the senior people.)
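
For what it's worth, the send-it-back approach needs very little machinery. The following is only a sketch, again in Python rather than .NET, of a gate that refuses anything not well-formed and reports where the parser stopped. The function name `check_submission` and the wording of the message are my own assumptions; the `position` attribute is what the standard-library `ParseError` provides.

```python
import xml.etree.ElementTree as ET


def check_submission(text):
    """Return None if text is well-formed XML, else a message for the author."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # position is a (line, column) tuple supplied by the parser.
        line, column = err.position
        return f"Not well-formed at line {line}, column {column}: {err}"
    return None


for sample in ("<p>ok</p>", "<p>oops"):
    problem = check_submission(sample)
    print("accepted" if problem is None else problem)
```

This only tests well-formedness; checking validity against a DTD or schema would need a validating parser, which the standard library module used here is not.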

///Peter

References:
- Error and Fatal Error
  - From: Joe Fawcett <joefawcett@hotmail.com>
- Re: [xml-dev] Error and Fatal Error
  - From: Stephen D Green <stephengreenubl@gmail.com>
- Re: [xml-dev] Error and Fatal Error
  - From: Michael Kay <mike@saxonica.com>
- Re: [xml-dev] Error and Fatal Error
  - From: Stephen D Green <stephengreenubl@gmail.com>
