Re: [xml-dev] Error and Fatal Error
- From: Peter Flynn <peter@silmaril.ie>
- To: xml-dev@lists.xml.org
- Date: Sun, 17 Jul 2011 23:32:13 +0100
On 16/07/11 18:17, Stephen D Green wrote:
> Absolutely.
>
> I hunted around the .Net framework hoping to find such
> a parser which allowed me to repair the XML but I couldn't
> find one.
I suspect it's more the case that such a program is really a "tool", not
a "parser" per se. As such it would of course *contain* a parser, and be
capable of parsing an XML document, and could therefore quite correctly
be described as "containing a conformant XML parser".
It's what it would do (or let you do) with the (possibly mangled)
*results* of the parse that would differentiate it from the traditional
parser/validator.
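As a rough illustration of that split, here is a minimal Python sketch. It assumes the standard-library `xml.etree.ElementTree` (expat-based) parser as the conformant core. The wrapper `report_first_error` is hypothetical. The parser does the well-formedness checking; the surrounding "tool" decides what to do with the result. Here it returns the location of the first fatal error instead of simply aborting.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def report_first_error(text: str) -> Optional[Tuple[int, int, str]]:
    """Return None if `text` is well-formed, otherwise (line, column, message)
    for the first fatal error the underlying conformant parser reported."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple; with the expat backend
        # the line is 1-based and the column is 0-based.
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == "__main__":
    print(report_first_error("<doc><p>fine</p></doc>"))  # None
    print(report_first_error("<doc><p>broken</doc>"))
```

What a repair tool then does with that position (highlight it, offer a fix, or refuse the document) is the part that would distinguish it from a plain parser.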
IMHE only relatively trivial (usually single-character) errors can be
corrected on-the-fly, such as
* mistyped element type names, attribute names, or token-list
values;
* missing or extra attribute quotes, ampersands, or pointy brackets;
* bogus or garbled characters resulting in or from a character-
encoding error.
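Two of the trivial repairs listed above can be sketched in a few lines of Python. This is an illustration only. It assumes a hypothetical fixed vocabulary of element type names (`KNOWN_ELEMENTS`, standing in for whatever the DTD or schema allows), and it uses naive regular expressions that ignore comments, CDATA sections, processing instructions and namespace prefixes.

```python
import difflib
import re

# Hypothetical vocabulary: the element type names the DTD/schema allows.
KNOWN_ELEMENTS = ["article", "title", "para", "list", "item"]

# An '&' that does not begin a named, decimal or hex reference ending in ';'.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# The name in a start or end tag (naive: no namespace prefixes, no CDATA).
TAG_NAME = re.compile(r"(</?)([A-Za-z_][\w.-]*)")


def escape_bare_ampersands(text: str) -> str:
    """Turn every '&' that cannot start a reference into '&amp;'."""
    return BARE_AMP.sub("&amp;", text)


def fix_element_names(text: str, known=KNOWN_ELEMENTS) -> str:
    """Replace an unknown tag name with the single closest known name,
    leaving it untouched when no candidate is similar enough."""
    def repl(m):
        prefix, name = m.group(1), m.group(2)
        if name in known:
            return m.group(0)
        close = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
        return prefix + close[0] if close else m.group(0)
    return TAG_NAME.sub(repl, text)
```

For example, `fix_element_names("<parra>text</para>")` yields `<para>text</para>`, while a name with no close match, such as `<zzz/>`, is left for a human. Applied blindly to a whole document these rules can corrupt legitimate content (an `&` inside a CDATA section, say), so a real tool would more sensibly apply them only at the position the parser reported.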
These tend to happen because the document has been hand-corrected
without using a conformant editor -- still a frighteningly common
occurrence. In these cases there is often more than one such error
present, except in very short documents, and it is thus often better to
handle the document in a suitable editor with good error-reporting and
the robustness to cope with partially-marked or invalid documents.
Any error more complex than these, such as those where entire subtrees
of the structure are misplaced in the markup framework, or where a
persistent disruption of the syntax causes a cascade of errors, can
really only be dealt with by opening the broken document in an editor
and fixing it (or by regenerating it, as appropriate).
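The distinction between fixable and unfixable damage can be sketched as a loop: parse, apply trivial repairs if the parse fails, re-parse, and give up as soon as no repair changes anything. This minimal Python version assumes a single hypothetical repair pass (escaping bare ampersands) standing in for the whole family of one-character fixes. It returns the parser's error when the damage is structural, so the document can go to a person with an editor.

```python
import re
import xml.etree.ElementTree as ET

# An '&' that does not begin a named, decimal or hex reference ending in ';'.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def try_trivial_repairs(text: str) -> str:
    """Hypothetical repair pass; other single-character fixes would go here."""
    return BARE_AMP.sub("&amp;", text)


def repair_or_hand_off(text: str, max_passes: int = 5):
    """Return (text, None) if the text is, or becomes, well-formed;
    otherwise (text, error) so that a person can fix it in an editor."""
    for _ in range(max_passes):
        try:
            ET.fromstring(text)
            return text, None            # well-formed: nothing more to do
        except ET.ParseError as err:
            repaired = try_trivial_repairs(text)
            if repaired == text:         # no trivial fix applies:
                return text, err         # structural damage, hand it off
            text = repaired
    try:                                 # passes exhausted: report final state
        ET.fromstring(text)
        return text, None
    except ET.ParseError as err:
        return text, err
```

A stray ampersand (`<p>Fish & Chips</p>`) is repaired automatically, whereas misnested elements (`<doc><sec><p>text</doc></sec>`) come back with the parser's error attached.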
> I think we need a parser which understands the
> slightly erroneous XML and can find any errors in it:
> In short we need a parser which has an API which
> can allow the web developer (in this case with .NET)
> to repair XML.
I'm not entirely convinced that a parser-with-editorial-cleanup would be
significantly more use for this purpose than the standard
editor-with-builtin-parser model. But I can well understand the
attraction of wanting to cope inline with the kind of garbage most users
blithely paste into text fields in web-based applications, fondly
imagining that the Elves will automagically fix their crud into XML.
(In a few circumstances I am in the very fortunate position of being able
to send it back to them and tell them to fix it, because we have very
strict rules about this, and the penalty for disobedience is that their
web page or document simply won't be published until they send us good
data. But that is a luxury that I can justify by having cut the cost of
cleanup and error to virtually zero by dint of a lot of user training in
how to avoid creating crud in the first place. Unfortunately that's a
long-term strategy that most companies won't even consider, not because
of the up-front cost, which is high but not unaffordable, but because it
shows up their internal quality controls to be useless to the point of
non-existence, and that embarrasses the senior people.)
///Peter