Elliotte Rusty Harold wrote:
> Now consider the case of a tree-based API such as DOM, JDOM, or XOM
> which encounters a malformedness error. Traditionally, these APIs have
> reported no information from a malformed document to the client
> application. However, recently Laurent Bihanic submitted a patch to JDOM
> in which as much of the document as had been able to be successfully
> parsed was made available through the exception that was thrown to
> indicate the malformedness error. This was quite clever. It had never
> occurred to me, and I had never noticed any other API do anything similar.
>
> What I'd like to get broader discussion of is whether this is a good
> idea. There are certainly use cases for it.
I think it is supported by libxml2 (or at least, it is by the Perl wrapper
XML::LibXML) through a "recover" option. I don't know the actual details, but I
think it parses as much as it can and renders that as a DOM. I wanted to use it
once to save a couple thousand documents from a variety of errors produced by
the generating tool, but unfortunately all the errors were on or very close to
the root element, so I had to regex my way out of it instead. I certainly
would've found that option useful if it had been possible to recover more fully
from the errors I was getting.
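
To make the idea under discussion concrete (hand back whatever tree was built
before the malformedness error, attached to the exception), here is a minimal
sketch in Python using only the standard library. It is not the JDOM patch and
not libxml2's "recover" mode; the names PartialParseError, partial_root and
parse_keeping_partial are made up for illustration.

```python
import xml.sax
import xml.etree.ElementTree as ET


class PartialParseError(Exception):
    """Hypothetical exception carrying the tree built before the error."""

    def __init__(self, cause, partial_root):
        super().__init__(str(cause))
        self.partial_root = partial_root  # None if no element was ever opened


class _TreeHandler(xml.sax.ContentHandler):
    """Builds an ElementTree incrementally from SAX events."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []

    def startElement(self, name, attrs):
        elem = ET.Element(name, dict(attrs.items()))
        if self.stack:
            self.stack[-1].append(elem)
        else:
            self.root = elem
        self.stack.append(elem)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        if not self.stack:
            return
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + content
        else:
            parent.text = (parent.text or "") + content


def parse_keeping_partial(text):
    """Return the root element, or raise PartialParseError with the partial tree."""
    handler = _TreeHandler()
    try:
        xml.sax.parseString(text.encode("utf-8"), handler)
    except xml.sax.SAXParseException as exc:
        # Everything reported before the error is still reachable from root.
        raise PartialParseError(exc, handler.root) from exc
    return handler.root
```

A caller that wants strict behaviour just lets the exception propagate; a
caller doing salvage work catches it and inspects partial_root. Text
immediately before the error position may or may not have been delivered,
depending on how the underlying parser buffers character data.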
> Is this approach something to be encouraged? Should other tree-based
> APIs like XOM and DOM copy this innovation? What advantages and
> disadvantages have I not thought of?
I don't see any disadvantages, given that it throws an exception anyway. However,
that approach does have the drawback that you're not getting all the (corrupted)
data back, just what precedes the corruption. I'm currently working on a SAX
parser wrapped around an HTML parser to try to provide a way to recover bad XML
(generic to any XML, unlike TagSoup, which is incredibly useful but currently
targeted at HTML). I believe that non-well-formed XML sadly still happens too
often, and providing users with tools to recover from such situations is
definitely helpful.
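
As a rough illustration of that kind of recovery (and not the tool I am
actually writing), here is a sketch that drives a tree builder from Python's
lenient html.parser instead of a strict XML parser. LenientTreeBuilder and
recover are hypothetical names. Unclosed elements are closed implicitly and
stray end tags are ignored. Note that html.parser lowercases tag names and
knows nothing about namespaces, so this only approximates a generic XML
recovery tool.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LenientTreeBuilder(HTMLParser):
    """Hypothetical helper: builds a tree from tag soup, repairing nesting."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = None
        self.stack = []

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; store them as "".
        elem = ET.Element(tag, {k: (v or "") for k, v in attrs})
        if self.stack:
            self.stack[-1].append(elem)
        elif self.root is None:
            self.root = elem
        else:
            return  # content after the root was closed is dropped in this sketch
        self.stack.append(elem)

    def handle_endtag(self, tag):
        # A stray end tag that matches nothing open is ignored.
        if not any(elem.tag == tag for elem in self.stack):
            return
        # Otherwise close every element opened since the matching start tag.
        while self.stack.pop().tag != tag:
            pass

    def handle_data(self, data):
        if not self.stack:
            return
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def recover(text):
    """Parse possibly malformed markup and return the repaired root element."""
    builder = LenientTreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root
```

Unlike the exception-with-partial-tree approach, this keeps the data that
follows the first error, at the price of guessing what the author meant.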
--
Robin Berjon <robin.berjon@expway.fr>
Research Engineer, Expway http://expway.fr/
7FC0 6F5F D864 EFB8 08CE 8E74 58E6 D5DB 4889 2488