At 9:26 PM +0000 2/23/04, Michael Kay wrote:
>If we've got to the point where serial reuse of a parser instance
>results in poor error reporting with malformed documents, and this is
>considered a "particularly nasty bug", then I think we're winning.
It wasn't the error reporting that was the problem. XML parsers in
general and SAX parsers in particular are allowed to report content
from before the first well-formedness error in the document. In this
case, Xerces was correctly reporting the error but was incorrectly
reporting the document content.
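
For illustration only (this is not the original Xerces test case): a minimal sketch of the kind of check that exposes this class of bug, assuming the JDK's default JAXP SAX parser. The class name, method names, and sample documents are hypothetical. It parses a malformed document, reuses the same XMLReader on a well-formed one, and compares the reported text against what a fresh parser reports.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness; names are illustrative, not from Xerces.
class ParserReuseCheck {

    // Collects the character content the parser reports.
    // DefaultHandler.fatalError() rethrows, so malformed input raises SAXException.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Parses xml with the given reader and returns the text it reported.
    static String parse(XMLReader reader, String xml) throws Exception {
        TextCollector collector = new TextCollector();
        reader.setContentHandler(collector);
        reader.setErrorHandler(collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.text.toString();
    }

    // True if a reader reused after a malformed document reports the same
    // content for a well-formed document as a fresh reader does.
    static boolean reuseIsClean() throws Exception {
        String malformed = "<root>before<child>oops</root>";
        String wellFormed = "<root>hello <b>world</b></root>";

        XMLReader reused = newReader();
        try {
            parse(reused, malformed);
            throw new AssertionError("expected a well-formedness error");
        } catch (SAXException expected) {
            // Content before the error may already have been reported; that is allowed.
        }

        String fromReused = parse(reused, wellFormed);
        String fromFresh = parse(newReader(), wellFormed);
        return fromReused.equals(fromFresh);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reuse clean: " + reuseIsClean());
    }
}
```

A single pass like this may well not trigger an intermittent bug; in practice the documents, parser features, and number of reuse cycles would all need to be varied.
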
However, what was nastiest about this bug was how hard it was to find
and reproduce. Bugs that show up every time with a quick unit test
are the first ones fixed. :-) When I first reported it, the Xerces
folks didn't believe me. I had to spend several hours narrowing it
down and determining exactly what combination of conditions led to
the bug. While I was trying to craft a reproducible demo of the bug,
it kept vanishing from sight. At least it wasn't one of those
Heisenbugs that disappears when you turn on the debugger. :-)
Xerces has other problems, though. It's not the most conformant
parser out there, by a long shot. I am a bit worried though that
Xerces increasingly has the field of both SAX and DOM for Java to
itself. Only the Oracle team is still maintaining their own
independent parser. Competition is good, and right now there isn't
much of it when it comes to SAX and DOM in the Java space. Writing a
parser is probably beyond my skills or interest level, but I wonder
how hard it would be to roll a JNI wrapper for libxml?
--
Elliotte Rusty Harold
elharo@metalab.unc.edu
Effective XML (Addison-Wesley, 2003)
http://www.cafeconleche.org/books/effectivexml
http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA