- From: Sarveshwar Rao Duddu <email@example.com>
- To: XML-Dev Mailing list <firstname.lastname@example.org>
- Date: Wed, 30 Aug 2000 09:29:12 +0530
I have had so many problems with the Xerces parser that I don't think
it can be considered well tested. I will mention some of them and
leave it to the reader to decide whether it can be considered
well tested.
1. Try giving the parser a document containing only "<hello".
This document is neither valid nor well-formed, so the parser
should report an error.
It neither reports an input error nor (erroneously) says the
document is valid. Instead, it just hangs! Small parsers of a few KB
get this right.
2. The parser does not seem to care much about encodings. Tell the
parser that the encoding is UTF-8 and give it a document containing
bytes that are invalid in UTF-8 (FF, for example) or invalid UTF-8
code sequences, and it will never report an error. Again, small parsers
of a few KB get this right. There are workarounds for this problem, though.
If one is using Java, one can use byte-to-char converters and feed the
output to Xerces. The Java converter reports an error in such a case.
I also found that the parser does not respect the byte order mark.
3. The parser does not respect the "Declarative Content Model" part of XML.
I strongly suspect that the parser checks whether the DTD matches the document
rather than whether the document matches the DTD.
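
For anyone who wants to reproduce points 1 and 2, here is a minimal sketch
of the two checks in Java. It is only an illustration under assumptions: the
class and method names (ParserChecks, isWellFormed, decodeStrictUtf8,
isValidUtf8) are hypothetical, it uses the JAXP parser bundled with a current
JDK and not necessarily the Xerces release discussed here, and it uses
java.nio.charset as one possible byte-to-char converter. A parser that hangs
would additionally need a timeout around the parse call, which is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; names are illustrative, not from the original post.
class ParserChecks {

    // Point 1: a conforming parser must reject "<hello" with an error.
    // Returns true if the text parses, false if the parser reports an error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a well-formedness error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Point 2: decode bytes as UTF-8 and fail on invalid input instead of
    // silently substituting replacement characters.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Convenience wrapper: true if the bytes are valid UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrictUtf8(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("<hello   well-formed? " + isWellFormed("<hello"));
        System.out.println("<hello/> well-formed? " + isWellFormed("<hello/>"));
        System.out.println("byte FF  valid UTF-8? "
                + isValidUtf8(new byte[] {(byte) 0xFF}));
    }
}
```

Decoding the bytes first and handing the resulting characters to the parser
is the workaround described in point 2: the decoder, not the parser, is then
responsible for rejecting bad input.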
There are many other "small" problems with the parser, so I changed my parser.
UTF-16 is also supported by the Oracle parser. Maybe there are bugs in that one
too, but definitely not the ones above.
Jon Smirl wrote:
> Xerces, http://xml.apache.org, is UTF-16. It's actively under development
> and well tested. It supports a lot more character encodings than expat too.
> Jon Smirl