
   Re: Expat & utf-16

  • From: Sarveshwar Rao Duddu <duddu@vsnl.com>
  • To: XML-Dev Mailing list <xml-dev@xml.org>
  • Date: Wed, 30 Aug 2000 09:29:12 +0530

Hi,

I have had so many problems with the Xerces parser that I don't think
it can be considered well tested. I will mention some of them and
leave it to the reader to decide whether it can be considered
well tested.

1. Try giving the parser a document containing "<hello" as input.
    This document is neither valid nor well-formed, so the parser
    should report an error. It neither reports an input error nor
    (erroneously) says the document is valid. Instead, it just hangs!
    Small parsers only a few KB in size get this right.

2. The parser seems not to care much about encodings. Tell the
    parser that the encoding is UTF-8 and give it a document containing
    invalid UTF-8 bytes (FF, for example) or invalid UTF-8 code
    sequences, and it will never report an error. Again, small parsers
    only a few KB in size get this right. There is a workaround for this
    problem, though: if you are using Java, you can use a byte-to-char
    converter and feed its output to Xerces. The Java converter reports
    an error in such a case.
    I also found that the parser does not respect the byte order mark.

3. The parser does not respect the "Declarative Content Model" part of the
    XML spec. I strongly suspect that the parser checks whether the DTD
    matches the document rather than whether the document matches the DTD.
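
For readers who want to reproduce the first two checks, here is a minimal
sketch in Java. It is not the test described above: it assumes a current JDK,
uses the standard JAXP DocumentBuilder API (whichever parser the JDK supplies)
in place of a specific Xerces release, and uses java.nio.charset.CharsetDecoder
as a stand-in for the byte-to-char converter mentioned in point 2. The class
and method names (ParserChecks, decodeStrictUtf8, isWellFormed) are
hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the names are illustrative only.
class ParserChecks {

    // Decode bytes as UTF-8 and throw on any malformed sequence
    // (REPORT) instead of silently substituting a replacement character.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Returns true if the text parses as well-formed XML, false if the
    // parser reports a fatal error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            // Not expected when parsing from an in-memory string.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Point 1: a truncated start tag must be rejected, not accepted.
        System.out.println("\"<hello\" well-formed? " + isWellFormed("<hello"));

        // Point 2: a stray 0xFF byte is never legal in UTF-8, so decode
        // strictly first and hand only the decoded characters to the parser.
        byte[] bad = {'<', 'a', '>', (byte) 0xFF, '<', '/', 'a', '>'};
        try {
            System.out.println("parsed: " + isWellFormed(decodeStrictUtf8(bad)));
        } catch (CharacterCodingException e) {
            System.out.println("invalid UTF-8 rejected before parsing: " + e);
        }
    }
}
```

Decoding before parsing means an encoding error surfaces as a
CharacterCodingException from the decoder, independent of how strictly the
parser itself checks its input.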

There are many other "small" problems with the parser, so I changed my parser.
UTF-16 is also supported by the Oracle parser. Maybe there are bugs in that
too, but definitely not the ones above.

Regards,
Sarveshwar


Jon Smirl wrote:

> Xerces, http://xml.apache.org, is UTF-16. It's actively under development
> and well tested. It supports a lot more character encodings than expat too.
>
> Jon Smirl
> jonsmirl@mediaone.net






 
