   Re: some character entities are now '?'

  • To: xerces-j-user@xml.apache.org
  • Subject: Re: some character entities are now '?'
  • From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
  • Date: Wed, 23 Jul 2003 18:03:57 -0400
  • Cc: xml-dev@lists.xml.org
  • In-reply-to: <OF64F55C22.F67C038A-ON85256D6C.00767F90-85256D6C.007744DC@us.ibm.com>
  • References: <OF64F55C22.F67C038A-ON85256D6C.00767F90-85256D6C.007744DC@us.ibm.com>

At 5:41 PM -0400 7/23/03, David M Williams wrote:

> Sun bug#4646959
> http://developer.java.sun.com/developer/bugParade/bugs/4646959.html
> That bug had some helpful info in it


Youch. What a nasty bug. And contrary to Sun's claims, this is a bug in 
Java 1.4. If I'm reading this right, any XML parser that uses 
InputStreamReader to translate UTF-8 into Java strings and chars is 
likely to miss malformedness errors that arise from bad UTF-8. It 
seems to affect other character sets as well.
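
To make the failure mode concrete, here is a minimal sketch, assuming a
current JDK rather than the exact 1.4 build discussed in the bug report.
The class and method names (LenientUtf8Demo, readAll) are hypothetical
and exist only for illustration. It feeds an invalid UTF-8 sequence
through a plain InputStreamReader and checks what comes out:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical demo class: shows that a plain InputStreamReader does not
// raise an error on malformed UTF-8 (assumption: a current JDK).
public class LenientUtf8Demo {

    // Reads the whole stream through a default UTF-8 InputStreamReader.
    static String readAll(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, "UTF-8");
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation
        // byte, so this is not legal UTF-8.
        byte[] bad = { 'a', (byte) 0xC3, 0x28, 'b' };
        String s = readAll(new ByteArrayInputStream(bad));
        // No exception is thrown; the bad byte is silently replaced.
        System.out.println("contains U+FFFD: " + (s.indexOf('\uFFFD') >= 0));
    }
}
```

A parser built on a reader like this never sees the bad bytes, only the
substituted replacement character, so it has nothing to report.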

It looks like you could work around this in 1.4 by making 
InputStreamReader report the bad data. However, that code would not 
be portable back to Java 1.3 (which does not have this bug). Thus 
you'd need separate code bases for 1.3 and 1.4 or some really ugly 
reflection-based code. We really need to get this fixed.
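
One plausible shape for that 1.4-only workaround, as a sketch rather
than anything Xerces actually does: hand InputStreamReader an explicit
CharsetDecoder configured with CodingErrorAction.REPORT. The class and
method names (StrictUtf8Reader, open, readAll) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: a UTF-8 reader that fails on bad input instead of
// substituting a replacement character.
public class StrictUtf8Reader {

    // Wraps the stream in a reader whose decoder reports errors.
    static Reader open(InputStream in) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    // Reads everything; throws CharacterCodingException on bad bytes.
    static String readAll(InputStream in) throws IOException {
        Reader reader = open(in);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0xC3 followed by 0x28 is not a legal UTF-8 sequence.
        byte[] bad = { 'a', (byte) 0xC3, 0x28, 'b' };
        try {
            readAll(new ByteArrayInputStream(bad));
            System.out.println("no error reported");
        } catch (CharacterCodingException e) {
            System.out.println("malformed input reported: " + e);
        }
    }
}
```

Both java.nio.charset and the InputStreamReader(InputStream,
CharsetDecoder) constructor first appeared in Java 1.4, which is why
code like this cannot compile or load on 1.3 without reflection.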
-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Processing XML with Java (Addison-Wesley, 2002)
   http://www.cafeconleche.org/books/xmljava
   http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA




 
