- To: xerces-j-user@xml.apache.org
- Subject: Re: some character entities are now '?'
- From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Date: Wed, 23 Jul 2003 18:03:57 -0400
- Cc: xml-dev@lists.xml.org
- In-reply-to: <OF64F55C22.F67C038A-ON85256D6C.00767F90-85256D6C.007744DC@us.ibm.com>
- References: <OF64F55C22.F67C038A-ON85256D6C.00767F90-85256D6C.007744DC@us.ibm.com>
At 5:41 PM -0400 7/23/03, David M Williams wrote:
Sun bug#4646959
http://developer.java.sun.com/developer/bugParade/bugs/4646959.html
That bug had some helpful info in it
Youch. What a nasty bug. And contrary to Sun's claims, this is a bug in
Java 1.4. If I'm reading this right, any XML parser using
InputStreamReader to translate UTF-8 into Java strings and chars is
likely to miss malformedness errors that arise from bad UTF-8. This
seems to affect other character sets as well.
It looks like you could work around this in 1.4 to have
InputStreamReader report the bad data. However, that code would not
be portable back to Java 1.3 (which does not have this bug). Thus
you'd need separate code bases for 1.3 and 1.4 or some really ugly
reflection-based code. We really need to get this fixed.
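
For the record, here is a rough sketch of what I take the 1.4 workaround to be. I'm assuming it means handing InputStreamReader a CharsetDecoder configured with CodingErrorAction.REPORT, rather than passing the encoding name as a string. The class and method names (StrictUtf8Reader, open, isWellFormed) are made up for illustration. Everything in java.nio.charset is new in 1.4, which is exactly why this won't compile on 1.3.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of Xerces or any other parser.
final class StrictUtf8Reader {
    private StrictUtf8Reader() {}

    // Wraps the stream in a reader whose decoder reports bad byte
    // sequences instead of silently substituting a replacement character.
    static Reader open(InputStream in) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    // Returns false if decoding the bytes raises a CharacterCodingException.
    static boolean isWellFormed(byte[] data) throws IOException {
        Reader reader = open(new ByteArrayInputStream(data));
        try {
            while (reader.read() != -1) {
                // Drain the reader; only the decoding errors matter here.
            }
            return true;
        } catch (CharacterCodingException e) {
            return false;
        } finally {
            reader.close();
        }
    }
}
```

A parser would call open() where it currently constructs an InputStreamReader from an encoding name, and turn the resulting CharacterCodingException into a well-formedness error.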
--
Elliotte Rusty Harold
elharo@metalab.unc.edu
Processing XML with Java (Addison-Wesley, 2002)
http://www.cafeconleche.org/books/xmljava
http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA