Re: [xml-dev] A SAX TransformerHandler encoding question
- From: Julian Reschke <julian.reschke@gmx.de>
- To: Rick Jelliffe <rjelliffe@allette.com.au>
- Date: Sat, 28 Apr 2007 12:03:56 +0200
Rick Jelliffe wrote:
> Is your problem that the surrogate characters are both serialized to
> their two UTF-8 equivalents, or that there is a read problem (which is
> the reported issue in the link you mention.)
>
> It should not be surprising if Java Characters are serialized
> independently. You may find that there is some normalization library,
> such as ICU, that can help by transcoding through 32 bit characters, but
> I think you have to set your expectations that using the surrogates is
> still, to an extent, pioneering.
>
> Cheers
> Rick Jelliffe
Hi Rick,
thanks for the feedback.
I think it is the same problem.
In the meantime I have verified that the program generates correct
output when run under JDK 1.6. So what I'd like to understand is why a
bug that causes broken XML serialization has not been fixed in earlier
JDKs (at least 1.5, I guess).
Best regards, Julian
PS1: the posted source had the surrogate pair broken; correct version:
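import java.io.ByteArrayOutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public static void testOut() throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    SAXTransformerFactory stf =
        (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler th = stf.newTransformerHandler();
    th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    th.setResult(new StreamResult(out));

    th.startDocument();
    th.startElement("", "foo", "foo", new AttributesImpl());
    // U+10000, written as a UTF-16 surrogate pair
    char[] c = "\ud800\udc00".toCharArray();
    th.characters(c, 0, c.length);
    th.endElement("", "foo", "foo");
    th.endDocument();

    // Dump each output byte with its index and character value
    byte[] bytes = out.toByteArray();
    for (int i = 0; i < bytes.length; i++) {
        System.out.println(i + ": " + bytes[i] + " " + ((char) bytes[i]));
    }
}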
PS2: output for JDK 1.6:
0: 60 <
1: 102 f
2: 111 o
3: 111 o
4: 62 >
5: 38 &
6: 35 #
7: 54 6
8: 53 5
9: 53 5
10: 51 3
11: 54 6
12: 59 ;
13: 60 <
14: 47 /
15: 102 f
16: 111 o
17: 111 o
18: 62 >
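
As a cross-check on the dump above: the bytes at offsets 5 to 12 spell the
numeric character reference &#65536;, which is U+10000, the code point
encoded by the surrogate pair \ud800\udc00. A minimal sketch of that
arithmetic follows. The class name SurrogateCheck and its helper methods are
made up for illustration and are not part of the test program, and
StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateCheck {
    // Combine the UTF-16 surrogate pair at the start of s into one code point.
    static int codePoint(String s) {
        return s.codePointAt(0);
    }

    // Format a code point as a decimal XML numeric character reference.
    static String numericCharRef(int cp) {
        return "&#" + cp + ";";
    }

    // Hex dump of the UTF-8 encoding of s, bytes separated by spaces.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String pair = "\ud800\udc00";
        int cp = codePoint(pair);
        System.out.println(cp);                  // 65536
        System.out.println(numericCharRef(cp));  // &#65536;
        System.out.println(utf8Hex(pair));       // F0 90 80 80
    }
}
```

Either form is a valid way to write the character in the output: one
character reference for the whole code point, as in the JDK 1.6 dump, or the
single four-byte UTF-8 sequence F0 90 80 80 when the output encoding is UTF-8.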