A SAX TransformerHandler encoding question
- From: Julian Reschke <julian.reschke@gmx.de>
- To: XML Developers List <xml-dev@lists.xml.org>
- Date: Fri, 27 Apr 2007 17:46:37 +0200
Hi,
I've got some interesting problems with the JDK's (1.4 and 1.5)
TransformerHandler and surrogate pairs:
Consider:
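    import java.io.ByteArrayOutputStream;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.xml.sax.helpers.AttributesImpl;

    public void testOut() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        SAXTransformerFactory stf =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        // Identity transformer that serializes incoming SAX events
        TransformerHandler th = stf.newTransformerHandler();
        th.getTransformer().setOutputProperty(
            OutputKeys.OMIT_XML_DECLARATION, "yes");
        th.setResult(new StreamResult(out));
        th.startDocument();
        th.startElement("", "foo", "foo", new AttributesImpl());
        // Two surrogate code units: U+DC00 followed by U+D800
        char[] c = "\udc00\ud800".toCharArray();
        th.characters(c, 0, c.length);
        th.endElement("", "foo", "foo");
        th.endDocument();
        // Dump every serialized byte with its index
        byte[] bytes = out.toByteArray();
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(i + ": " + bytes[i] + " " + ((char) bytes[i]));
        }
    }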
This yields:
0: 60 <
1: 102 f
2: 111 o
3: 111 o
4: 62 >
5: -19 ?
6: -80 ?
7: -128 ?
8: -19 ?
9: -96 ?
10: -128 ?
11: 60 <
12: 47 /
13: 102 f
14: 111 o
15: 111 o
16: 62 >
That is, the surrogate pair has been serialized as two separate Unicode
characters, each written as its own three-byte sequence, rather than as
a single character. This problem seems to be old (see
<http://issues.apache.org/jira/browse/XALANJ-2132>), so why does it
still occur in recent JDKs?
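For comparison, here is a minimal sketch of what the JDK's own UTF-8
encoder produces for a well-formed surrogate pair. It assumes Java 7 or
later (for StandardCharsets); the class name SurrogateUtf8Demo and the
helpers encode and hex are made up for this illustration and are not
part of any serializer. A valid pair is a high surrogate
(U+D800..U+DBFF) followed by a low surrogate (U+DC00..U+DFFF). The
bytes at offsets 5-10 above are ED B0 80 ED A0 80, that is, U+DC00 and
then U+D800, each encoded on its own, which matches the low-then-high
order of the test string.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateUtf8Demo {
    // Encode a string as UTF-8 using the JDK's built-in encoder.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pair = "\ud800\udc00";     // high, then low: code point U+10000
        String reversed = "\udc00\ud800"; // low, then high: order used in the test

        // Only the high-then-low order forms a surrogate pair.
        System.out.println(
            Character.isSurrogatePair(pair.charAt(0), pair.charAt(1)));
        System.out.println(
            Character.isSurrogatePair(reversed.charAt(0), reversed.charAt(1)));

        // A well-formed pair encodes as one four-byte UTF-8 sequence.
        System.out.println(Integer.toHexString(pair.codePointAt(0)));
        System.out.println(hex(encode(pair)));
    }
}
```

Under these assumptions the valid pair comes out as the single
four-byte sequence F0 90 80 80, which is what a conforming UTF-8
serializer would be expected to write for U+10000.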
Best regards, Julian