OASIS Mailing List Archives

Re: [xml-dev] A SAX TransformerHandler encoding question

Rick Jelliffe wrote:
> Is your problem that the surrogate characters are both serialized to 
> their two UTF-8 equivalents, or that there is a read problem (which is 
> the reported issue in the link you mention.)
>
> It should not be surprising if Java Characters are serialized 
> independently. You may find that there is some normalization library, 
> such as ICU, that can help by transcoding through 32 bit characters, but 
> I think you have to set your expectations that using the surrogates is 
> still, to an extent, pioneering.
> 
> Cheers
> Rick Jelliffe

Hi Rick,

Thanks for the feedback.

I think it is the same problem.

In the meantime I have verified that the program generates correct 
output when run under JDK 1.6. So what I'd like to understand is why a 
bug that causes broken XML serialization has not been fixed in earlier 
JDKs (at least in 1.5, I guess).

Best regards, Julian

PS1: the posted source had the surrogate pair broken; correct version:

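   import java.io.ByteArrayOutputStream;
   import javax.xml.transform.OutputKeys;
   import javax.xml.transform.sax.SAXTransformerFactory;
   import javax.xml.transform.sax.TransformerHandler;
   import javax.xml.transform.stream.StreamResult;
   import org.xml.sax.helpers.AttributesImpl;

   public static void testOut() throws Exception {
     ByteArrayOutputStream out = new ByteArrayOutputStream();
     SAXTransformerFactory stf =
         (SAXTransformerFactory) SAXTransformerFactory.newInstance();

     // Identity transformer that serializes incoming SAX events to "out".
     TransformerHandler th = stf.newTransformerHandler();
     th.getTransformer().setOutputProperty(
         OutputKeys.OMIT_XML_DECLARATION, "yes");
     th.setResult(new StreamResult(out));

     th.startDocument();
     th.startElement("", "foo", "foo", new AttributesImpl());
     // A single supplementary character (U+10000) as a surrogate pair.
     char[] c = "\ud800\udc00".toCharArray();
     th.characters(c, 0, c.length);
     th.endElement("", "foo", "foo");
     th.endDocument();

     byte[] bytes = out.toByteArray();

     // Dump each serialized byte with its index and character value.
     for (int i = 0; i < bytes.length; i++) {
       System.out.println(i + ": " + bytes[i] + " " + ((char) bytes[i]));
     }
   }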


PS2: output for JDK 1.6:

0: 60 <
1: 102 f
2: 111 o
3: 111 o
4: 62 >
5: 38 &
6: 35 #
7: 54 6
8: 53 5
9: 53 5
10: 51 3
11: 54 6
12: 59 ;
13: 60 <
14: 47 /
15: 102 f
16: 111 o
17: 111 o
18: 62 >
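
For reference, bytes 5 to 12 above spell out the numeric character reference &#65536;, which is the single code point U+10000 encoded by the surrogate pair \ud800\udc00. The following is only a small sketch to check that arithmetic, not part of the original test; the class name SurrogateCheck and its helper methods are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateCheck {
    // Hypothetical helper: return the code point that starts the string,
    // combining a high/low surrogate pair into one value if present.
    public static int codePoint(String s) {
        return s.codePointAt(0);
    }

    // Hypothetical helper: build a decimal numeric character reference.
    public static String charRef(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        String pair = "\ud800\udc00";
        int cp = codePoint(pair);

        // The pair is one code point, so one character reference results.
        System.out.println(charRef(cp)); // prints &#65536;

        // Encoded directly as UTF-8, the same code point takes four bytes.
        byte[] utf8 = pair.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // prints 4
    }
}
```

So a serializer that treats the two chars as one character can emit either one character reference (as JDK 1.6 does here) or one four-byte UTF-8 sequence.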



Copyright 1993-2007 XML.org. This site is hosted by OASIS