   Re: [xml-dev] suppression of the transformation of character entities in


From: "S Woodside" <sbwoodside@yahoo.com>

> Probably you are specifying the output to be encoded in UTF-8 or 
> something like that where the character is supported in the encoding. 

I don't think so.  The data goes wrong coming into the XML processor. The character 
references are supposed to be for various kinds of quotes, but the numbers are not the 
Unicode numbers. If the characters get through, it will only be by accident.  If the output 
encoding is set to UTF-8, for example, then &#146; will produce two bytes (0xC2 0x92, 
the encoding of the C1 control character U+0092), not a quote character.  

(The case where it will *seem to* work is if the output encoding passes the C1 
characters through to the same bytes: for example, an ISO 8859-1 transcoder. If the 
output is then read using CP1252, the characters will come out.) 
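
To make the byte-level behaviour concrete, here is a minimal Python sketch. It assumes
Python's standard ElementTree parser and its built-in "utf-8", "iso-8859-1" and "cp1252"
codecs stand in for whatever processor and transcoder are actually in use; the variable
names are only illustrative.

```python
import xml.etree.ElementTree as ET

# An XML parser resolves &#146; to code point 146 (U+0092), a C1 control
# character. It does not consult CP1252, so no quote character appears.
parsed = ET.fromstring("<p>&#146;</p>").text

# UTF-8 output: U+0092 becomes the two bytes C2 92.
utf8_bytes = parsed.encode("utf-8")

# ISO 8859-1 output: U+0092 passes through as the single byte 92.
latin1_bytes = parsed.encode("iso-8859-1")

# Only if that byte is later (mis)read as CP1252 does a quote show up.
misread = latin1_bytes.decode("cp1252")

print(hex(ord(parsed)), utf8_bytes, latin1_bytes, hex(ord(misread)))
```

The quote only appears in the last step, which is the accidental case described above.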

Bad systems are easy: fragile, slack, and out of control. Better to make the 
character references be for the correct Unicode characters so that the XML coming
in is correct. Then make sure the XML coming out is correct.  
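
One way to repair the incoming references is sketched below, on the assumption that
every decimal reference in the range 128 to 159 was meant as a CP1252 byte. The
function name fix_cp1252_refs is hypothetical, and the sketch only handles decimal
references (not hexadecimal ones like &#x92;).

```python
import re

# Matches decimal numeric character references such as &#146;
_DECIMAL_REF = re.compile(r"&#(\d+);")

def fix_cp1252_refs(text):
    """Rewrite references 128-159 to the Unicode number CP1252 assigns them."""
    def replace(match):
        number = int(match.group(1))
        if 128 <= number <= 159:
            try:
                char = bytes([number]).decode("cp1252")
            except UnicodeDecodeError:
                # A few bytes (e.g. 129) are unassigned in CP1252: leave as-is.
                return match.group(0)
            return "&#%d;" % ord(char)
        return match.group(0)
    return _DECIMAL_REF.sub(replace, text)

print(fix_cp1252_refs("<p>It&#146;s &#147;quoted&#148;</p>"))
```

Running this over the source before it reaches the XML processor means the parser
sees the real Unicode numbers, whatever output encoding is chosen later.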

Also avoid debugging character encodings of generated HTML using a browser: browsers 
can guess or do all sorts of things (depending on the generation, brand and settings). 
Instead, use any hex or text editor that lets you select encodings or which understands 
the XML encoding header. Using a browser to figure out what is happening with encodings 
is the surest road to insanity. 
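
If no hex editor is to hand, a few lines of Python will show the actual bytes. This is
only a sketch; hex_dump is a hypothetical helper, not part of any library.

```python
def hex_dump(data, width=16):
    """Return a classic offset / hex / ASCII dump of a bytes object."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % byte for byte in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %-*s %s" % (offset, width * 3, hex_part, text_part))
    return "\n".join(lines)

# A correctly encoded right single quote (U+2019) in UTF-8 is e2 80 99;
# the mistaken U+0092 would show up as c2 92 instead.
print(hex_dump("It\u2019s".encode("utf-8")))
print(hex_dump("It\u0092s".encode("utf-8")))
```

To inspect a generated file, read it in binary mode (open(path, "rb").read()) and pass
the result to the helper, so that no decoding guess is made along the way.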

To see what the character references should be, see
    http://www.alanwood.net/demos/ansi.html
Instead of the numbers in the "ANSI" column, use the (decimal) numbers in the
"Unicode" column. 

Cheers
Rick Jelliffe





 


Copyright 2001 XML.org. This site is hosted by OASIS