From: "S Woodside" <sbwoodside@yahoo.com>
> Probably you are specifying the output to be encoded in UTF-8 or
> something like that where the character is supported in the encoding.
I don't think so. The data goes wrong coming into the XML processor. The character
references are supposed to be for various kinds of quotes, but the numbers are not the
Unicode numbers. If the characters get through, it will only be by accident. If the output
encoding is set to UTF-8, for example, then &#146; will produce two bytes (C2 92, the
UTF-8 encoding of the C1 control character U+0092), not a quote mark.
(The case where it will *seem to* work is when the output transcoder passes the C1
characters through to the same byte values: for example, an ISO-8859-1 transcoder. If that
output is then read as CP1252, the characters will come out as quotes.)
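A small sketch of both cases, assuming the incoming data contains the CP1252-style reference &#146; (Python and its standard xml.etree.ElementTree parser are chosen here only for illustration):

```python
import xml.etree.ElementTree as ET

# &#146; names code point 146 (U+0092), a C1 control character.
# It does not name the right single quotation mark U+2019.
text = ET.fromstring("<p>&#146;</p>").text

# Written out as UTF-8, U+0092 becomes two bytes and is no quote at all.
utf8_bytes = text.encode("utf-8")          # b'\xc2\x92'

# An ISO-8859-1 transcoder passes U+0080..U+00FF through as the single
# byte with the same value...
latin1_bytes = text.encode("iso-8859-1")   # b'\x92'

# ...and a reader that assumes CP1252 then shows byte 0x92 as U+2019,
# so the quote only "seems to" work.
seen = latin1_bytes.decode("cp1252")

print(utf8_bytes, latin1_bytes, hex(ord(seen)))  # b'\xc2\x92' b'\x92' 0x2019
```

The same bytes decoded as ISO-8859-1 give back U+0092, which is why the result depends entirely on what the reader happens to assume.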
Bad systems are easy. Fragile, slack, and out of control. Better to make the
character references refer to the correct Unicode characters so that the XML coming
in is correct. Then make sure the XML coming out is correct.
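One way to check the incoming side, sketched in Python with the standard xml.etree.ElementTree parser (find_c1_chars is a hypothetical helper, not from any library): after parsing, a character in the range U+0080 to U+009F in text or attribute values is very likely a CP1252 number that was written as a character reference.

```python
import xml.etree.ElementTree as ET

def find_c1_chars(xml_text: str) -> list:
    """Hypothetical helper: list C1 control characters found after parsing."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        # Check attribute values, then element text, then tail text.
        for s in (*elem.attrib.values(), elem.text, elem.tail):
            if s:
                hits.extend(f"U+{ord(c):04X}" for c in s if 0x80 <= ord(c) <= 0x9F)
    return hits

print(find_c1_chars("<p>It&#146;s fine</p>"))  # ['U+0092']
```

An empty list does not prove the data is right, but a non-empty one is a strong hint that the generator is using "ANSI" numbers.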
Also, avoid debugging the character encoding of generated HTML with a browser: browsers
may guess the encoding or do all sorts of other things (depending on their generation, brand
and settings). Instead, use a hex editor, or a text editor that lets you select the encoding
or that understands the XML encoding declaration. Using a browser to figure out what is
happening with encodings is the surest road to insanity.
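If no hex editor is at hand, a few lines of Python can show the raw bytes and the declared encoding without any guessing. This is a minimal sketch; declared_encoding, hex_dump and dump_file are hypothetical names, and the declaration check only handles ASCII-compatible encodings (a UTF-16 file would need a byte-order-mark check first).

```python
import re

# XML declaration at the very start of the data; captures the encoding name.
DECL = re.compile(rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None."""
    m = DECL.match(data)
    return m.group(1).decode("ascii") if m else None

def hex_dump(data: bytes, width: int = 16) -> list:
    """Format raw bytes as offset / hex / printable-ASCII lines."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hexpart:<{width * 3 - 1}}  {text}")
    return lines

def dump_file(path: str, limit: int = 512) -> None:
    # Open in binary mode so nothing is decoded or "fixed" on the way in.
    with open(path, "rb") as f:
        data = f.read(limit)
    print("declared encoding:", declared_encoding(data))
    print("\n".join(hex_dump(data)))
```

Calling dump_file on an output file and finding the byte 92 where a quote should be, under a declaration of UTF-8, shows the problem directly.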
For the correct character references, see
http://www.alanwood.net/demos/ansi.html
Instead of the numbers in the "ANSI" column, use the (decimal) numbers in the
"Unicode" column.
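If the generator cannot be fixed at the source, the same substitution can be sketched in Python. This assumes that Python's built-in cp1252 codec gives the same mapping as the "ANSI"/"Unicode" table above, and it only handles decimal references (fix_cp1252_refs is a hypothetical name):

```python
import re

def fix_cp1252_refs(xml_text: str) -> str:
    """Rewrite decimal references 128-159 (CP1252 "ANSI" numbers)
    to the Unicode code point that CP1252 assigns to that byte."""
    def repl(m):
        n = int(m.group(1))
        if not 128 <= n <= 159:
            return m.group(0)            # already a Unicode number
        try:
            ch = bytes([n]).decode("cp1252")
        except UnicodeDecodeError:
            return m.group(0)            # 129, 141, 143, 144, 157 are unassigned
        return f"&#{ord(ch)};"
    return re.sub(r"&#([0-9]+);", repl, xml_text)

print(fix_cp1252_refs("&#147;quoted&#148;"))  # &#8220;quoted&#8221;
```

Because it works on raw text, it would also rewrite look-alike strings inside comments or CDATA sections, where they are not character references.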
Cheers
Rick Jelliffe