Re: [xml-dev] Quiz: How do you put a Euro sign in your data if your XML uses windows-1252 encoding and you use a numeric character reference?
- From: Michael Sokolov <msokolov@safaribooksonline.com>
- To: Michael Kay <mike@saxonica.com>
- Date: Fri, 01 Mar 2013 07:49:22 -0500
On 3/1/2013 6:36 AM, Michael Kay wrote:
>
> On 01/03/2013 06:30, David Lee wrote:
>> Curious... Is this a common misconception?
>> How prevalent is the confusion between xml encoding and the infoset
>> or XDM character model of Unicode codepoints? Encoding charset !=
>> Unicode codepoints. Simple!!!!!???
>>
>> I hinted at this months ago on this list that I believe the level of
>> misunderstanding of encoding and Unicode concepts is both high and
>> not self recognized. Which is a deadly combination.
>> Is there more "the community" can do to make it clearer?
>>
>
> If there is, please let me know.
>
> I've been advising people how to solve character encoding issues for
> about 100 years, but our own internal system for handling Saxon
> license requests still gets it wrong. It ain't easy.
The advice I always give is: use (and demand) UTF-8 everywhere and
anywhere that you can. Don't use named entities, ever (actually this has
nothing to do with character sets, but it's still my position :)). Use
numeric character references only when absolutely necessary. Remember that
if you use multiple character sets (or accept outside data that may
be in unknown or ill-defined encodings), complicated problems can
arise in almost any layer of your software stack. Problems
still come up (we have an entire category of bugs in one customer's
system related to umlauts), but demanding UTF-8 only from data suppliers
has helped us avoid at least some character set translation issues.
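
To make the quiz in the subject line concrete, here is a minimal Python
sketch (not taken from anyone's production code). It assumes only the
standard library; the helper names numeric_reference and
decode_utf8_strict are made up for illustration. It shows that a numeric
character reference names the Unicode code point (U+20AC for the Euro
sign), not the byte the document's encoding happens to use (0x80 in
windows-1252), and one possible way to reject supplier data that is not
valid UTF-8.

```python
import unicodedata

EURO = "\u20ac"  # EURO SIGN, code point U+20AC (decimal 8364)


def numeric_reference(ch: str) -> str:
    """Return the decimal XML numeric character reference for one character."""
    return f"&#{ord(ch)};"


def decode_utf8_strict(raw: bytes) -> str:
    """Decode incoming bytes as UTF-8; raises UnicodeDecodeError if invalid."""
    return raw.decode("utf-8", errors="strict")


# The encoded bytes depend on the encoding in use...
print(EURO.encode("windows-1252"))  # b'\x80'
print(EURO.encode("utf-8"))         # b'\xe2\x82\xac'

# ...but the reference is built from the code point, so it is the same
# ASCII text whatever the document encoding is.
print(numeric_reference(EURO))      # &#8364;

# Code point 0x80 is a C1 control character, not the Euro sign.
print(unicodedata.category(chr(0x80)))  # Cc
```

With a check like decode_utf8_strict at the point where data enters the
system, a stray windows-1252 byte such as b"\x80" fails loudly there
instead of turning up later as a garbled character in some other layer.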
-Mike