Re: [xml-dev] Quiz: How do you put a Euro sign in your data if your XML uses windows-1252 encoding and you use a numeric character reference?

Curious: is this a common misconception?
How prevalent is the confusion between the XML encoding and the Infoset or XDM character model of Unicode code points? Encoding charset != Unicode code points. Simple!?

Months ago I hinted on this list that I believe the level of misunderstanding of encoding and Unicode concepts is both high and not self-recognized, which is a deadly combination.
Is there more "the community" can do to make it clearer?
Once you understand it, it is obvious, yet the misunderstanding is extremely common, as Roger's example shows so well.
It is very frustrating: it seems so obvious to me, yet a large number of the people I have worked with over the years are confused, and worse, they don't recognize their ignorance, so they don't look in the right places when things break. Because this seems so common, I hesitate to dismiss the confusion as simple intellectual inability. Maybe something can be done to educate engineers better on this concept. It is as fundamental as binary arithmetic, but it seems to be misunderstood out of all proportion to its complexity. Yet I can see no way of making it more obvious. The concept is well defined and well known. How can people be confused? Yet they seem to be overwhelmingly confused, so something is wrong somewhere.

Sent from my iPad (excuse the terseness) 
David A Lee

On Feb 28, 2013, at 6:25 PM, "Liam R E Quin" <liam@w3.org> wrote:

> On Thu, 2013-02-28 at 23:00 +0000, Costello, Roger L. wrote:
>>    <?xml version="1.0" encoding="windows-1252"?>
>> In the windows-1252 encoding scheme the Euro sign (€) is hex 80.
>>    &#x80;43.00
> No. Do not do this.
> [...]
>>    Numeric character references (such as &#x80;) 
>>    are interpreted as Unicode characters – no matter 
>>    what encoding you use for your document.
> Right. That is because the numeric character references identify
> codepoints, not input octet sequences.
> Liam
> -- 
> Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
> Pictures from old books: http://fromoldbooks.org/
> Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
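
The distinction Liam draws can be checked with a few lines of Python. This is only a minimal sketch: it assumes the standard library's xml.etree.ElementTree parser accepts the windows-1252 declaration, and the <price> element and the parse_text helper are names made up for illustration, not anything from Roger's data.

```python
import xml.etree.ElementTree as ET

# XML declaration from Roger's example; the encoding governs bytes only.
DECL = b'<?xml version="1.0" encoding="windows-1252"?>'


def parse_text(body: bytes) -> str:
    """Parse a windows-1252 document and return its root element's text.

    Hypothetical helper: wraps the given bytes in a made-up <price> element.
    """
    return ET.fromstring(DECL + b"<price>" + body + b"</price>").text


# The encoding maps octets to characters: octet 0x80 <-> code point U+20AC.
euro_octets = "\u20ac".encode("windows-1252")

# A numeric character reference names a code point, not an octet.
wrong = parse_text(b"&#x80;43.00")        # code point U+0080 (a C1 control)
right_ref = parse_text(b"&#x20AC;43.00")  # code point U+20AC (Euro sign)
right_raw = parse_text(b"\x8043.00")      # raw octet 0x80, decoded by the parser

for label, text in [("&#x80;", wrong),
                    ("&#x20AC;", right_ref),
                    ("raw octet 0x80", right_raw)]:
    print(f"{label:>15} -> U+{ord(text[0]):04X}")
```

In other words, the document's encoding only decides how octets in the file become characters; once a character is written as a numeric reference, the number is a Unicode code point and the declared encoding no longer enters into it.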
