XML.org: OASIS Mailing List Archives
Re: [xml-dev] Did you know that the lowercase of the Kelvin Sign(K) is the Latin small letter k? Do you know the impact of that?


On 28/01/2013 19:21, Costello, Roger L. wrote:
> Hi Folks,
>
> The Kelvin Sign (K) is high up in the Unicode code space, it is codepoint U+212A. That's way up there.
>
> Compare with the Latin capital letter K, its codepoint is U+004B. That's way down there.
>
> Interestingly, the lowercase of the Kelvin Sign is the Latin small letter k:
>
> 	lower-case('&#x212A;') = 'k'
>
> "So what's the big deal?" you ask. Actually, it's a really big deal. Let me explain.
>
> Suppose you want to enforce this rule in your XML instance documents:
>
>      	The value of the <Name> element must
>      	be 'Lockhart' (lowercase, uppercase, any
>      	case).
I don't know what you mean by your rule. What are your rules for
case-blind equivalence, if they aren't the Unicode rules? Do you know
better than the collective wisdom of the Unicode Consortium what
lowercase and uppercase mean? If you don't like the Unicode rules for
this, you must propose and justify an alternative.
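
To make the choice concrete, here is a minimal Python sketch (not XPath, and not anything from the original post) contrasting two possible readings of the 'Lockhart' rule. The function names matches_unicode and matches_ascii_only are hypothetical, and the ASCII-only reading is just one example of an alternative rule that someone would have to propose and justify.

```python
def matches_unicode(name: str) -> bool:
    """Case-blind match using Unicode lowercasing (the default rule)."""
    return name.lower() == "lockhart"


def matches_ascii_only(name: str) -> bool:
    """Hypothetical alternative rule: only ASCII letters may take part."""
    return name.isascii() and name.lower() == "lockhart"


if __name__ == "__main__":
    kelvin_variant = "LOC\u212AHART"  # U+212A KELVIN SIGN in place of 'K'
    print(matches_unicode(kelvin_variant))     # accepted under Unicode rules
    print(matches_ascii_only(kelvin_variant))  # rejected under the ASCII-only rule
```

Under the Unicode rule the Kelvin Sign variant is accepted; under the ASCII-only rule it is not. Which behaviour is "right" depends entirely on what the rule is supposed to mean.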
> Question: Are there other characters similar to the Kelvin Sign? That is, are there other characters that are outside [A-Za-z]  but when lower-case() or upper-case() is applied to them they are inside [A-Za-z]?
>
Not very many, as it happens. The only examples I found in Unicode 4.0.0 
(I haven't checked later versions) are:

<char code="0130" name="LATIN CAPITAL LETTER I WITH DOT ABOVE"/>
<char code="0131" name="LATIN SMALL LETTER DOTLESS I"/>
<char code="017F" name="LATIN SMALL LETTER LONG S"/>
<char code="212A" name="KELVIN SIGN"/>

The dotted/dotless I problem affects Turkish in particular, where 
dotless small i is the normal lower-case counterpart to dotless capital 
I. The "long S" is of course simply an archaic variant of the modern "s".
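
A rough way to repeat this check against a current Unicode database is sketched below in Python. It assumes that "inside [A-Za-z]" means the mapped result is exactly one ASCII letter, and the function name maps_into_ascii_letters is made up for illustration. Python's str.lower() and str.upper() apply full case mappings from whatever Unicode version the interpreter ships with, so the output may not match the Unicode 4.0.0 list exactly. In particular, I believe U+0130 lowercases to 'i' followed by a combining dot above under full mappings, so the single-letter test would not report it.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)


def maps_into_ascii_letters():
    """Find code points outside [A-Za-z] whose lower() or upper()
    result is a single ASCII letter."""
    hits = []
    for cp in range(0x110000):
        ch = chr(cp)
        if ch in ASCII_LETTERS:
            continue  # only characters outside [A-Za-z] are of interest
        for mapped in (ch.lower(), ch.upper()):
            if len(mapped) == 1 and mapped in ASCII_LETTERS:
                hits.append((cp, mapped))
                break
    return hits


if __name__ == "__main__":
    for cp, mapped in maps_into_ascii_letters():
        print(f"U+{cp:04X} -> {mapped!r}")
```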

Michael Kay
Saxonica
>
>




Copyright 1993-2007 XML.org. This site is hosted by OASIS