XML.org
OASIS Mailing List Archives

Did you know that the lowercase of the Kelvin Sign (K) is the Latin small letter k? Do you know the impact of that?

Hi Folks,

The Kelvin Sign (K) sits outside the ASCII range, at code point U+212A. That's way up there.

Compare that with the Latin capital letter K, whose code point is U+004B. That's way down there, inside ASCII.

Interestingly, the lowercase of the Kelvin Sign is the Latin small letter k:

	lower-case('&#x212A;') = 'k'
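
The same Unicode case mapping can be observed outside XPath. Here is a minimal sketch in Python, using only the standard library. The names KELVIN, LATIN_K and describe() are made up for this illustration, and an XPath processor may bundle different Unicode data than the Python interpreter does.

```python
import unicodedata

KELVIN = "\u212a"   # KELVIN SIGN
LATIN_K = "\u004b"  # LATIN CAPITAL LETTER K


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Two distinct characters that share one lowercase form.
kelvin_lower = KELVIN.lower()
latin_k_lower = LATIN_K.lower()

print(describe(KELVIN), "->", describe(kelvin_lower))
print(describe(LATIN_K), "->", describe(latin_k_lower))
```

Both lines end in the Latin small letter k, even though the two starting characters are different code points.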

"So what's the big deal?" you ask. Actually, it's a really big deal. Let me explain.

Suppose you want to enforce this rule in your XML instance documents: 

    	The value of the <Name> element must
    	be 'Lockhart' (lowercase, uppercase, any
    	case).

In XPath the rule can be expressed using the matches() function, anchoring the pattern with ^ and $ so that it must match the whole value:

    	matches(Name, '^Lockhart$', 'i')

The third argument ('i') tells matches() to perform a case-insensitive match.

So, applying the matches() function to this:

    	<Name>Lockhart</Name>

returns true.

But it also returns true for this (recall that U+212A is the Kelvin Sign):

    	<Name>Loc&#x212A;hart</Name>

Ouch!

The <Name> element contains invalid data but the matches() function claims that it is valid data.
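
The effect is not specific to XPath. Here is a rough sketch of the same check in Python's re module, whose case-insensitive matching is also Unicode-aware by default. The helper names name_ok_unicode() and name_ok_ascii() are hypothetical. The second helper adds Python's re.ASCII flag purely for comparison; it is a Python feature, not something the XPath matches() function offers.

```python
import re

KELVIN = "\u212a"                  # KELVIN SIGN
spoofed = "Loc" + KELVIN + "hart"  # looks like "LocKhart" but contains U+212A


def name_ok_unicode(value):
    """Case-insensitive whole-string match using Unicode case mapping."""
    return re.fullmatch("Lockhart", value, re.IGNORECASE) is not None


def name_ok_ascii(value):
    """Same match, with case-insensitivity restricted to ASCII letters."""
    return re.fullmatch("Lockhart", value, re.IGNORECASE | re.ASCII) is not None


print(name_ok_unicode("LOCKHART"))  # ordinary ASCII input
print(name_ok_unicode(spoofed))     # input containing the Kelvin Sign
print(name_ok_ascii(spoofed))       # same input, ASCII-only case folding
```

The Unicode-aware check accepts the value containing the Kelvin Sign, while the ASCII-only check rejects it, which suggests the surprise comes from the case mapping rather than from the pattern itself.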

Let's see how this applies to XML Schema (XSD 1.1, which adds the xs:assertion facet). Here I declare a <Name> element and specify that its value must be 'Lockhart' (case insensitive):

    <xs:element name="Name">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:assertion test="matches($value, '^Lockhart$', 'i')" />
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

I then validate this:

	<Name>Loc&#x212A;hart</Name>

against the schema, and the validator says "Valid".

Ouch!

Invalid data has gotten into our system.

Question: Are there other characters like the Kelvin Sign? That is, are there other characters outside [A-Za-z] that land inside [A-Za-z] when lower-case() or upper-case() is applied to them?
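
One way to look for candidates is a brute-force scan over every code point. The sketch below does that in Python, with ascii_case_lookalikes() as a made-up helper name. It relies on Python's str.lower() and str.upper(), which use the Unicode data shipped with the interpreter and may map one character to several, so the list it produces is only a starting point and may differ from what a given XPath processor's lower-case() and upper-case() return.

```python
import sys


def ascii_case_lookalikes():
    """Map each non-ASCII character to those of its case mappings that
    consist solely of ASCII letters (for example, KELVIN SIGN -> ['k'])."""
    found = {}
    for cp in range(0x80, sys.maxunicode + 1):
        ch = chr(cp)
        # Keep a mapping only if every character in it is an ASCII letter.
        hits = [m for m in (ch.lower(), ch.upper())
                if m.isascii() and m.isalpha()]
        if hits:
            found[ch] = hits
    return found


if __name__ == "__main__":
    for ch, hits in ascii_case_lookalikes().items():
        print(f"U+{ord(ch):04X} -> {hits}")
```

The scan starts at U+0080 because ASCII characters outside [A-Za-z] have no case mappings of their own.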

/Roger 



Copyright 1993-2007 XML.org. This site is hosted by OASIS