XML.org
An element's value is an invalid Unicode string ... how can it be well-formed?

Hi Folks,

Consider this Spanish name: Martiñez

Instead of using the precomposed ñ character (U+00F1), one can use the base character "n" followed by a combining tilde (U+0303).

So that Spanish name can be equivalently expressed as: Martiñez
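
To make the claimed equivalence concrete, here is a minimal sketch in Python (used only as a convenient way to inspect code points; it is not part of my XSLT setup). The variable names and the helper `same_text` are made up for illustration. It assumes that "equivalent" means canonical equivalence, which is what Unicode normalization compares.

```python
import unicodedata

# Precomposed form: a single code point U+00F1 for the n-with-tilde.
precomposed = "Marti\u00F1ez"
# Decomposed form: base "n" followed by U+0303 COMBINING TILDE.
decomposed = "Martin\u0303ez"

def same_text(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(len(precomposed), len(decomposed))   # 8 9
print(precomposed == decomposed)           # False: different code point sequences
print(same_text(precomposed, decomposed))  # True: canonically equivalent
```

The two strings differ as code point sequences but compare equal once both are normalized to the same form.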

Here is an XML document that uses the latter form:

<?xml version="1.0" encoding="utf-8"?>
<Name>Martin&#x303;ez</Name>

I wrote a stylesheet that uses the substring() function to extract everything from the combining tilde (the 7th character of the element's value) onward:

    <xsl:template match="/">
        <Result>
            <xsl:value-of select="substring(Name, 7)" />
        </Result>
    </xsl:template>

The output is:

<?xml version="1.0" encoding="UTF-8"?>
<Result>̃ez</Result>
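
A rough way to see what ends up in that result, sketched in Python rather than XSLT: the function `xpath_substring_from` is a hypothetical stand-in that only mimics substring() for an integer start position of 1 or more, under the assumption that both XPath and Python count characters here as code points.

```python
import unicodedata

def xpath_substring_from(s, start):
    """Hypothetical analogue of XPath substring(s, start): 1-based, integer start >= 1."""
    return s[start - 1:]

name = "Martin\u0303ez"  # the text content of <Name>
result = xpath_substring_from(name, 7)
first = result[0]

print([f"U+{ord(c):04X}" for c in result])  # ['U+0303', 'U+0065', 'U+007A']
print(unicodedata.name(first))              # COMBINING TILDE
print(unicodedata.combining(first))         # 230 (non-zero, so a combining mark)
```

So the extracted string starts with a combining mark that has no base character before it.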

I checked the output for well-formedness, and the XML parser reports that it is well-formed.
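
The same check can be reproduced with a different parser. This is only a sketch using Python's standard `xml.etree.ElementTree` (not the parser I used), and it shows nothing more than that this one parser accepts the document; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The stylesheet output, with the leading combining tilde written as an escape
# in the Python source and then encoded as real UTF-8 bytes.
document = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<Result>\u0303ez</Result>'
).encode("utf-8")

# fromstring() raises ET.ParseError if the input is not well-formed.
root = ET.fromstring(document)

print(root.tag)                              # Result
print([f"U+{ord(c):04X}" for c in root.text])  # ['U+0303', 'U+0065', 'U+007A']
```

The parse succeeds, and the element's text still begins with U+0303.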

According to the book Fonts & Encodings (p. 61, first paragraph):

    ... we select a substring that begins
    with a combining character, this new
    string will not be a valid string in
    Unicode.

The value of the <Result> element is not a valid Unicode string, so how can the document be well-formed XML?

/Roger

Copyright 1993-2007 XML.org. This site is hosted by OASIS