An element's value is an invalid Unicode string ... how can it be well-formed?
- From: "Costello, Roger L." <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Fri, 4 Jan 2013 18:26:47 +0000
Hi Folks,
Consider this Spanish name: Martiñez
Instead of using the ñ character, one can use the (base) "n" character followed by a combining tilde (hex 303) character.
So that Spanish name can be equivalently expressed as: Martiñez
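The equivalence of the two spellings can be checked with Unicode normalization. The following is a minimal sketch in Python (not part of the original setup, which uses XSLT); the variable names `composed` and `decomposed` are illustrative only.

```python
import unicodedata

# Precomposed form: U+00F1 LATIN SMALL LETTER N WITH TILDE.
composed = "Marti\u00f1ez"
# Decomposed form: base "n" followed by U+0303 COMBINING TILDE.
decomposed = "Martin\u0303ez"

# The two strings differ when compared code point by code point...
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 8 9

# ...but each normalizes to the other (canonical equivalence).
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```

Note that the decomposed form is one code point longer, which is why position 7 in that form lands on the combining tilde.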
Here is an XML document that uses the latter form:
<?xml version="1.0" encoding="utf-8"?>
<Name>Martiñez</Name>
I wrote a stylesheet that uses the substring() function to extract the combining tilde character and onward:
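<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Result>
      <!-- substring() is 1-based: character 7 is the combining tilde -->
      <xsl:value-of select="substring(Name, 7)" />
    </Result>
  </xsl:template>
</xsl:stylesheet>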
The output is:
<?xml version="1.0" encoding="UTF-8"?>
<Result>̃ez</Result>
I checked it for well-formedness and the XML Parser says it is well-formed.
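The same observation can be reproduced outside XSLT. This is a rough sketch in Python using only the standard library; it assumes the decomposed spelling of the name, and the names `name`, `tail`, and `doc` are hypothetical. It shows what one parser (expat, via `xml.etree.ElementTree`) does with such content and nothing more.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Decomposed spelling: "n" followed by U+0303 COMBINING TILDE.
name = "Martin\u0303ez"

# XPath substring(Name, 7) is 1-based, so it corresponds to name[6:] here.
tail = name[6:]
print([f"U+{ord(c):04X}" for c in tail])   # ['U+0303', 'U+0065', 'U+007A']

# A non-zero canonical combining class marks a combining character.
print(unicodedata.combining(tail[0]))      # 230

# Wrap the substring in an element and parse it; a well-formedness
# error would raise xml.etree.ElementTree.ParseError.
doc = "<Result>" + tail + "</Result>"
root = ET.fromstring(doc)
print(root.text == tail)                   # True
```

The parse succeeds even though the element content begins with a combining character, which matches the result reported above.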
According to the book, Fonts & Encodings (p. 61, first paragraph):
... we select a substring that begins
with a combining character, this new
string will not be a valid string in
Unicode.
The value of the <Result> element is not a valid Unicode string, so how can it be a well-formed XML document?
/Roger