XML versus Unicode ... here are the facts about their differences
- From: "Costello, Roger L." <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Thu, 31 Jan 2013 20:59:01 +0000
Hi Folks,
Below I have listed the differences between XML and Unicode as explicitly and completely as I can. Please let me know where I err.
Assume: This XML tag uses the precomposed ñ character:
<Martiñez>
Assume: This XML tag uses 'n' plus the "combining tilde" character:
<Martiñez>
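To see what actually differs between the two tags, it helps to list their code points. The following is a minimal Python 3 sketch using only the standard `unicodedata` module; the variable names and the `describe` helper are illustrative, not part of any XML API.

```python
import unicodedata

# The two spellings of the tag name (illustrative variable names).
precomposed = "Marti\u00f1ez"   # U+00F1 LATIN SMALL LETTER N WITH TILDE
decomposed = "Martin\u0303ez"   # 'n' followed by U+0303 COMBINING TILDE

def describe(s: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in s."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

for label, tag in (("precomposed", precomposed), ("decomposed", decomposed)):
    # The decomposed spelling is one code point longer.
    print(label, len(tag))
    for line in describe(tag):
        print("   ", line)
```

Both strings render the same, but the first has 8 code points and the second has 9.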
Fact: The two tags are visually IDENTICAL. More precisely, the glyphs on display screens are IDENTICAL.
Fact: Below are two representations of the SAME CHARACTER:
a. precomposed ñ character
b. 'n' plus the "combining tilde" character
Fact: According to the Unicode standard, the two representations ARE EQUIVALENT. [1]
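Unicode calls this relationship canonical equivalence, and normalization is the usual way to test for it. A minimal Python 3 sketch, assuming the standard `unicodedata.normalize` function; `canonically_equivalent` is a hypothetical helper written for this illustration:

```python
import unicodedata

precomposed = "Marti\u00f1ez"   # U+00F1
decomposed = "Martin\u0303ez"   # 'n' + U+0303

def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Raw comparison looks at code points, so the spellings differ.
raw_equal = precomposed == decomposed
# NFC composes 'n' + U+0303 into U+00F1; NFD does the reverse.
nfc_matches = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_matches = unicodedata.normalize("NFD", precomposed) == decomposed

print(raw_equal, nfc_matches, nfd_matches, canonically_equivalent(precomposed, decomposed))
```

Either normalization form works for an equality test, as long as both sides are normalized the same way.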
Fact: According to the XML standard, the two representations ARE NOT EQUIVALENT.
Fact: According to the Unicode standard, applications must treat the two representations exactly the SAME. Applications must compare the two representations as EQUAL.
Fact: According to the XML standard, applications must treat the two representations as DIFFERENT. XML applications must compare the two representations as NOT EQUAL.
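A conforming parser compares names without normalizing them. The sketch below, assuming Python 3's standard `xml.etree.ElementTree` (which uses the Expat parser), shows how this looks in practice; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

precomposed = "Marti\u00f1ez"   # U+00F1
decomposed = "Martin\u0303ez"   # 'n' + U+0303 (U+0303 is a legal XML NameChar)

# Each spelling on its own is a well-formed element name.
a = ET.fromstring(f"<{precomposed}/>")
b = ET.fromstring(f"<{decomposed}/>")
tags_equal = a.tag == b.tag
print("same element name:", tags_equal)

# Opening with one spelling and closing with the other is a mismatched tag,
# so the parser rejects the document as not well-formed.
try:
    ET.fromstring(f"<{precomposed}></{decomposed}>")
    mismatch_rejected = False
except ET.ParseError as err:
    mismatch_rejected = True
    print("not well-formed:", err)
```

This mirrors the XML 1.0 end-tag rule: the end tag's name must match the start tag's name, and the parser checks that match on code points.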
Fact: In XML, two Unicode-identical CHARACTERS may be considered to be DIFFERENT.
Fact: XML parsing is done on codepoints, not on characters, and not on the bytes that are used inside the computer to represent the codepoints.
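One way to see that the parser works on code points rather than bytes is to feed it the same document in two encodings. A minimal Python 3 sketch, assuming `xml.etree.ElementTree` and Expat's detection of UTF-16 from the byte order mark:

```python
import xml.etree.ElementTree as ET

doc = "<Marti\u00f1ez/>"
utf8_bytes = doc.encode("utf-8")
utf16_bytes = doc.encode("utf-16")  # Python prepends a byte order mark

# The byte sequences differ...
print(utf8_bytes == utf16_bytes)
# ...but after decoding, the parser sees the same code points.
tag_from_utf8 = ET.fromstring(utf8_bytes).tag
tag_from_utf16 = ET.fromstring(utf16_bytes).tag
print(tag_from_utf8 == tag_from_utf16)
```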
Fact: XML parsing is done on codepoints, but XPath does NOT do its string matching operations based on codepoints. XPath uses a byte-for-byte comparison.
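When weighing this claim, it helps to separate three kinds of comparison: code point equality, byte-for-byte equality in one fixed encoding, and equality after normalization. A minimal Python 3 sketch using only the standard library; the three helper functions are hypothetical and are not taken from any XPath implementation.

```python
import unicodedata

precomposed = "Marti\u00f1ez"   # U+00F1
decomposed = "Martin\u0303ez"   # 'n' + U+0303

def codepoint_equal(a: str, b: str) -> bool:
    # Python compares str values code point by code point.
    return a == b

def utf8_bytes_equal(a: str, b: str) -> bool:
    # Byte-for-byte comparison of the UTF-8 encodings.
    return a.encode("utf-8") == b.encode("utf-8")

def nfc_equal(a: str, b: str) -> bool:
    # Comparison after normalizing both sides to NFC.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

results = {fn.__name__: fn(precomposed, decomposed)
           for fn in (codepoint_equal, utf8_bytes_equal, nfc_equal)}
print(results)
```

Because UTF-8 maps each code point to exactly one byte sequence, code point equality and UTF-8 byte equality always agree; only the normalized comparison treats the two spellings as equal.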
Which "Facts" are not correct?
/Roger
[1] The precomposed ñ character and the 'n' plus the "combining tilde" character are equivalent: see the book "Unicode Demystified", page 119.