Your XML documents may use different sets of characters, depending on which implementer you select?
- From: "Costello, Roger L." <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Tue, 17 May 2011 09:10:50 -0400
Hi Folks,
The XML specification lists the set of characters that may be used in XML documents.
The characters are Unicode characters.
Unicode has something called categories. A category is a set of characters.
Here is a category: Nd
The Nd category consists of decimal digit characters.
Unicode is an evolving standard. Thus, there are different versions.
The set of decimal digit characters in the Nd category may vary, depending on the version of Unicode.
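To see the Nd category concretely, here is a minimal sketch using Python's standard unicodedata module. It lists every code point whose general category is Nd in whatever Unicode version the local Python build ships. The helper name nd_characters is only illustrative, and the count it prints depends on that Unicode version, which is exactly the variability in question.

```python
import sys
import unicodedata


def nd_characters():
    """Return every code point whose Unicode general category is Nd."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Nd"
    ]


digits = nd_characters()

# The version of the Unicode database bundled with this Python build.
print("Unicode version:", unicodedata.unidata_version)

# The size of the Nd category in that version (differs between versions).
print("Nd characters:", len(digits))

# A few members, e.g. ASCII digits and Arabic-Indic digits.
print("ASCII zero is Nd:", ord("0") in digits)
print("ARABIC-INDIC DIGIT ZERO is Nd:", 0x0660 in digits)
```

Running this under Python builds that bundle different Unicode versions may print different counts.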
The XML specification says that XML documents can use the characters in the Nd category.
But, but, but, ...
The characters in the Nd category may vary, depending on the version of Unicode. Do we have an ever-changing base of characters that are permitted in XML documents?
The XML specification mandates version 2.0 of Unicode.
Phew! That removes the variability in the set of characters that may be used in XML documents.
But wait!
There are applications that build on top of XML, and some of them are lax about which version of Unicode must be used. For example, with XML Schema:
As far as conformant processors are concerned, the spec offers
implementers freedom to choose which version of Unicode they will
support. So if the definitions of character groups like Nd change from
one Unicode version to the next, this may be reflected in differences
between schema processors. [1]
So one XML Schema validator may support Unicode 2.0 and another may support Unicode 2.1. Suppose, hypothetically, that the Nd category has 600 characters in Unicode 2.0 and 610 in Unicode 2.1. An XML instance document that uses one of the 10 additional characters where the schema requires an Nd character would then pass one validator and fail the other.
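The effect can be sketched in Python, which happens to ship two Unicode databases: unicodedata (the build's current version) and unicodedata.ucd_3_2_0 (a frozen Unicode 3.2.0 snapshot). These stand in for the 2.0 versus 2.1 scenario above. Python does not ship either of those versions, and the function matches_nd_only is a hypothetical stand-in for a schema pattern that accepts only Nd characters, not how any real validator is implemented.

```python
import unicodedata

OLD = unicodedata.ucd_3_2_0  # Unicode 3.2.0 snapshot bundled with Python
NEW = unicodedata            # whatever Unicode version this build ships


def matches_nd_only(text, ucd):
    # Hypothetical stand-in for a schema pattern such as \p{Nd}+ :
    # accept the value only if it is non-empty and every character is
    # in category Nd according to the given Unicode database.
    return len(text) > 0 and all(ucd.category(ch) == "Nd" for ch in text)


# NKO DIGIT ZERO and NKO DIGIT ONE were added to Unicode after 3.2.
value = "\u07C0\u07C1"

print("old database accepts:", matches_nd_only(value, OLD))
print("new database accepts:", matches_nd_only(value, NEW))

# ASCII digits are Nd in both databases, so this value is stable.
print("ASCII digits, old:", matches_nd_only("2011", OLD))
print("ASCII digits, new:", matches_nd_only("2011", NEW))
```

The same value is rejected under the older database and accepted under the newer one, while plain ASCII digits are accepted under both.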
Ouch!
Are other XML applications similarly lax, permitting implementers to pick which version of Unicode they will support?
Does the XSLT spec allow implementers freedom to choose which version of Unicode they will support?
Does the UBL spec allow implementers freedom to choose which version of Unicode they will support?
Does the RELAX NG spec allow implementers freedom to choose which version of Unicode they will support?
Does the XBRL spec allow implementers freedom to choose which version of Unicode they will support?
Does the SVG spec allow implementers freedom to choose which version of Unicode they will support?
/Roger
[1] http://lists.w3.org/Archives/Public/xmlschema-dev/2011May/0024.html