XML schema xs:string and non BMP character like 𐌀, length restriction
- From: Martin Honnen <Martin.Honnen@gmx.de>
- To: xml-dev <xml-dev@lists.xml.org>
- Date: Fri, 12 Oct 2012 12:16:38 +0200
Hi,
I am seeing inconsistencies between different schema-validating parsers
in how they apply length restrictions on xs:string to Unicode characters
outside of the BMP, like 𐌀 (U+10300) for instance.
For the sample
http://home.arcor.de/martin.honnen/xml/oneCharInstance1.xml which has
the contents
<?xml version="1.0" encoding="utf-8" ?>
<root>
<test>𐌀</test>
</root>
the XSV validator and Saxon 9.4 EE don't report any validation errors
when validating against the schema
http://home.arcor.de/martin.honnen/xml/oneCharSchema1.xsd (which has as
its contents
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="root">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" name="test" type="one-char" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:simpleType name="one-char">
<xs:restriction base="xs:string">
<xs:length value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
).
However Xerces Java 2.11 reports "[Error] oneCharInstance1.xml:3:25:
cvc-length-valid: Value '?' with length = '2'
is not facet-valid with respect to length '1' for type 'one-char'." so
it seems to consider the contents of the "test" element as a string with
two characters.
MSXML 6 and .NET's validating parser report similar errors.
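One plausible explanation for the reported length of 2, which I have not confirmed against any of these implementations, is that the length facet is being checked against UTF-16 code units rather than characters. Java and .NET strings are sequences of UTF-16 code units, and a character outside the BMP needs a surrogate pair. A minimal Python sketch of the two ways of counting (the variable names are only illustrative):

```python
# U+10300 (OLD ITALIC LETTER A) lies outside the BMP.
ch = "\U00010300"

# Python 3 strings are sequences of code points, so len() counts characters.
code_points = len(ch)

# In UTF-16 this character is encoded as a surrogate pair,
# i.e. two 16-bit code units (2 bytes each).
utf16_units = len(ch.encode("utf-16-le")) // 2

# Decode the big-endian byte pairs to show the surrogate pair itself.
raw = ch.encode("utf-16-be")
surrogates = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

print(code_points, utf16_units, [hex(s) for s in surrogates])
# prints: 1 2 ['0xd800', '0xdf00']
```

A validator that measures length with the equivalent of Java's String.length() would arrive at 2 here, while one that counts code points would arrive at 1.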
In my view Xerces, MSXML and .NET get it wrong: in terms of both the XML
specification and the schema datatype, 𐌀 is a single XML
character. However, I would like confirmation from others on the list before
filing bugs.
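To make the reasoning concrete: the Char production of XML 1.0 includes the range #x10000-#x10FFFF, so U+10300 is one Char, while the surrogate code points #xD800-#xDFFF are not Chars at all. XML Schema Part 2 measures the length of xs:string in characters as defined by XML 1.0. The following is only a sketch of that reading; the helper names is_xml_char and xsd_string_length are my own and do not come from any validator.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def xsd_string_length(value: str) -> int:
    """Length of an xs:string value counted in XML characters.

    Hypothetical helper: raises ValueError if the value contains a code
    point that is not an XML Char (for example a lone surrogate).
    """
    for c in value:
        if not is_xml_char(ord(c)):
            raise ValueError(f"U+{ord(c):04X} is not an XML character")
    return len(value)


# U+10300 is a single XML character, so it satisfies <xs:length value="1"/>.
print(is_xml_char(0x10300), xsd_string_length("\U00010300"))
# prints: True 1
```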
--
Martin Honnen --- MVP Data Platform Development
http://msmvps.com/blogs/martin_honnen/