Re: XML Schema 1.1 xpath 2.0 regex question

Hi all,

I have another question on the same topic.

I have the following XML instance document:

<?xml version="1.0"?>
<X>
  <a>hello world</a>
</X>

And the following XML Schema 1.1 document:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="X">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string"/>
      </xs:sequence>
      <xs:assert test="matches(a, 'hello[ ]+world')"/>
      <xs:assert test="matches(a, 'hello\x{0020}+world')"/>
    </xs:complexType>
  </xs:element>

</xs:schema>

(The XSD validation requirement is that the string value of element "a" must be the word 'hello', followed by one or more space characters, followed by the word 'world'.)

The intent of both xs:assert elements is the same. The only difference is that the first xs:assert specifies the space character as a literal, while the second refers to it by its Unicode code point in hex notation, following Java's regex convention.
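
To illustrate the Java convention I am referring to, here is a minimal sketch using plain java.util.regex (Java 7 or later). It only shows how Java's own regex engine treats the two patterns; it does not exercise Xerces or Saxon, and it says nothing about what the XPath 2.0 / XSD regex grammar permits. The class name SpaceEscapeDemo and the helper xpathStyleMatches are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class SpaceEscapeDemo {

    // XPath's matches() is true when any substring matches the pattern,
    // so Matcher.find() is the closest plain-Java analogue (assumption:
    // no regex flags are in play).
    static boolean xpathStyleMatches(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String value = "hello world";

        // Space written as a literal inside a character class.
        System.out.println(xpathStyleMatches(value, "hello[ ]+world"));

        // Space written as a code point escape, Java's \x{h...h} form.
        System.out.println(xpathStyleMatches(value, "hello\\x{0020}+world"));

        // No space at all, so neither pattern should match.
        System.out.println(xpathStyleMatches("helloworld", "hello\\x{0020}+world"));
    }
}
```

In plain Java both patterns behave the same way for this input, which is why I describe the two xs:asserts as having the same intent.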

Apache Xerces has no problem with either xs:assert and reports the XML instance document as valid, whereas Saxon says that the second xs:assert has a regex syntax error ("Syntax error at char 7 in regular expression: Escape character 'x' not allowed").

For the XSD validation example above, any thoughts on which validation outcome is correct, and on what the relevant specs say about compliance?

Is it also acceptable for Xerces to state, as an implementation-defined feature, "we support specifying characters within XSD 1.1 regular expressions using Unicode code point hex notation (\x{...})"?

I'm also curious: does Saxon support specifying characters within XSD 1.1 regular expressions using Unicode code point notation?

Regards,

Mukul Gandhi