
 


 

   regular expression question

  • To: <xml-dev@lists.xml.org>
  • Subject: regular expression question
  • From: "Paul Hermans" <paul.hermans@amplexor.com>
  • Date: Fri, 26 Aug 2005 11:16:11 +0200
  • Thread-index: AcWqHs0HSRAin46QQJS90gGDOhGTeQ==
  • Thread-topic: regular expression question

<xsd:simpleType name="emailType">
	<xsd:restriction base="xsd:string">
		<xsd:pattern value="[\p{L}_-]+(\.[\p{L}_-]+)*@[\p{L}_]+(\.[\p{L}_]+)+"/>
	</xsd:restriction>
</xsd:simpleType>


The following tools do not report an error for this definition: XML Spy,
Stylus Studio, Oxygen. Saxon8SA and IPSI-XQ, on the other hand, do.

If the definition is changed to
<xsd:simpleType name="emailType">
	<xsd:restriction base="xsd:string">
		<xsd:pattern value="[\p{L}_\-]+(\.[\p{L}_\-]+)*@[\p{L}_]+(\.[\p{L}_]+)+"/>
	</xsd:restriction>
</xsd:simpleType>

Saxon8SA and IPSI-XQ do not complain anymore.

I think the rationale is that the hyphen "-" has a special meaning within
square brackets (which define character classes) and therefore needs to
be escaped.
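
To make the "special meaning" concrete, here is a minimal sketch using
Python's re module. Python is only an assumed stand-in for a Perl-style
engine; it is not one of the schema processors named above and says
nothing about how they behave. The helper name matches_whole is
hypothetical.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# Between two characters, an unescaped hyphen denotes a range: a, b or c.
RANGE_CLASS = r"[a-c]"
# A backslash-escaped hyphen is the literal character: a, '-' or c.
ESCAPED_CLASS = r"[a\-c]"

print(matches_whole(RANGE_CLASS, "b"))    # inside the range a..c
print(matches_whole(RANGE_CLASS, "-"))    # hyphen is not part of the range
print(matches_whole(ESCAPED_CLASS, "-"))  # escaped hyphen matches itself
print(matches_whole(ESCAPED_CLASS, "b"))  # no range, so "b" is excluded
```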

But to my surprise, the unescaped regular expression is accepted by a
dedicated regular expression tool (RegExBuddy), which clearly indicates
that the hyphen is taken as the literal character.

The rationale here could be that, since no other character follows the
hyphen, it is not used to indicate a range in the character class but
stands for itself.
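
This second reading can be tried out in a Perl-style engine. The sketch
below again assumes Python's re module, which treats a hyphen at the end
of a character class as a literal. Since re does not support \p{L}, the
ASCII class a-z is substituted, so the patterns are simplified stand-ins
for the ones in the schema, and the sample addresses are made up.

```python
import re

# Assumption: [a-z] replaces \p{L}, which Python's re module does not support.
UNESCAPED = r"[a-z_-]+(\.[a-z_-]+)*@[a-z_]+(\.[a-z_]+)+"
ESCAPED = r"[a-z_\-]+(\.[a-z_\-]+)*@[a-z_]+(\.[a-z_]+)+"

# Made-up sample strings, not taken from the original post.
SAMPLES = ["paul-h@example.com", "a.b-c@x.org", "no-at-sign", "a@b-c.org"]


def verdicts(sample: str) -> tuple[bool, bool]:
    """Return (unescaped result, escaped result) for a full-string match."""
    unescaped = re.fullmatch(UNESCAPED, sample) is not None
    escaped = re.fullmatch(ESCAPED, sample) is not None
    return unescaped, escaped


for sample in SAMPLES:
    print(sample, verdicts(sample))
```

In this engine both spellings give the same verdict for every sample,
which is consistent with the trailing hyphen being read as a literal.
Whether an XML Schema processor must accept the unescaped form is a
separate question that this sketch does not settle.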

Which interpretation is the correct one?


Thanks,


Paul




 
