
Regular expression for URI matching



Alexander Falk of Altova, the XML Spy people, posted the following to
an internal W3C mailing list.  With his permission, I am reposting it
here so that it will be archived.  Anyone may use it, but this
information is provided "as-is" with no warranties whatsoever regarding
the correctness of the information.

----- Forwarded message from Alexander Falk -----

This is the Regular Expression (RE) we originally used for the anyURI
datatype within our XML Spy product up until 4.0b2:

	
(([a-zA-Z][0-9a-zA-Z+\\-\\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?(#[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?

It was constructed according to the BNF grammar given in RFC 2396
(http://www.ietf.org/rfc/rfc2396.txt) and we used this RE to validate
elements and attributes whose datatype was anyURI.
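For readers who want to try the expression, here is a minimal sketch in
Python. It is not Altova's code. It assumes the doubled backslashes in the
posted pattern are string-literal escaping, so each is written as a single
regex backslash, and that the whole value must match. The names URI_CHARS,
ANY_URI_RE and looks_like_any_uri are hypothetical.

```python
import re

# Assumption: "\\-" and "\\." in the posted pattern are string-literal
# escapes for the regex tokens "\-" and "\.".
URI_CHARS = r"[0-9a-zA-Z;/?:@&=+$\.\-_!~*'()%]"

# Optional scheme, up to two slashes, the URI body, then an optional
# "#fragment". Both top-level groups are optional, as in the posted RE.
ANY_URI_RE = re.compile(
    r"(([a-zA-Z][0-9a-zA-Z+\-\.]*:)?/{0,2}" + URI_CHARS + r"+)?"
    + r"(#" + URI_CHARS + r"+)?"
)


def looks_like_any_uri(value: str) -> bool:
    """Return True if the whole string matches the posted expression."""
    return ANY_URI_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in (
        "http://www.ietf.org/rfc/rfc2396.txt",
        "http://www.w3.org/TR/xmlschema-2/#anyURI",
        "http://example.com/a b",
    ):
        print(candidate, looks_like_any_uri(candidate))
```

Because both groups are optional, the empty string also matches under this
reading, while a value containing a space or angle brackets does not.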

However, we've found that (a) many customers happily use illegal URIs in
their documents, (b) XML Schema Part 2
(http://www.w3.org/TR/xmlschema-2/#anyURI) doesn't require any validation of
the contents of the anyURI datatype, and (c) most customers don't want us to
validate more strictly than other processors do.

Therefore, we are currently eliminating the anyURI checking [...]

-- 
John Cowan           http://www.ccil.org/~cowan              cowan@ccil.org
Please leave your values        |       Check your assumptions.  In fact,
   at the front desk.           |          check your assumptions at the door.
     --sign in Paris hotel      |            --Miles Vorkosigan