RE: Regular expression for URI matching
- From: Michael Brennan <Michael_Brennan@allegis.com>
- To: xml-dev@lists.xml.org
- Date: Thu, 23 Aug 2001 12:14:47 -0700
Thanks for passing this along (although that regular expression makes my
brain hurt ;)).
It's too bad, though, that Altova is completely removing the check. I understand
the reasoning. We've all heard the admonition: be strict in what you create,
be forgiving in what you accept. Unfortunately, the overwhelming majority of
developers follow the path of least resistance. Forgiving web browsers are
one reason there is so much buggy, malformed content on the web. The typical
web developer writes a web page, brings it up in the browser, and if it
displays, they are done. If web browsers were more strict, developers would
produce more conformant content.
Maybe Altova should just add an optional feature that lets a user explicitly
disable the URI checking. That way, at least, they could accommodate their
customers without inadvertently leading naive developers down the path of
bad practice.
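
For what it's worth, here is a rough sketch in Python of what such a switch
might look like. It assumes the expression quoted below, with the doubled
backslashes (which look like string-literal escaping) collapsed to single
ones. The names ANY_URI_RE, is_valid_any_uri and check_uris are made up for
illustration; this is not Altova's code.

```python
import re

# The expression from the forwarded message, rejoined onto two raw-string
# lines. Assumption: the "\\" pairs in the posted text are string-literal
# escapes, so each is written here as a single backslash.
ANY_URI_RE = re.compile(
    r"(([a-zA-Z][0-9a-zA-Z+\-\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\.\-_!~*'()%]+)?"
    r"(#[0-9a-zA-Z;/?:@&=+$\.\-_!~*'()%]+)?"
)


def is_valid_any_uri(value: str, check_uris: bool = True) -> bool:
    """Return True if value passes the anyURI check.

    check_uris is a hypothetical user option: when False, the check is
    skipped and every value is accepted.
    """
    if not check_uris:
        return True
    # fullmatch requires the whole string to match, not just a prefix.
    return ANY_URI_RE.fullmatch(value) is not None
```

With checking left on, a value containing a space (for example
"http://example.com/a b") is rejected; passing check_uris=False accepts it.
Note that the empty string passes either way, since both halves of the
expression are optional.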
> -----Original Message-----
> From: John Cowan [mailto:cowan@mercury.ccil.org]
> Sent: Wednesday, August 22, 2001 7:15 PM
> To: xml-dev@lists.xml.org
> Subject: Regular expression for URI matching
>
>
> Alexander Falk of Altova, the XML Spy people, posted the following to
> an internal W3C mailing list. With his permission, I am reposting it
> here so that it will be archived. Anyone may use it, but this
> information is provided "as-is" with no warranties whatsoever regarding
> the correctness of the information.
>
> ----- Forwarded message from Alexander Falk -----
>
> This is the Regular Expression (RE) we originally used for the anyURI
> datatype within our XML Spy product up until 4.0b2:
>
>
> (([a-zA-Z][0-9a-zA-Z+\\-\\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?(#[0-9a-zA-Z;/?:@&=+$\\.\\-_!~*'()%]+)?
>
> It was constructed according to the BNF grammar given in RFC 2396
> (http://www.ietf.org/rfc/rfc2396.txt) and we used this RE to validate
> elements and attributes whose datatype was anyURI.
>
> However, we've found that (a) many customers actually use illegal URIs in
> their documents happily, (b) XML Schema Part 2
> (http://www.w3.org/TR/xmlschema-2/#anyURI) doesn't require any validation of
> the contents of the anyURI datatype, and (c) most customers don't want us to
> validate more strictly than other processors do.
>
> Therefore, we are currently eliminating the anyURI checking [...]