David Tolpin wrote:
>Some schema languages use string regular expressions to check the lexical space of
>attributes and character data. The regex strings often become incomprehensible,
>such as
>
>(([a-zA-Z][0-9a-zA-Z+\-\.]*:)?/{0,2}[0-9a-zA-Z;/?:@&=+$\.\-_!~*'()%]+)?(#[0-9a-zA-Z;/?:@&=+$\.\-_!~*'()%]+)?
>
>for any URI.
>
>Providing a structured syntax, similar to that for XML, would make them easier to read
>and debug, for example,
>
> s-pattern="""
> comment = "\(([^\(\)\\]|\\.)*\)"
> atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
> atoms = atom "(\." atom ")*"
> person = "\"([^\"\\]|\\.)*\""
> location = "\[([^\[\]\\]|\\.)*\]"
> local-part = "(" atoms "|" person ")"
> domain = "(" atoms "|" location ")"
> start = "(" comment " )?" local-part "@" domain "( " comment ")?"
> """
>
>instead of
>
> pattern=
> "(\(([^\(\)\\]|\\.)*\) )?"
> ~ """([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|"([^"\\]|\\.)*")"""
> ~ "@"
> ~ "([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|\[([^\[\]\\]|\\.)*\])"
> ~ "( \(([^\(\)\\]|\\.)*\))?"
>
>Why isn't it done?
>
>
I always do that for our Java regular expressions:
// Java string literals need each regex backslash doubled, and identifiers cannot contain hyphens.
String comment = "\\(([^\\(\\)\\\\]|\\\\.)*\\)";
String atom = "[a-zA-Z0-9!#$%&'*+\\-/=?\\^_`{|}~]+";
String atoms = atom + "(\\." + atom + ")*";
String person = "\"([^\"\\\\]|\\\\.)*\"";
String location = "\\[([^\\[\\]\\\\]|\\\\.)*\\]";
String localPart = "(" + atoms + "|" + person + ")";
String domain = "(" + atoms + "|" + location + ")";
String start = "(" + comment + " )?" + localPart + "@" + domain + "( " + comment + ")?";
or whatever. Crazy not to IMHO.
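For anyone who wants to try it, here is a minimal sketch of the same composition as a compilable class. The class name AddressPattern, the constant names, the matches() helper and the sample addresses are illustrative assumptions, not part of any schema language or existing library. The only real API used is java.util.regex.Pattern.

```java
import java.util.regex.Pattern;

// Hypothetical class: builds the address pattern from named pieces.
class AddressPattern {
    // Each regex backslash is doubled because these are Java string literals.
    static final String COMMENT  = "\\(([^\\(\\)\\\\]|\\\\.)*\\)";
    static final String ATOM     = "[a-zA-Z0-9!#$%&'*+\\-/=?\\^_`{|}~]+";
    static final String ATOMS    = ATOM + "(\\." + ATOM + ")*";
    static final String PERSON   = "\"([^\"\\\\]|\\\\.)*\"";
    static final String LOCATION = "\\[([^\\[\\]\\\\]|\\\\.)*\\]";
    static final String LOCAL_PART = "(" + ATOMS + "|" + PERSON + ")";
    static final String DOMAIN     = "(" + ATOMS + "|" + LOCATION + ")";

    // Optional leading comment, local part, "@", domain, optional trailing comment.
    static final Pattern START = Pattern.compile(
        "(" + COMMENT + " )?" + LOCAL_PART + "@" + DOMAIN + "( " + COMMENT + ")?");

    // True only if the whole string matches the composed pattern.
    static boolean matches(String s) {
        return START.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "john.doe@example.com",
            "\"John Doe\"@[127.0.0.1]",
            "(work) jane@example.org (office)",
            "john..doe@example.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

Each piece can be compiled and tested on its own (Pattern.compile(ATOMS), say), which is most of the debugging benefit.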
Cheers
Rick Jelliffe