David Tolpin wrote:
>>> s-pattern="""
>>> comment = "\(([^\(\)\\]|\\.)*\)"
>>> atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
>>> atoms = atom "(\." atom ")*"
>>> [...]
>>>
>>>Why isn't it done?
>>
>>
>>HyLex used a similar syntax for regular expressions.
>>I've always wondered why the idea never caught on elsewhere.
>>(Then again, none of the ideas from HyTime ever really
>>caught on...)
>
>
> In fact, I've implemented it in an extension datatype library for my Relax
> NG validator; it is only 70 lines of code in Scheme, after all. Proved
> to be very useful for debugging.
Very clever. But a naive implementation would just recursively
concatenate the strings to make a single regex string. Could you
elaborate on the debugging advantage, i.e., how it makes it easier for a
schema writer to debug regular expressions?
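To make the "naive implementation" concrete, here is a minimal sketch in Python (not the Scheme code mentioned above, which I haven't seen). It assumes each definition is a sequence of double-quoted regex fragments and bare names, as in the quoted example; the `DEFINITIONS` table, the `TOKEN` tokenizer, and the `expand` function are hypothetical names of my own.

```python
import re

# Definitions in the quoted "name = fragments" style. Each right-hand side
# mixes double-quoted regex fragments with references to other names.
DEFINITIONS = {
    "comment": r'"\(([^\(\)\\]|\\.)*\)"',
    "atom": r'''"[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"''',
    "atoms": r'atom "(\." atom ")*"',
}

# A token is either a quoted fragment (group 1) or a bare name (group 2).
# Assumption: fragments never contain a double quote.
TOKEN = re.compile(r'"([^"]*)"|([A-Za-z][A-Za-z0-9-]*)')


def expand(name, definitions, seen=()):
    """Recursively concatenate fragments into one regex string."""
    if name in seen:
        raise ValueError(f"recursive definition: {name}")
    parts = []
    for literal, reference in TOKEN.findall(definitions[name]):
        if reference:
            # Wrap each expanded reference in a non-capturing group so that
            # a following quantifier applies to the whole sub-pattern.
            inner = expand(reference, definitions, seen + (name,))
            parts.append("(?:" + inner + ")")
        else:
            parts.append(literal)
    return "".join(parts)


if __name__ == "__main__":
    atoms = expand("atoms", DEFINITIONS)
    print(atoms)
    print(bool(re.fullmatch(atoms, "David.Tolpin")))
```

Once expanded, the names are gone: a failed match reports nothing about which named piece was responsible, which is why I am asking where the debugging advantage comes from.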
Jeni Tennison used the same idea with a slightly different syntax in her
DTLL proposal (I've lost the URL). Her idea had the added twist that an
application could receive the results of the regular expression parse as
a structured result, e.g., through a SAX API. Thus, using your example,
the string "(David Tolpin)David.Tolpin@nospam.net" might produce the
'infoset':
<start>
  <comment>(David Tolpin)</comment>
  <local-part>
    <atoms>
      <atom>David</atom>.<atom>Tolpin</atom>
    </atoms>
  </local-part>@<domain>
    <atoms>
      <atom>nospam</atom>.<atom>net</atom>
    </atoms>
  </domain>
</start>
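A rough sketch of how such a structured result could be produced, again in Python. This is not DTLL and not a SAX interface: it simply returns the markup as a string. The `start`, `local-part` and `domain` rules are my guesses from the element names above (the original pattern elides them), and the `GRAMMAR` encoding, `match_items` and `parse` are hypothetical. The matcher is greedy with no backtracking, which is enough for this example but not for patterns in general.

```python
import re
from xml.sax.saxutils import escape

ATOM = r"[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"

# Each rule is a sequence of items: ("lit", regex), ("ref", rule name),
# or ("star", [items]) for zero-or-more repetition.
# Rules other than comment, atom and atoms are assumed, not from the source.
GRAMMAR = {
    "comment": [("lit", r"\(([^\(\)\\]|\\.)*\)")],
    "atom": [("lit", ATOM)],
    "atoms": [("ref", "atom"), ("star", [("lit", r"\."), ("ref", "atom")])],
    "local-part": [("ref", "atoms")],
    "domain": [("ref", "atoms")],
    "start": [("ref", "comment"), ("ref", "local-part"),
              ("lit", "@"), ("ref", "domain")],
}


def match_items(items, text, pos):
    """Match items in order; return (markup, new_pos) or None on failure."""
    out = []
    for kind, value in items:
        if kind == "lit":
            m = re.compile(value).match(text, pos)
            if m is None:
                return None
            out.append(escape(m.group(0)))
            pos = m.end()
        elif kind == "ref":
            result = match_items(GRAMMAR[value], text, pos)
            if result is None:
                return None
            inner, pos = result
            # Each named rule becomes an element wrapping what it matched.
            out.append(f"<{value}>{inner}</{value}>")
        else:  # "star": repeat while the body matches and consumes input
            while True:
                result = match_items(value, text, pos)
                if result is None or result[1] == pos:
                    break
                out.append(result[0])
                pos = result[1]
    return "".join(out), pos


def parse(text):
    """Return markup for text, or None unless the whole string matches."""
    result = match_items([("ref", "start")], text, 0)
    if result is None or result[1] != len(text):
        return None
    return result[0]


if __name__ == "__main__":
    print(parse("(David Tolpin)David.Tolpin@nospam.net"))
```

Each `ref` branch could just as well fire start-element and end-element callbacks instead of building a string, which is roughly what delivering the result through SAX would look like.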
This still seems a fruitful avenue to explore.
Bob Foster
http://xmlbuddy.com/