Sean McGrath wrote:
> [Bill de hÓra]
> >And I don't understand this disdain for regular expressions over XML.
> >Regexes are a perfectly useful tool for manipulating text.
>
> Hi Bill,
>
> I used regexp's myself - I'd say about 30% of the time when processing
> XML. It makes me nervous though and I try not to do it in any mission
> critical context.
>
> The trouble comes in having a degree of confidence in the correctness of
> the regexps.
I think we're agreeing, but I'm looking at it the other way round: you'd
want to know that what you're looking for is regular and not, say,
context free, rather than hope the regex doesn't produce false positives
or false negatives. If you know it's not regular, or just don't care to
know, that's willful engagement in incompetence.
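To put the regular versus context-free point in concrete terms, here is a
small Python sketch. Nested elements are the usual case where a regex
cheerfully matches the wrong thing. The element name and sample text are
made up for illustration, nothing more.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample: an element nested inside an element of the same name.
nested = "<div>outer<div>inner</div>tail</div>"

# The non-greedy match stops at the first closing tag, so it pairs the
# outer opening tag with the inner closing tag.
naive = re.search(r"<div>.*?</div>", nested).group(0)

# A parser tracks nesting depth and hands back the whole outer element.
outer = ET.tostring(ET.fromstring(nested), encoding="unicode")

print(naive)
print(outer)
```

The regex result is not even well-formed on its own, which is roughly
what a false positive looks like in practice.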
> The standard answer I get when I harp on about this is something
> like "ah, but I know the XML I'm processing is machine generated and
> consistent therefore...".
For regexing XML though I'm really talking about little
admin/console jobs and sed scripts over the likes of config files
rather than something sitting in front of a data stream (where Son
of Regex, XPath, can do nicely).
One tempting exception might be for templating languages with what
you might call 'magic tags' that get expanded. So instead of using
$Revision
you end up with:
<magic:Revision/>
This works so long as you produce and consume, with nothing in the
middle (often the case with templating). Once another system is
inserted and does this:
<magic:Revision>
</magic:Revision>
you're stuffed, or quickly refactoring to
<magic:Revision value="$Revision"/>.
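A minimal Python sketch of that failure, for the record. The regex, the
namespace URI and both sample documents are assumptions invented for this
example; they don't come from any real templating system.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, made up for this example.
NS = "urn:example:magic"

# A regex that only knows the empty-element spelling of the magic tag.
MAGIC_RE = re.compile(r"<magic:Revision\s*/>")

# What the producer writes.
produced = f'<doc xmlns:magic="{NS}"><magic:Revision/></doc>'

# What a system in the middle might re-serialise it as: the same element,
# spelled with separate start and end tags.
reserialised = (
    f'<doc xmlns:magic="{NS}"><magic:Revision>\n</magic:Revision></doc>'
)

def regex_finds_tag(text):
    """True if the regex sees the magic tag in the raw text."""
    return MAGIC_RE.search(text) is not None

def parser_finds_tag(text):
    """True if an XML parser sees the magic element, however it is spelled."""
    root = ET.fromstring(text)
    return root.find(f"{{{NS}}}Revision") is not None

for name, text in (("produced", produced), ("reserialised", reserialised)):
    print(name, regex_finds_tag(text), parser_finds_tag(text))
```

The regex finds the tag only in the first document; the parser finds it in
both, since to a parser the two spellings are the same element.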
But if you hate attributes, that's ok: there is an industrial-strength,
time-honoured option. If, in the grand tradition, we simply
add,
<!-- =============================================
(DPH 2003-04-01) This Can't Happen:
<magic:Revision>
</magic:Revision>
============================================= -->
<magic:Revision/>
we're all set... ;)
Bill