xml-dev - Re: [xml-dev] Re: If XML is too hard for a programmer, perhaps he'd be b


To: Sean McGrath <sean.mcgrath@propylon.com>
Subject: Re: [xml-dev] Re: If XML is too hard for a programmer, perhaps he'd be better off as a crossing guard
From: Bill de hÓra <bill@dehora.net>
Date: Sat, 29 Mar 2003 18:47:51 +0000
Cc: xml-dev@lists.xml.org
In-reply-to: <5.1.0.14.0.20030328083541.025769a0@mail.propylon.com>
References: <5.1.0.14.0.20030328083541.025769a0@mail.propylon.com>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3) Gecko/20030312

Sean McGrath wrote:
> [Bill de hÓra]
>  >And I don't understand this disdain for regular expressions over XML.
>  >Regexes are a perfectly useful tool for manipulating text.
> 
> Hi Bill,
> 
> I used regexp's myself - I'd say about 30% of the time when processing 
> XML. It makes me nervous
> though and I try not to do it in any mission critical context.
> 
> The trouble comes in having a degree of confidence in the correctness of 
> the regexps.

I think we're agreeing, but I'm looking at it from the other 
direction: you'd want to know that what you're looking for is 
regular and not, say, context-free, rather than hope the regex 
doesn't trip over false positives and negatives. If you know it's 
not regular, or just don't care to know, that's willful engagement 
in incompetence.
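
To make the regular versus context-free point concrete, here is a 
minimal sketch in Python. The <section> element and the sample 
document are made up for illustration; the only claim is that a 
plain regex of this kind does not track nesting depth, while a 
parser does.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: a <section> nested inside another <section>.
doc = "<section><section>inner</section>tail</section>"

# A non-greedy regex stops at the FIRST closing tag it meets, so the
# "body" it reports for the outer element is cut short and unbalanced.
pattern = re.compile(r"<section>(.*?)</section>")
regex_body = pattern.search(doc).group(1)

# A parser keeps track of nesting, so the outer element comes back
# whole, with the inner element as a proper child.
root = ET.fromstring(doc)
inner = root.find("section")

print(repr(regex_body))        # the truncated, unbalanced capture
print(inner.text, inner.tail)  # text inside and after the inner element
```

Making the regex greedy only moves the problem: it then over-matches 
as soon as two sibling elements appear in the same input.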

> The standard answer I get when I harp on about this is something
> like "ah, but I know the XML I'm processing is machine generated and 
> consistent therefore...".

For regexing XML, though, I'm really talking about little 
admin/console jobs and sed scripts over the likes of config files, 
rather than something sitting in front of a data stream (where Son 
of Regex, XPath, can do nicely).

One tempting exception might be for templating languages with what 
you might call 'magic tags' that get expanded. So instead of using

  $Revision

you end up with:

  <magic:Revision/>

This works so long as you both produce and consume the markup, with 
nothing in the middle (often the case with templating). Once another 
system is inserted and does this:

  <magic:Revision>
  </magic:Revision>

you're stuffed, or quickly refactoring to

  <magic:Revision value="$Revision"/>.
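
A rough sketch of that failure mode in Python, for anyone who wants 
to see it happen. The namespace URI, the <doc> wrapper and the 
function names are assumptions made up for the example; only the 
magic:Revision tag comes from the text above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; only the "magic" prefix is given above.
MAGIC_NS = "urn:example:magic"
NS = {"magic": MAGIC_NS}

# What the template producer writes.
produced = f'<doc xmlns:magic="{MAGIC_NS}"><magic:Revision/></doc>'
# The same document after an intermediate system rewrote the empty tag.
rewritten = (
    f'<doc xmlns:magic="{MAGIC_NS}"><magic:Revision>\n</magic:Revision></doc>'
)

# A regex that only knows the empty-element spelling of the tag.
magic_tag = re.compile(r"<magic:Revision/>")

def regex_finds_tag(text):
    """True if the literal empty-element spelling occurs in the text."""
    return magic_tag.search(text) is not None

def parser_finds_tag(text):
    """True if a magic:Revision child element exists, however it is spelled."""
    root = ET.fromstring(text)
    return root.find("magic:Revision", NS) is not None

for label, text in (("produced", produced), ("rewritten", rewritten)):
    print(label, regex_finds_tag(text), parser_finds_tag(text))
```

Both spellings are the same element as far as an XML parser is 
concerned, which is why the regex is only safe while nothing sits 
between producer and consumer.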

But if you hate attributes, that's OK: there is an 
industrial-strength, time-honoured option. If, in the grand 
tradition, we simply add

  <!-- =============================================
         (DPH 2003-04-01) This Can't Happen:
         <magic:Revision>
         </magic:Revision>
       =============================================  -->
  <magic:Revision/>

we're all set... ;)

Bill
