OASIS Mailing List Archives

 


   Re: [xml-dev] Re: If XML is too hard for a programmer, perhaps he'd be b


[Bill de hÓra]
 >And I don't understand this disdain for regular expressions over XML.
 >Regexes are a perfectly useful tool for manipulating text.

Hi Bill,

I use regexps myself, I'd say about 30% of the time when processing XML.
It makes me nervous, though, and I try not to do it in any
mission-critical context.

The trouble comes in having a degree of confidence in the correctness
of the regexps.

For example, on the face of it, using a regexp to catch occurrences of:
         <name>Sean</name>
is simple. Not so, for many reasons. Writing regexps capable of getting
this right in the full generality of XML 1.0 is tantamount to writing a
full XML 1.0 WF parser.
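
To make that concrete, here is a minimal Python sketch. The sample
documents, the NAIVE pattern and the helper names regex_finds and
parser_finds are illustrative assumptions, not anything from a real
system. Each variant is well-formed XML 1.0 with the same <name>
content, yet the obvious regexp only catches the first, and it also
reports a false hit on text that sits inside a comment.

```python
import re
import xml.etree.ElementTree as ET

# The "obvious" pattern for catching <name>Sean</name>.
NAIVE = re.compile(r"<name>Sean</name>")

# Four well-formed documents whose <name> element contains "Sean".
VARIANTS = [
    "<doc><name>Sean</name></doc>",              # the easy case
    "<doc><name >Sean</name\n></doc>",           # whitespace inside the tags
    "<doc><name>&#83;ean</name></doc>",          # character reference for 'S'
    "<doc><name><![CDATA[Sean]]></name></doc>",  # CDATA section
]

# Well-formed, but the only apparent match is inside a comment.
COMMENTED = "<doc><!-- <name>Sean</name> --></doc>"


def regex_finds(doc):
    """True if the naive regexp matches anywhere in the raw text."""
    return NAIVE.search(doc) is not None


def parser_finds(doc):
    """True if a real parser sees a <name> element whose text is 'Sean'."""
    root = ET.fromstring(doc)
    return any(el.text == "Sean" for el in root.iter("name"))


if __name__ == "__main__":
    for doc in VARIANTS + [COMMENTED]:
        print(regex_finds(doc), parser_finds(doc), repr(doc))
```

The parser agrees with the regexp only on the first document. Patching
the pattern to cope with each of the others (and with attributes,
entity references, namespaces and so on) is where the regexp starts
turning into a parser.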

The standard answer I get when I harp on about this is something
like "ah, but I know the XML I'm processing is machine-generated and
consistent, therefore...".

I always feel uneasy relying on the upstream XML supplier like this! It
introduces a degree of brittle coupling in systems that is best avoided
if possible.

I can only see two routes to making XML regexping as safe as it is convenient:

1) Make a profile of XML 1.0 *syntax* that is regexp safe (permathread anyone?)

2) Use a post-parse syntax for regexp work like PYX notation
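
As a rough sketch of route 2, the Python below runs the XML through a
real parser first and emits a simplified PYX-style line notation: "("
for a start tag, "A" for an attribute, "-" for character data, ")" for
an end tag. The function names to_pyx and _escape, the pattern
NAME_SEAN and the exact escaping rules are assumptions made for this
illustration; it is not the Pyxie tool itself, and it ignores
processing instructions.

```python
import re
import xml.parsers.expat


def _escape(text):
    """Keep each PYX record on one line (escaping scheme assumed here)."""
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


def to_pyx(xml_text):
    """Parse xml_text and return a list of simplified PYX-style lines."""
    lines = []
    text_buf = []  # expat may deliver character data in several chunks

    def flush():
        if text_buf:
            lines.append("-" + _escape("".join(text_buf)))
            text_buf.clear()

    def start(name, attrs):
        flush()
        lines.append("(" + name)
        for key, value in attrs.items():
            lines.append("A" + key + " " + _escape(value))

    def end(name):
        flush()
        lines.append(")" + name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text_buf.append
    parser.Parse(xml_text, True)
    return lines


# After parsing, the regexp only has to deal with one line shape per event.
NAME_SEAN = re.compile(r"^\(name\n-Sean\n\)name$", re.MULTILINE)

if __name__ == "__main__":
    doc = "<doc><name >Sean</name\n></doc>"
    pyx = "\n".join(to_pyx(doc))
    print(pyx)
    print(NAME_SEAN.search(pyx) is not None)
```

Whitespace inside tags, CDATA sections and character references have
all been normalised away by the parser before the regexp runs, and
comments never reach it, so the pattern can stay simple.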

regards,
Sean


http://seanmcgrath.blogspot.com






 


Copyright 2001 XML.org. This site is hosted by OASIS