[Jeff Lowery]
>This thread sounds more like an argument for full-document validation prior
>to processing, or at the very least making sure you've done document version
>checking. Once that happens, regexing should be fine (assuming the
>programmer understands the schema or specification the version info is based
>on).
Not so. A valid document can be, lexically speaking, from outer space. You
*do* need to worry if regexps are being used, even for valid documents.
Example Pulse:
<!DOCTYPE pulse [
<!ELEMENT pulse (#PCDATA)>
<!ENTITY LetsHaveOneOfThese "2">
]>
<pulse >
7<!-- Ode to the lump of green putty, I found in my armpit, one
mid-summer morning.-->
<![CDATA[ 0]]>
&LetsHaveOneOfThese;
</pulse
>
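To make the point concrete, here is a minimal Python sketch. The pattern `NAIVE_PULSE` and both function names are hypothetical, chosen only for illustration; they stand in for the kind of regexp someone might write after reading the DTD. It feeds the document above to that regexp and to a real parser (the standard library's `xml.etree.ElementTree`, which by default drops comments, merges CDATA sections into text, and expands entities declared in the internal subset).

```python
import re
import xml.etree.ElementTree as ET

# The example document from this message, reproduced as-is.
PULSE_DOC = """<!DOCTYPE pulse [
<!ELEMENT pulse (#PCDATA)>
<!ENTITY LetsHaveOneOfThese "2">
]>
<pulse >
7<!-- Ode to the lump of green putty, I found in my armpit, one
mid-summer morning.-->
<![CDATA[ 0]]>
&LetsHaveOneOfThese;
</pulse
>"""

# Hypothetical "obvious" pattern: assumes <pulse>digits</pulse> and nothing else.
NAIVE_PULSE = re.compile(r"<pulse>(\d+)</pulse>")


def pulse_via_regex(doc):
    """Return the digits the naive pattern captures, or None if it does not match."""
    match = NAIVE_PULSE.search(doc)
    return match.group(1) if match else None


def pulse_tokens_via_parser(doc):
    """Return the whitespace-separated character data the parser reports for <pulse>."""
    root = ET.fromstring(doc)
    return (root.text or "").split()


if __name__ == "__main__":
    print(pulse_via_regex(PULSE_DOC))          # the regexp finds nothing
    print(pulse_tokens_via_parser(PULSE_DOC))  # the parser reports 7, 0 and 2
```

The regexp is defeated by the whitespace inside the tags alone; the comment, the CDATA section and the entity reference each defeat it again, and none of them makes the document invalid.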
What I'd love to see would be an XML Lint tool: a parser with the ability
to produce messages on stderr if it sees stuff in the XML that could trip
up regexp processing. Perl types could then knock themselves out with a
beautifully crafted McCarthy conditional as a guard command to their
regexp :-)
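A rough sketch of what such a lint pass might look like, in Python on top of the standard `xml.parsers.expat` module. The function name `lint`, the warning labels and the `SAMPLE` document are all made up for illustration, and it only flags three constructs (comments, CDATA sections and entity declarations); it does not notice whitespace inside tags or anything else that might upset a regexp.

```python
import sys
import xml.parsers.expat

# Small stand-in document (hypothetical) containing the three flagged constructs.
SAMPLE = """<!DOCTYPE pulse [
<!ENTITY two "2">
]>
<pulse>7<!-- a comment --><![CDATA[ 0]]>&two;</pulse>"""


def lint(doc):
    """Parse doc, print a warning on stderr for each regexp-hostile construct,
    and return the warnings as a list of (kind, line_number) tuples."""
    warnings = []
    parser = xml.parsers.expat.ParserCreate()

    def warn(kind, detail):
        line = parser.CurrentLineNumber
        warnings.append((kind, line))
        print(f"line {line}: {kind}: {detail}", file=sys.stderr)

    def on_comment(data):
        warn("comment", "comment found")

    def on_cdata_start():
        warn("cdata", "CDATA section found")

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        warn("entity-decl", f"entity {name!r} declared, references may follow")

    parser.CommentHandler = on_comment
    parser.StartCdataSectionHandler = on_cdata_start
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(doc, True)  # True: this is the final (and only) chunk
    return warnings


if __name__ == "__main__":
    # Guard: only trust a regexp when the lint pass is silent.
    if lint(SAMPLE):
        print("use a real parser for this one", file=sys.stderr)
```

An empty return value is the guard condition: take the regexp path only when `lint` reports nothing, and fall back to a parser otherwise.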
regards,
Sean
http://seanmcgrath.blogspot.com