On Mon, 09 Dec 2013 17:08:14 -0500, Simon St.Laurent wrote:
Yes, it's true that writing applications that apply regular
expressions or other text processing to "complete" XML can be
dangerous. That doesn't mean that people doing that are stupid or
poorly trained, however, and neither does it mean that they haven't
tried their local XML toolsets first and found them wanting.
Simon, I'm afraid that I have to differ with you. Anyone who uses
regular expressions for a grammar that relies extensively on parity is
either stupid or poorly trained. Sure, you can do text processing (==
processing of element names, attribute names, attribute values, and
text node contents (without distinguishing reliably between them))
using regular expressions. You can't reliably establish XML structure,
because the syntax of XML is specified by a grammar that cannot be
handled by a finite automaton; that is, it is not a regular grammar.
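To make the structural point concrete, here is a minimal sketch in Python (my choice of language, not anything from the thread). The sample document and the variable names are invented for illustration. It assumes only the standard-library `re` and `xml.etree.ElementTree` modules, and it shows one failure case, not a proof.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample: an element nested inside an element of the same name.
doc = "<a><a>inner</a>tail</a>"

# A non-greedy pattern stops at the first closing tag it sees, so it
# pairs the outer <a> with the inner </a>. It has no way to count depth.
regex_match = re.search(r"<a>(.*?)</a>", doc).group(0)

# A parser tracks nesting, so the outer element keeps its real extent.
root = ET.fromstring(doc)
outer = ET.tostring(root, encoding="unicode")

print(regex_match)  # the truncated, ill-formed fragment
print(outer)        # the complete outer element
```

Switching to a greedy pattern only moves the failure: on two sibling `<a>` elements it would swallow both as one match.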
No, you can't do any of those things. However, there is an enormous
class of tasks for which those issues simply do not matter.
Using regular expressions to handle XML (except in specialized
circumstances, possibly including "s/Soviet Union/Russian
Federation/g", but almost certainly not including "s/soviet/russian/gi"
because the latter (apart from demonstrating a lamentable historical
illiteracy (speaking as a formally-trained historian of the Soviet
Union, once upon a time)) is too apt to change attribute or element
names) is, to follow the pattern of analogy common in recent threads,
about the equivalent of handing a carpenter framing lumber and screws
and watching him whip out his ... hammer. A carpenter who does so
(except in specialized circumstances) is aptly regarded as stupid or
poorly trained (generically: not competent to handle the problem). More
to the point, the structure such a carpenter creates is going to
*fail*, which means it is appropriate for other carpenters to say "that
ain't right."
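The two substitutions mentioned above can be sketched in Python as follows. The sample document is invented, and `replace_in_text` is a hypothetical helper written for this illustration, not part of any library. It assumes the standard-library `xml.etree.ElementTree` parser, which is one reasonable choice among several.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample in which the word also appears as an element name
# and as an attribute value.
doc = '<soviet era="soviet">The Soviet Union and the soviet bloc</soviet>'

# The narrow, case-sensitive phrase only occurs in the text content here.
narrow = re.sub(r"Soviet Union", "Russian Federation", doc)

# The broad, case-insensitive pattern also rewrites the tags and attribute.
broad = re.sub(r"soviet", "russian", doc, flags=re.IGNORECASE)


def replace_in_text(xml_text, pattern, repl):
    """Hypothetical helper: substitute only inside text nodes."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # Element names and attributes are never touched; only the
        # character data before and after child elements is rewritten.
        if el.text:
            el.text = re.sub(pattern, repl, el.text, flags=re.IGNORECASE)
        if el.tail:
            el.tail = re.sub(pattern, repl, el.tail, flags=re.IGNORECASE)
    return ET.tostring(root, encoding="unicode")


safe = replace_in_text(doc, r"soviet", "russian")

print(ET.fromstring(broad).tag)  # element name was changed by the regex
print(ET.fromstring(safe).tag)   # element name survives
```

The broad substitution still produces well-formed output in this case, which is exactly why the damage is easy to miss: the document parses, but it is no longer the same vocabulary.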
Yes. But vast quantities of processing work in contexts where there are
no screws involved, just nails. That is even true for... gasp...
markup. (And yeah, natural language processing is hard. That's not a
surprise.)