OASIS Mailing List Archives


Re: [xml-dev] Parsing XML with anything but

On 12/9/13 9:39 PM, Amelia A Lewis wrote:
> I should probably avoid this argument. *sigh*

Yes, you should. The shape of the world does actually differ for many cases.

> On Mon, 09 Dec 2013 17:08:14 -0500, Simon St.Laurent wrote:
>> Yes, it's true that writing applications that apply regular
>> expressions or other text processing to "complete" XML can be
>> dangerous.  That doesn't mean that people doing that are stupid or
>> poorly trained, however, and neither does it mean that they haven't
>> tried their local XML toolsets first and found them wanting.
>
> Simon, I'm afraid that I have to differ with you. Anyone who uses
> regular expressions for a grammar that relies extensively on parity is
> either stupid or poorly trained. Sure, you can do text processing (==
> processing of element names, attribute names, attribute values, and
> text node contents (without distinguishing reliably between them))
> using regular expressions. You can't reliably establish XML structure,
> because the syntax of XML is specified by a grammar that cannot be
> handled by a finite automaton, that is not a regular grammar.

No, you can't do any of those things. However, there is an enormous class of tasks for which those issues simply do not matter.

Such cases arise for two reasons:

1) People are performing tasks that are simple enough that even those drawbacks will not get them in trouble.

2) People are applying these tools to subsets of XML for which these issues are unlikely to apply.

Yes, in general, XML is capable of infinite headache-inducement for those foolish enough to approach it with regexes or pretty much any tools that were not written as XML parsers. My time spent writing a markup parser taught me many of them.

However, the subset of cases is common enough that condescension is foolish.
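
As a minimal sketch of both sides of that point (the element name `item`, the sample strings, and the helper `item_contents` are all hypothetical, chosen only for illustration), a lazy regular expression pulls the contents out of a flat subset without trouble and goes wrong only once the same element nests inside itself:

```python
import re

# Hypothetical flat input: <item> never appears inside another <item>.
flat = "<list><item>a</item><item>b</item></list>"

# Hypothetical nested input: the same element name recurs inside itself.
nested = "<item>outer <item>inner</item> tail</item>"

# Lazy match from an <item> start tag to the nearest </item> end tag.
ITEM = re.compile(r"<item>(.*?)</item>", re.S)

def item_contents(xml_text):
    """Return the text between each <item> tag and the next </item>."""
    return ITEM.findall(xml_text)

print(item_contents(flat))    # ['a', 'b']
print(item_contents(nested))  # ['outer <item>inner']
```

On the flat input the result is what an XML parser would report. On the nested input the pattern pairs the outer start tag with the inner end tag, because a regular expression cannot count nesting depth.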

> Using regular expressions to handle XML (except in specialized
> circumstances, possibly including "s/Soviet Union/Russian
> Federation/g", but almost certainly not including "s/soviet/russian/gi"
> because the latter (apart from demonstrating a lamentable historical
> illiteracy (speaking as a formally-trained historian of the Soviet
> Union, once upon a time)) is too apt to change attribute or element
> names) is, to follow the pattern of analogy common in recent threads,
> about the equivalent of handing a carpenter framing lumber and screws
> and watching him whip out his ... hammer. A carpenter who does so
> (except in specialized circumstances) is aptly regarded as stupid or
> poorly trained (generically: not competent to handle the problem). More
> to the point, the structure such a carpenter creates is going to
> *fail*, which means it is appropriate for other carpenters to say "that
> ain't right."

Yes. But vast quantities of processing work in contexts where there are no screws involved, just nails. That is even true for... gasp... markup. (And yeah, natural language processing is hard. That's not a surprise.)
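
The two substitutions quoted above can be tried side by side. This is only a sketch, and it assumes a made-up document whose element name happens to contain "soviet"; `str.replace` and `re.sub` stand in for the sed-style commands:

```python
import re

# Hypothetical document: the element name itself contains "soviet".
doc = '<soviet-map region="north">Map of the Soviet Union</soviet-map>'

# Equivalent of s/Soviet Union/Russian Federation/g:
# a narrow, case-sensitive phrase that does not occur in any name here.
narrow = doc.replace("Soviet Union", "Russian Federation")

# Equivalent of s/soviet/russian/gi:
# a broad, case-insensitive match that also hits the element name.
broad = re.sub(r"soviet", "russian", doc, flags=re.IGNORECASE)

print(narrow)
# <soviet-map region="north">Map of the Russian Federation</soviet-map>
print(broad)
# <russian-map region="north">Map of the russian Union</russian-map>
```

The narrow replacement leaves the markup alone in this input. The broad one silently renames the element, so anything downstream that looks for `soviet-map` stops finding it.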

I get that people on an XML list freak out when people don't follow all the rules we think we've established. We need to find a better way to handle our freaking out than sputtering about "either stupid or poorly trained" people who "don’t have the inclination, patience or capability to fully understand your language of choice."

It makes us look bad, not them. It hurts our cause(s), and doesn't help theirs.

That attitude is exactly why I've largely given up speaking about XML to broader audiences and retreated to "markup". It doesn't carry the elitist baggage or visions of infinite complexity.

Simon St.Laurent


Copyright 1993-2007 XML.org. This site is hosted by OASIS