Re: [xml-dev] Parsing XML with anything but

C'mon people, Amelia said it all.

And said it so well, that I don't have anything to add.

The other reason I don't want to add anything is that what Amelia is
telling us seems so obvious that I can't believe anyone doesn't
already know it.

Just brush up on the Chomsky hierarchy
(http://en.wikipedia.org/wiki/Chomsky_hierarchy) or, in other words, on
the difference between (and the relative expressive power of) regular
languages and context-free languages.
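
To make that difference concrete, here is a minimal Python sketch. It is only an illustration: the sample strings and the helper names (lazy_content, greedy_content) are made up for this message, not taken from anyone's code. A single pattern has no way to count how deeply it is nested, so it pairs a start tag with the wrong end tag, while a parser that keeps a stack gets the structure right.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical samples, invented for this illustration.
NESTED = "<a><a>inner</a>tail</a>"      # an <a> nested inside another <a>
SIBLINGS = "<r><a>x</a><a>y</a></r>"    # two sibling <a> elements


def lazy_content(doc):
    """Content of the 'first <a>' as seen by a non-greedy pattern."""
    m = re.search(r"<a>(.*?)</a>", doc, re.DOTALL)
    return m.group(1) if m else None


def greedy_content(doc):
    """Content of the 'first <a>' as seen by a greedy pattern."""
    m = re.search(r"<a>(.*)</a>", doc, re.DOTALL)
    return m.group(1) if m else None


# Non-greedy stops at the first </a>, which belongs to the inner element.
print(lazy_content(NESTED))      # <a>inner

# Greedy runs on to the last </a> and swallows the sibling.
print(greedy_content(SIBLINGS))  # x</a><a>y

# A real parser tracks nesting, so the outer element has one child
# whose text is "inner" and whose tail is "tail".
outer = ET.fromstring(NESTED)
print(len(outer), outer[0].text, outer[0].tail)  # 1 inner tail
```

Neither greediness setting is "the right one"; each is wrong for one of the two inputs, which is the point about regular versus context-free languages.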

Of course, many amazing things have been done with regexes (the last I
heard of was a Sudoku solver), and this is just a confirmation of the
rule that anything "more" than what regexes were designed for is just a
miracle.

And miracles, by definition, are not facts. One has to believe in
them, without proof.

On Mon, Dec 9, 2013 at 8:10 PM, Amelia A Lewis <amyzing@talsever.com> wrote:
> Hey, Liam!
> On Mon, 09 Dec 2013 22:07:09 -0500, Liam R E Quin wrote:
>> The "desperate perl hacker" was a significant and much-discussed use
>> case during XML development, and was part of why we chose a self-evident
>> empty element syntax.
> Mmmmm. I suggest that you didn't succeed. XML, in the general case,
> cannot be reliably handled with regular expressions. This is
> unsurprising; the problem of parity is literally a textbook case for
> the limitations of regular expressions (regular languages, regular
> grammars, finite state automata) in parsing. XML's reliance on parity
> both for tag delimiters (<>) and for start/end semantics (<></>) is
> fairly unquestionable.
> Developing a library of regular expressions that handles a series of
> special cases in XML is a good way of falling prey to the classic Perl
> programmer's virtue of hubris. That code may be safe in your own
> (desperate hackish) hands; it isn't safe in someone else's.
> One of my earliest experiences of this, around 2001, had to do with a
> processor for handling SOAP (probably 1.1). The designer, a developer
> who is *significantly* smarter, better-trained (in computer science in
> general, though not in XML or markup in particular), and more
> experienced than I am (or was), decided that a namespace declaration
> binding the default prefix *necessarily* changed the prefix of
> attributes-without-prefix. Gentle (and less-gentle) remonstration based
> on specifications failed to change his mind. Since SAX wasn't doing the
> right thing, he implemented code that caught the events, changed the
> prefixes appropriately, and passed it on. And on output, it
> did-the-right-thing for generating attributes. Since this blew up in
> ways that those reading this list can probably easily imagine, the XML
> geeks were required to make it work for all those situations.
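
The rule that developer refused to believe is easy to check with any conforming parser. The following is a small sketch using Python's standard xml.etree.ElementTree; the namespace URIs and the element and attribute names are invented for the example. Per Namespaces in XML, a default namespace declaration applies to unprefixed element names but not to unprefixed attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace plus one prefixed namespace.
DOC = (
    '<order xmlns="urn:example:shop" xmlns:x="urn:example:ext" id="42">'
    '<item sku="A1" x:flag="yes"/>'
    "</order>"
)

root = ET.fromstring(DOC)
item = root[0]

# Unprefixed elements pick up the default namespace.
print(root.tag)   # {urn:example:shop}order
print(item.tag)   # {urn:example:shop}item

# Unprefixed attributes do NOT: they stay in no namespace.
print(root.attrib)  # {'id': '42'}

# Only an explicitly prefixed attribute is namespace-qualified.
print(sorted(item.attrib))  # ['sku', '{urn:example:ext}flag']
```

Code that "corrects" the attribute names to match the element's namespace therefore produces a different document, not a fixed one.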
> Even deprecating this enormous pile of pigs' lips as our first activity
> did not save us from the succeeding *two infinite years* of writing
> increasingly baroque and fragile code to catch the output from this ...
> desperate hack ... and turn it into something that was both well-formed
> and valid. It had shipped as production code. Our later ships of the
> production code could *not* say "we fucked up; we can't handle this
> horse pucky," whatever our competitors did with it. We were finally
> able to drop support for the versions of shipping products that used
> this nightmare, and instead rely on well-vetted parsing code (like, the
> original SAX before it got filtered) that Did the Right Thing, and to
> throw out something over 20K lines of specialized "fix the problem that
> we generated by failing to actually train up on the real problem rather
> than our desperate-hackish conception of what it ought to be" code.
> I haven't any patience for it. XML 1.0 namespaces are a disaster, XML
> schema a living nightmare. Trying to cope with incoming XML that
> *could* contain these things *without understanding those
> specifications*, even if the plan is that the incoming stuff *won't*
> contain them, is asking for problems. Because then you find you have to
> cope with them. And you can't throw out all that beautiful work you've
> done! And when you've moved on, and someone else is trying to deal with
> the new inputs for the code that you wrote that worked so well ...
> perhaps that's brilliant, rather than stupid, but it's not something
> that's going to make your successor bless your name. Or the name of XML.
> And that's a problem of training. Like the developer/designer/architect
> who simply *could not believe* that the specification required that
> elements and attributes respond differently to the declaration of a
> binding to the default prefix: insufficient willingness to believe that
> the specification writers could specify something boneheaded. Like the
> DPH-s who wrote piles of regexes because the spec writers said "we're
> making it work for you!" without looking at the specification and
> discovering it's type-2 (context-free) in the Chomsky hierarchy, not
> type-3 (regular).
>> Use of regular expressions does not need to be evidence of stupidity,
>> nor of poor training.
> In general? Absolutely not. In dealing with a grammar that is
> context-free, but not regular? It's a sign of poor training at least.
> If the expressions operate over something that's known to safely
> conform to a regular grammar (necessarily a special case in XML
> processing), then it's fine. Alas, anyone who succeeds with this is
> going to keep going with it until the [^>] bites. That's an absolute
> certainty if the code is used by more than one person, especially if
> it's hand-me-down.
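
For anyone who has not yet been bitten by [^>], here is a short Python sketch of how it happens. The sample document and the TAG pattern are assumptions made up for this illustration. A literal ">" is legal inside an attribute value in XML 1.0 (only "<" and "&" must be escaped there), so a tag pattern that stops at the first ">" cuts the start tag in half.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: well-formed XML with a literal '>' in an attribute.
DOC = '<root><rule test="a > b">ok</rule></root>'

# The classic "match a tag" pattern.
TAG = re.compile(r"<[^>]*>")

# The second "tag" ends at the '>' inside the attribute value.
tags = TAG.findall(DOC)
print(tags)  # ['<root>', '<rule test="a >', '</rule>', '</root>']

# A real parser reads the attribute value whole.
rule = ET.fromstring(DOC)[0]
print(rule.get("test"))  # a > b
print(rule.text)         # ok
```

The pattern works on every input that happens not to contain such an attribute, which is exactly why it survives until it is handed to someone else.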
>> I admit to using regular expressions to process
>> XML at times myself, although I also suppose that since I haven't
>> received a whole lot of introductory XML training I'm poorly trained in
>> XML...
> I'm probably supposed to be intimidated, considering history and
> authorship and such.
> Sorry.
> I think that if you turn over your aggregation of regexes to someone
> else, then Bad Things Will Happen. I think that if you don't expect
> that, then perhaps it's an indication of poor training or experience.
> Naivete? Something. Perhaps you'd be one of the ones offering strong,
> understandable, and written (so that they can be passed on) warnings on
> the limitations of the bits that you're turning over to others, and
> none of this applies.
>> Absence of carefulness is a problem, but that can be a problem with any
>> tool.
> Hammers and screws are an inappropriate combination, as a general rule.
> It has nothing to do with how careful one is, pounding the damned
> things in.
> However, let me provide another anecdote, on why this particular
> analogy occurred to me. When I was young (and ... still not pretty,
> alas), I was heavily involved in theater. Community theater, college
> theater. Since I was notably *terrible* on stage, I ended up as part of
> the supporting staff. We did things like building the sets. Our
> director (who, in this environment, is probably better described as
> BossAndGod), handed out lumber, fabric, screws, and ... yes, hammers.
> To build the scrims for the backdrops. On purpose. Because they could
> be hammered in, quickly, and later, when we tore it all down, a
> screwdriver generally got the things back out. We weren't *allowed* to
> use screwdrivers (no power tools in that era, mind; circumstances have
> certainly changed since then) because it *took too long*. We were
> always short on time when a show was coming up.
> In other words, this was sensible behavior, for the circumstances. Not
> that we could convince anyone involved with carpentry of it, mind. We
> generally ended up with at least one person each year who had been a
> carpenter's assistant, or who did carpentry of some sort for fun, who
> *insisted* that we could be just as fast doing it the right way. They
> may have even been right. Our way worked, though, and we knew how to
> use our regexen^Whammers.
> Mind you, when I tried to build my loft in my first dorm room at
> college, I decided that perhaps I'd been misled. YMMV.
> Amy!
> --
> Amelia A. Lewis                    amyzing {at} talsever.com
> About the use of language: it is impossible to sharpen a pencil with a
> blunt axe. It is equally vain to try to do it with ten blunt axes
> instead.
>                 -- Edsger Dijkstra
> _______________________________________________________________________
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

Dimitre Novatchev
Truly great madness cannot be achieved without significant intelligence.
To invent, you need a good imagination and a pile of junk
Never fight an inanimate object
To avoid situations in which you might make mistakes may be the
biggest mistake of all
Quality means doing it right when no one is looking.
You've achieved success in your field when you don't know whether what
you're doing is work or play
To achieve the impossible dream, try going to sleep.
Facts do not cease to exist because they are ignored.
Typing monkeys will write all Shakespeare's works in 200yrs. Will they
write all patents, too? :)
I finally figured out the only reason to be alive is to enjoy it.
