Re: [xml-dev] Parsing XML with anything but

C'mon people, Amelia said it all.

And said it so well that I don't have anything to add.

The other reason I don't want to add anything is that what Amelia is
telling us seems so obvious that I can't believe anyone doesn't
already know it.

Just brush up on the Chomsky hierarchy
(http://en.wikipedia.org/wiki/Chomsky_hierarchy), or, in other words,
the difference between (and the expressive power of) regular languages
and context-free languages.
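
To make the difference concrete, here is a minimal sketch in Python
(my choice of language, nothing from the thread). The sample document
and the function names are made up for illustration. It shows two
typical regexes giving wrong answers where a '>' sits inside an
attribute value and where an element is nested inside an element of
the same name, while a real parser (the standard library's
xml.etree.ElementTree) handles both.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: '>' inside an attribute value, and <b> nested in <b>.
DOC = '<a title="x > y"><b><b>inner</b>tail</b></a>'


def regex_first_tag(text):
    """Regex guess at the first tag; assumes '>' never occurs inside a tag."""
    return re.match(r"<[^>]*>", text).group(0)


def regex_b_content(text):
    """Regex guess at the content of the first <b>; it cannot count nesting."""
    return re.search(r"<b>(.*?)</b>", text, re.S).group(1)


def parsed_view(text):
    """What a parser sees: the title attribute and the whole outer <b>."""
    root = ET.fromstring(text)
    outer_b = root.find("b")
    return root.get("title"), ET.tostring(outer_b, encoding="unicode")


if __name__ == "__main__":
    print(regex_first_tag(DOC))   # tag cut short at the '>' in the attribute
    print(regex_b_content(DOC))   # stops at the first </b>, not the matching one
    print(parsed_view(DOC))       # attribute value and outer <b> both intact
```

Switching the second regex to a greedy match only moves the failure
to documents with two sibling <b> elements; matching each start tag
with its own end tag needs a counter or a stack, which is exactly
what a regular language does not have.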

Of course, many amazing things have been done with regexes (the last I
heard of was a Sudoku solver), but this only confirms the rule:
anything beyond what regular expressions were designed for is a
miracle.

And miracles, by definition, are not facts. One has to believe in
them, without proof.
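
The namespace point in Amelia's message quoted below is easy to check
rather than argue about. Per the Namespaces in XML recommendation, a
default namespace declaration applies to unprefixed element names but
not to unprefixed attribute names. A minimal sketch, again in Python
with xml.etree.ElementTree; the URI urn:example:shop and the
element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same URI is bound both as the default
# namespace and to the prefix "s".
DOC = (
    '<order xmlns="urn:example:shop" xmlns:s="urn:example:shop" '
    'id="42" s:code="x"/>'
)


def expanded_names(text):
    """Return the element's expanded name and its sorted attribute names."""
    root = ET.fromstring(text)
    return root.tag, sorted(root.attrib)


if __name__ == "__main__":
    tag, attrs = expanded_names(DOC)
    print(tag)    # the element picks up the default namespace
    print(attrs)  # "id" stays in no namespace; only "s:code" is qualified
```

ElementTree reports names in {uri}local form, so the unprefixed id
attribute comes back as plain "id" even though its element is in the
default namespace.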



On Mon, Dec 9, 2013 at 8:10 PM, Amelia A Lewis <amyzing@talsever.com> wrote:
> Hey, Liam!
>
> On Mon, 09 Dec 2013 22:07:09 -0500, Liam R E Quin wrote:
>> The "desperate perl hacker" was a significant and much-discussed use
>> case during XML development, and was part of why we chose a self-evident
>> empty element syntax.
>
> Mmmmm. I suggest that you didn't succeed. XML, in the general case,
> cannot be reliably handled with regular expressions. This is
> unsurprising; the problem of parity is literally a textbook case for
> the limitations of regular expressions (regular languages, regular
> grammars, finite state automata) in parsing. XML's reliance on parity
> both for tag delimiters (<>) and for start/end semantics (<></>) is
> fairly unquestionable.
>
> Developing a library of regular expressions that handles a series of
> special cases in XML is a good way of falling prey to the classic Perl
> programmer's virtue of hubris. That code may be safe in your own
> (desperate hackish) hands; it isn't safe in someone else's.
>
> One of my earliest experiences of this, around 2001, had to do with a
> processor for handling SOAP (probably 1.1). The designer, a developer
> who is *significantly* smarter, better-trained (in computer sciences in
> general, though not in XML or markup in particular), and more
> experienced than I am (or was), decided that a namespace declaration
> binding the default prefix *necessarily* changed the prefix of
> attributes-without-prefix. Gentle (and less-gentle) remonstration based
> on specifications failed to change his mind. Since SAX wasn't doing the
> right thing, he implemented code that caught the events, changed the
> prefixes appropriately, and passed it on. And on output, it
> did-the-right-thing for generating attributes. Since this blew up in
> ways that those reading this list can probably easily imagine, the XML
> geeks were required to make it work for all those situations.
>
> Even deprecating this enormous pile of pigs' lips as our first activity
> did not save us from the succeeding *two infinite years* of writing
> increasingly baroque and fragile code to catch the output from this ...
> desperate hack ... and turn it into something that was both well-formed
> and valid. It had shipped as production code. Our later ships of the
> production code could *not* say "we fucked up; we can't handle this
> horse pucky," whatever our competitors did with it. We were finally
> able to drop support for the versions of shipping products that used
> this nightmare, and instead rely on well-vetted parsing code (like, the
> original SAX before it got filtered) that Did the Right Thing, and to
> throw out something over 20K lines of specialized "fix the problem that
> we generated by failing to actually train up on the real problem rather
> than our desperate-hackish conception of what it ought to be" code.
>
> I haven't any patience for it. XML 1.0 namespaces are a disaster, XML
> schema a living nightmare. Trying to cope with incoming XML that
> *could* contain these things *without understanding those
> specifications*, even if the plan is that the incoming stuff *won't*
> contain them, is asking for problems. Because then you find you have to
> cope with them. And you can't throw out all that beautiful work you've
> done! And when you've moved on, and someone else is trying to deal with
> the new inputs for the code that you wrote that worked so well ...
> perhaps that's brilliant, rather than stupid, but it's not something
> that's going to make your successor bless your name. Or the name of XML.
>
> And that's a problem of training. Like the developer/designer/architect
> who simply *could not believe* that the specification required that
> elements and attributes respond differently to the declaration of a
> binding to the default prefix: insufficient willingness to believe that
> the specification writers could specify something boneheaded. Like the
> DPH-s who wrote piles of regexes because the spec writers said "we're
> making it work for you!" without looking at the specification and
> discovering it's type-2 in the Chomsky hierarchy, not type-3.
>
>> Use of regular expressions does not need to be evidence of stupidity,
>> nor of poor training.
>
> In general? Absolutely not. In dealing with a grammar that is
> context-free, but not regular? It's a sign of poor training at least.
>
> If the expressions operate over something that's known to safely
> conform to a regular grammar (necessarily a special case in XML
> processing), then it's fine. Alas, anyone who succeeds with this is
> going to keep going with it until the [^>] bites. That's an absolute
> certainty if the code is used by more than one person, especially if
> it's hand-me-down.
>
>> I admit to using regular expressions to process
>> XML at times myself, although I also suppose that since I haven't
>> received a whole lot of introductory XML training I'm poorly trained in
>> XML...
>
> I'm probably supposed to be intimidated, considering history and
> authorship and such.
>
> Sorry.
>
> I think that if you turn over your aggregation of regexes to someone
> else, then Bad Things Will Happen. I think that if you don't expect
> that, then perhaps it's an indication of poor training or experience.
> Naivete? Something. Perhaps you'd be one of the ones offering strong,
> understandable, and written (so that they can be passed on) warnings on
> the limitations of the bits that you're turning over to others, and
> none of this applies.
>
>> Absence of carefulness is a problem, but that can be a problem with any
>> tool.
>
> Hammers and screws are an inappropriate combination, as a general rule.
> It has nothing to do with how careful one is, pounding the damned
> things in.
>
> However, let me provide another anecdote, on why this particular
> analogy occurred to me. When I was young (and ... still not pretty,
> alas), I was heavily involved in theater. Community theater, college
> theater. Since I was notably *terrible* on stage, I ended up as part of
> the supporting staff. We did things like building the sets. Our
> director (who, in this environment, is probably better described as
> BossAndGod), handed out lumber, fabric, screws, and ... yes, hammers.
> To build the scrims for the backdrops. On purpose. Because they could
> be hammered in, quickly, and later, when we tore it all down, a
> screwdriver generally got the things back out. We weren't *allowed* to
> use screwdrivers (no power tools in that era, mind; circumstances have
> certainly changed since then) because it *took too long*. We were
> always short on time when a show was coming up.
>
> In other words, this was sensible behavior, for the circumstances. Not
> that we could convince anyone involved with carpentry of it, mind. We
> generally ended up with at least one person each year who had been a
> carpenter's assistant, or who did carpentry of some sort for fun, who
> *insisted* that we could be just as fast doing it the right way. They
> may have even been right. Our way worked, though, and we knew how to
> use our regexen^Whammers.
>
> Mind you, when I tried to build my loft in my first dorm room at
> college, I decided that perhaps I'd been misled. YMMV.
>
> Amy!
> --
> Amelia A. Lewis                    amyzing {at} talsever.com
> About the use of language: it is impossible to sharpen a pencil with a
> blunt axe. It is equally vain to try to do it with ten blunt axes
> instead.
>                 -- Edsger Dijkstra
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php



-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

