On Wed, Mar 26, 2003 at 10:13:40PM -0700, Uche Ogbuji wrote:
> BTW, I find the idea of processing XML using simple regexen pretty hair
Luckily that's not what Tim was talking about at all. I'm guessing you
have not yet read about Perl 6 regular expressions :-)
Perl 6 "regular expressions" are actually full-blown grammars, with a
new and massively clearer syntax. And that's what he referred to.
It's more like having a more flexible and more powerful YACC interpreter.
Nonetheless, it's worth noting that one of the use cases for XML from
the beginning was the "desperate Perl hacker" who had to change, say,
part number 1976 to 3072 in 100,000 documents without affecting dates,
and had an afternoon to do it. That specific use case was achieved in
practice for most people.
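
To make that use case concrete, here is a minimal sketch in Python (rather than Perl, purely for illustration). The element name pn and the sample document are assumptions of mine, not taken from any real data set. The point is only that, when markup is always fully tagged, a plain textual substitution can reach the part number without touching a date that contains the same digits.

```python
import re

# Assumed markup: part numbers are always written as <pn>NNNN</pn>.
# The element name "pn" and the sample text are hypothetical.
PART_NUMBER = re.compile(r"<pn>1976</pn>")


def renumber(xml_text: str) -> str:
    """Replace part number 1976 with 3072, leaving all other text alone."""
    return PART_NUMBER.sub("<pn>3072</pn>", xml_text)


sample = "<doc><date>1976-03-26</date><p>Order <pn>1976</pn> today.</p></doc>"
result = renumber(sample)
print(result)
```

The date keeps its 1976 because the pattern is anchored on the surrounding tags, which is exactly what a predictable syntax buys the hacker with an afternoon to spare.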
If it sounds like a small achievement, consider
(1) you can't easily do this given 100,000 mixed word processing documents
(and with that many they will almost certainly not all be from the
same version of the same program)
(2) It wasn't trivial to do in SGML unless you had used an SGML editor or
somehow "normalised" your data -- some examples:
<pn>1976</p> <p>....<pn>1976</pn></p> (The OMITTAG feature)
<pn type="1976"> <pn type="1976"/> or <pn type="1976"></pn>
Here, a more flexible input syntax makes life harder. Sometimes it
makes life easier, too, especially for document authors.
People have (separately) asked for all of these features (except
possibly DATATAG) in XML. But introducing any of them makes life
harder for the Desperate Perl Hacker.
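
As a rough illustration of why, the sketch below runs one naive pattern over the variant spellings listed above. It is not an SGML parser, and the pattern is my own assumption about what a hurried script might look for. It simply shows that only the fully tagged form is caught.

```python
import re

# The naive pattern a hurried script might use (an assumption for illustration).
NAIVE = re.compile(r"<pn>1976</pn>")

variants = [
    "<pn>1976</pn>",           # fully tagged form
    "<pn>1976</p>",            # pn end tag omitted (OMITTAG)
    '<pn type="1976">',        # value carried in an attribute, no end tag
    '<pn type="1976"></pn>',   # value carried in an attribute, empty content
]

# True where the naive pattern finds the part number, False where it misses.
matched = [bool(NAIVE.search(v)) for v in variants]
for text, hit in zip(variants, matched):
    print(f"{text!r}: {'found' if hit else 'missed'}")
```

Each extra spelling the syntax allows is one more case the script has to anticipate, or silently miss.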
So now we have a data representation that's a compromise -- it's not
quite so nice, perhaps, for authors, but it's much nicer for programmers.
The result of that has been massively more application uptake, though,
and many more authors who can now have vague feelings of resentment
at the not-so-niceness :-) and that's the nature of compromise.
Tim is right, I think (as so often) in that programming languages don't
usually have a model that fits well with XML. Another way to look at
it -- and he hinted at this in his 'blog -- is that XML is encouraging
lots of programmers to manipulate the sorts of data that previously
they would have ignored or left to someone else. It's part of the
implication of massively increased uptake of SGML, or structured text:
more people are dealing with structured text. Since the previous
generations of programming languages didn't have this as a data type,
I think this is part of why XSLT has been so successful. I'll be very
interested to see if we get takeup of XML Query in the same way, which
is a programming language with a non-XML syntax sort of like a cross
between PL/1, SQL and Java in my mind, but with XML, XPath and strong
It's also interesting watching some newer programming languages be
developed with more structured (in the XML sense) data types, and
corresponding operations. Some of the functional languages are
well-placed to exploit this, although we're also seeing people
jumping on the bandwagon by adding pointy brackets to existing
languages and calling them "XML-based". Time will no doubt help us
sort out the truly bad from the mediocre (evolution does not favour
the truly excellent) and we will march on.
Liam Quin, W3C XML Activity Lead, firstname.lastname@example.org, http://www.w3.org/People/Quin/
Ankh's list of IRC clients: http://www.valinor.sorcery.net/clients/