OASIS Mailing List Archives



Re: [xml-dev] Parsing XML with anything but

On Mon, Dec 9, 2013 at 4:20 AM, Ihe Onwuka <ihe.onwuka@gmail.com> wrote:
> I have been tagsouping (thank you thank you John Cowan).
>
> Beautiful Soup, which I first learnt of on his website, seems to be quite widely popular. In fact, rolling your own XML parser seems de rigueur for each language community, even amongst communities that ought to know better (what terror could XSLT pose for a Haskell dev?)

I'd rather ask: why on earth would anyone suffer through XSLT if they had Haskell? HaXml is actually a very respectable parser and toolkit.

Also, I think you're mixing up two issues. TagSoup and Beautiful Soup were both designed for parsing what passes for HTML on the Web, not XML. With all respect to Mike Kay, his tools won't directly help you there. Sure, people do use them for processing XML sometimes, and not just XHTML. Well, people also use regexen for doing so. I personally use grep on XML almost as much as I do the compliant parsers that I myself have worked on.
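To make the HTML-versus-XML distinction concrete, here is a minimal sketch using only the Python standard library. It assumes `html.parser.HTMLParser` as a stand-in for a Web-oriented soup parser (Beautiful Soup can use it as a backend, though on its own it only tokenizes and does no tree repair) and `xml.etree.ElementTree` as the compliant XML side. The sample markup and the helper names are hypothetical, for illustration only.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample of typical Web markup: an unquoted attribute,
# an unclosed <p>, and an HTML void element <br> with no end tag.
SOUP = '<div class=note><p>First<p>Second<br></div>'


class TagCollector(HTMLParser):
    """Record start tags as a lenient, HTML-oriented tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Hypothetical helper: list start tags, tolerating malformed markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def strict_parse(markup):
    """Hypothetical helper: None if well-formed XML, else the parser's error text."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as err:
        return str(err)


if __name__ == "__main__":
    print(lenient_tags(SOUP))   # the HTML tokenizer carries on regardless
    print(strict_parse(SOUP))   # the XML parser stops at the first error
```

The lenient side happily reports every tag, while the compliant side refuses the same input outright, which is exactly the behaviour the XML spec requires of it.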

Speaking of those compliant XML parsers, most of these language communities do have them, and many have multiple. Python, the community behind Beautiful Soup, has several, and I should know, having worked on most of them at one time or another. James Clark's expat forms the backbone of most of them, through the Python/C conduit. There are others as well using other core parsers. Surely you'll not quibble with that, as long as the tools are compliant.
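As a rough illustration of expat underneath Python's XML tooling, here is a minimal sketch that calls the standard library's `xml.parsers.expat` binding directly. In CPython, `xml.etree.ElementTree`, the default `xml.sax` reader and `xml.dom.minidom` are layered over this same binding. The sample document and the `entry_ids` / `is_well_formed` helpers are hypothetical.

```python
import xml.parsers.expat

# Hypothetical sample document, for illustration only.
DOC = '<feed><entry id="1">Hello</entry><entry id="2">World</entry></feed>'


def entry_ids(document):
    """Hypothetical helper: collect each <entry>'s id via expat callbacks."""
    ids = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        # expat passes attributes as a dict of name -> value
        if name == "entry":
            ids.append(attrs.get("id"))

    parser.StartElementHandler = start_element
    parser.Parse(document, True)  # True marks this as the final chunk of input
    return ids


def is_well_formed(document):
    """Hypothetical helper: True if expat accepts the document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


if __name__ == "__main__":
    print(entry_ids(DOC))
    print(is_well_formed('<a><b></a>'))
```

The callback style is the low-level, streaming view; most Python developers would reach for ElementTree on top of it rather than wire up handlers by hand.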

So I think the question that's really worth asking is why a developer working in language X would use non-compliant tool Z to process XML rather than compliant tool Y. I would not expect a Python developer to use Xerces or Saxon, and yet why would he use Beautiful Soup rather than the many XML compliant alternatives?

Well, that comes back to what I always come back to on this list. XML is too complicated, and that complication necessarily manifests itself in fully compliant tools. Developers have come to hate XML and just want to crowbar, chisel and scrape it as quickly as possible into a structure that they can actually understand, using a tool that seems to hate XML as much as they do. We can complain all we want about the poor professional state of developers as we understand it, but it's the raw reality, and we can either make it easier for them to do the right thing, or keep on tsk-tsking at them from here.

MicroXML is an effort to make XML simple enough to tempt developers to do the right thing.

Uche Ogbuji                                       http://uche.ogbuji.net
Founding Partner, Zepheira                  http://zepheira.com
Author, Ndewo, Colorado                     http://uche.ogbuji.net/ndewo/
Founding editor, Kin Poetry Journal      http://wearekin.org
Editor & Contributor, TNB     http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net    http://www.linkedin.com/in/ucheogbuji    http://twitter.com/uogbuji



Copyright 1993-2007 XML.org. This site is hosted by OASIS