On Mon, Dec 9, 2013 at 3:16 PM, Uche Ogbuji <uche@ogbuji.net> wrote:
> I'd rather ask: why on earth would anyone suffer through XSLT if they had
> Haskell? HaXml is actually a very respectable parser and toolkit.
>
> Also I think you're mixing up two issues. TagSoup and BeautifulSoup were both
> designed for parsing what passes for HTML on the Web, and not for XML. With
> all respect to Mike Kay, his tools won't directly help you there.

But thanks to TagSoup his tools help me indirectly, at a price that I consider as good as free. I believe the same mechanism is behind the saxon:parse-html extension.
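The glue needed for that indirection really is minimal. A sketch, assuming TagSoup and a JAXP transformer (Saxon, if it happens to be on the classpath) are available; page.html and extract.xsl are made-up names:

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import org.ccil.cowan.tagsoup.Parser;
    import org.xml.sax.InputSource;

    public class TagSoupXslt {
        public static void main(String[] args) throws Exception {
            // TagSoup presents messy real-world HTML as a well-formed
            // SAX event stream, i.e. as if it had been XML all along.
            SAXSource html = new SAXSource(new Parser(),
                    new InputSource("page.html"));

            // Any JAXP transformer (Saxon included) consumes that stream
            // exactly as it would a parsed XML document.
            Transformer xslt = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource("extract.xsl"));
            xslt.transform(html, new StreamResult(System.out));
        }
    }

The stylesheet never knows the input wasn't XML to begin with.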
Compared against the hoops people are prepared to jump through to make data SQLizable, filtering HTML through TagSoup sits firmly in the column labelled trivial. So much so that I regard HTML as, for all intents and purposes, parseable by X(SLT|Query).
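If you prefer the filter as a separate step, TagSoup also runs standalone; assuming its distributed jar (the version and file names here are invented), the whole "hoop" is one command before the stylesheet runs:

    java -jar tagsoup-1.2.1.jar page.html > page.xml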
> Sure people do use them for processing XML sometimes, and not just XHTML.
> Well, people also use regexen for doing so. I personally use grep on XML
> almost as much as I do the compliant parsers that I myself have worked on.

My concern with regexp solutions would be robustness, extensibility and readability, so I would never do it. On the one occasion I did use regexps within an XSLT conversion to up-convert text, I had to amend it eight months later and spent days eyeballing it just to get enough of a handle on it to make the change. Why did I have to amend it? Because it was only extensible enough to handle variations of the input it had been tested with.
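To make the robustness point concrete, a toy illustration (the markup and the pattern are both invented): a regexp tuned to the input it was tested against silently misses the first harmless variation, where a parser would hand both forms over identically.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexBrittleness {
        public static void main(String[] args) {
            // A pattern that "works" on the input it was tested with...
            Pattern href = Pattern.compile("<a href=\"([^\"]+)\">");

            String tested  = "<a href=\"http://example.org/\">link</a>";
            String variant = "<a class=\"ext\"\n   href='http://example.org/'>link</a>";

            System.out.println(find(href, tested));   // http://example.org/
            System.out.println(find(href, variant));  // no match: an extra
                                                      // attribute, single quotes
                                                      // and a line break defeat it
        }

        static String find(Pattern p, String s) {
            Matcher m = p.matcher(s);
            return m.find() ? m.group(1) : "no match";
        }
    }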