RE: [xml-dev] sets of parsing rules
- From: "Nathan Young -X (natyoung - Artizen at Cisco)" <natyoung@cisco.com>
- To: "Michael Kay" <mike@saxonica.com>, "XML Developers List" <xml-dev@lists.xml.org>
- Date: Tue, 13 Feb 2007 09:54:50 -0800
Belated reply:
Thanks, Michael and Philippe, for the pointers. I will see how far I can
get with TagSoup and pick up from there with the many good links
Philippe sent.
----------->Nathan
.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:.
Nathan Young
Cisco.com->Interface Development
A: ncy1717
E: natyoung@cisco.com
> -----Original Message-----
> From: Michael Kay [mailto:mike@saxonica.com]
> Sent: Thursday, February 08, 2007 2:06 AM
> To: Nathan Young -X (natyoung - Artizen at Cisco); 'XML
> Developers List'
> Subject: RE: [xml-dev] sets of parsing rules
>
> > I have an application that parses a large number of HTML
> > pages. A few of them are well formed XHTML but that's the
> > exception rather than the rule. By grabbing pages,
> > manipulating them a bit (regexps have been sufficient here so
> > far), then tidying them I can get them to a state where they
> > are parsable XML. From there I can use XSL to get them the
> > rest of the way (although I have a process that allows me to
> > run regexps here too, supplementing XSLT 1.0).
>
> I'm not sure why you are doing this yourself, when the job has
> already been done. Pick up John Cowan's TagSoup parser, and just
> plug it in as the parser front-end to Saxon, and you will be able
> to run your stylesheets on the HTML directly.
>
> Michael Kay
> http://www.saxonica.com/
>
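
For anyone reading this thread in the archive, here is a rough sketch of what "plug it in as the parser front-end" could look like through the standard JAXP interfaces. It is an illustration under assumptions, not Michael's or Nathan's actual setup. The class and method names (HtmlTransform, transform, defaultReader) are made up for this example. So that the sketch runs with nothing but the JDK, it uses the JDK's own XMLReader on a well-formed page. With the TagSoup jar on the classpath, one would pass new org.ccil.cowan.tagsoup.Parser() as the reader instead, and with Saxon on the classpath TransformerFactory.newInstance() should normally pick it up. TagSoup reports elements in the XHTML namespace by default, so the stylesheet below matches on local-name() to work either way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class; the names are illustrative, not from the thread.
public class HtmlTransform {

    // Runs a stylesheet over markup parsed by whichever XMLReader is supplied.
    // Passing a TagSoup Parser here (assumed to be on the classpath) is what
    // lets the stylesheet see messy HTML as a well-formed SAX event stream.
    public static String transform(XMLReader reader, String markup, String xslt)
            throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        SAXSource source =
                new SAXSource(reader, new InputSource(new StringReader(markup)));
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    // Stand-in reader from the JDK; it only accepts well-formed XML.
    public static XMLReader defaultReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        // Matches on local-name() so it works with or without the XHTML namespace.
        String xslt =
              "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select=\"//*[local-name()='title']\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String page =
            "<html><head><title>Hello</title></head><body><p>x</p></body></html>";
        System.out.println(transform(defaultReader(), page, xslt));
    }
}
```

The reader is a parameter so that the regexp-and-tidy step is replaced by a change of parser rather than a change to the stylesheets.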