RE: [xml-dev] sets of parsing rules

Belated reply:

Thanks Michael and Philippe for the pointers.  I will see how far I can
get with tag soup and pick up from there with the many good links
Philippe sent.

----------->Nathan



.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:._.:||:.

Nathan Young
Cisco.com->Interface Development
A: ncy1717
E: natyoung@cisco.com  

> -----Original Message-----
> From: Michael Kay [mailto:mike@saxonica.com] 
> Sent: Thursday, February 08, 2007 2:06 AM
> To: Nathan Young -X (natyoung - Artizen at Cisco); 'XML Developers List'
> Subject: RE: [xml-dev] sets of parsing rules
> 
> > I have an application that parses a large number of HTML 
> > pages.  A few of them are well formed XHTML but that's the 
> > exception rather than the rule.  By grabbing pages, 
> > manipulating them a bit (regexps have been sufficient here so 
> > far), then tidying them I can get them to a state where they 
> > are parsable XML.  From there I can use XSL to get them the 
> > rest of the way (although I have a process that allows me to 
> > run regexps here too, supplementing XSLT 1.0).
> 
> I'm not sure why you are doing this yourself, when the job has already been
> done. Pick up John Cowan's TagSoup parser, and just plug it in as the parser
> front-end to Saxon, and you will be able to run your stylesheets on the HTML
> directly.
> 
> Michael Kay
> http://www.saxonica.com/
> 
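
For readers following the thread, here is a minimal sketch of the setup Michael describes: a SAX parser that tolerates messy HTML is handed to the XSLT processor through the standard JAXP SAXSource class, so the stylesheet runs on the HTML directly with no regexp or tidy step in between. The class name HtmlTransform and its method names are hypothetical, chosen for illustration only. The sketch assumes the TagSoup jar is on the classpath (its parser class is org.ccil.cowan.tagsoup.Parser, loaded by name here so the file compiles without it), and that Saxon, if present on the classpath, is picked up by TransformerFactory; otherwise the JDK's built-in XSLT 1.0 processor is used.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class, not part of Saxon or TagSoup.
public class HtmlTransform {

    // Runs the stylesheet over whatever SAX events the given reader produces.
    // Any XMLReader works here; TagSoup's reader is what makes bad HTML parsable.
    static String transform(XMLReader reader, Reader html, Reader stylesheet)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(stylesheet));
        SAXSource source = new SAXSource(reader, new InputSource(html));
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    // Assumption: TagSoup is on the classpath. Loading it by name keeps this
    // file compilable with the JDK alone; it throws at run time if the jar is missing.
    static XMLReader newTagSoupReader() throws ReflectiveOperationException {
        return (XMLReader) Class.forName("org.ccil.cowan.tagsoup.Parser")
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Deliberately not well formed: unclosed <p>, bare <br>.
        String html = "<p>Unclosed paragraph<br>with a bare break";
        // Identity transform, so the output shows the repaired tree.
        String xsl =
            "<xsl:stylesheet version='1.0' "
          + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='@*|node()'>"
          + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(
            transform(newTagSoupReader(), new StringReader(html), new StringReader(xsl)));
    }
}
```

One thing worth checking when moving existing stylesheets over: as far as I know TagSoup reports elements in the XHTML namespace by default, so match patterns written against un-namespaced element names may need a namespace prefix.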

Copyright 1993-2007 XML.org. This site is hosted by OASIS