7/14/2002 10:00:06 PM, Robert Mena <firstname.lastname@example.org> wrote:
>Hi, I am developing an application that will have to
>build a DOM tree of html pages.
>I'll use such DOM trees to perform some
>Since most of the time I'll find ill-formed documents
>I'd like to know if there are any parsers out there
>that "accept" these flaws and build the tree anyway.
>I've tried domxml (php) with no luck.
The standard answer is to use tidy (http://tidy.sourceforge.net/)
to convert the page to XHTML, and then parse the result with an
ordinary XML parser.
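
A rough sketch of that tidy-then-parse pipeline, in Python rather than PHP. It assumes the `tidy` command-line program is installed and on the PATH. The helper names `tidy_to_xhtml` and `link_targets` are made up for illustration, and pulling out link targets is only a stand-in for whatever you actually want to do with the tree.

```python
import subprocess
import xml.etree.ElementTree as ET

# tidy's XHTML output puts every element in the XHTML namespace.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def tidy_to_xhtml(html: str) -> str:
    """Pipe ill-formed HTML through the external `tidy` program (assumed
    to be on PATH) and return the repaired document as XHTML text."""
    result = subprocess.run(
        # -asxhtml: emit well-formed XHTML
        # -numeric: write numeric entities (&#160;) instead of named ones
        #           (&nbsp;), which a plain XML parser would reject
        ["tidy", "-quiet", "-asxhtml", "-numeric", "-utf8"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # tidy exits with 1 when it only had warnings (normal for sloppy HTML)
    # and with 2 when it hit errors it could not repair.
    if result.returncode >= 2:
        raise RuntimeError("tidy could not repair the document")
    return result.stdout.decode("utf-8")


def link_targets(xhtml: str) -> list:
    """Parse XHTML with an ordinary XML parser and collect the href of
    every <a> element that has one."""
    root = ET.fromstring(xhtml)
    return [a.get("href") for a in root.iter(XHTML_NS + "a") if a.get("href")]
```

Usage would be something like `link_targets(tidy_to_xhtml(raw_html))`, where `raw_html` is the text of the page you fetched. Any other XML parser or DOM implementation could replace ElementTree in the second step.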
at all possible. For better or worse (mostly worse!) the browser
vendors have worked hard to "accept the flaws and build a
tree anyway", and then expose that tree with the DOM API. You
can essentially pretend the ill-formed HTML is XML and use the
XML Core DOM to work with it. You might need to use some
server-side PHP or whatever to grab web pages, filter in the