OASIS Mailing List Archives





   Re: [xml-dev] generating DOM from ill-formed HTML docs


[Robert Mena]

> Hi, I am developing an application that will have to
> build a DOM tree of html pages.
> I'll use such DOM trees to perform some
> analysis/comparisons.
> Since most of the time I'll find ill-formed documents
> I'd like to know if there are any parsers out there
> that "accept" these flaws and build the tree anyway.
> I've tried domxml (php) with no luck.

The usual answer is to preprocess with Tidy - see


You may also want to look at NekoHTML, at


It parses HTML, fixing up some problems along the way, and uses the
Xerces Native Interface (XNI), so you can build a DOM.
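
To make the general idea concrete, here is a minimal sketch of a tolerant
parser that builds a DOM even when tags are unclosed or mismatched. It is
neither Tidy nor NekoHTML: it is a stand-in that uses only Python's standard
library (html.parser and xml.dom.minidom). The names TolerantDomBuilder and
parse_html, and the synthetic "document" wrapper element, are made up for
this illustration. It does not apply HTML's implied-end-tag rules, so a
second <p> nests inside the first instead of closing it; a real tool
repairs far more than this.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TolerantDomBuilder(HTMLParser):
    """Hypothetical helper: builds a minidom tree from sloppy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.document = Document()
        # Synthetic wrapper so the result always has exactly one root,
        # even if the input has several top-level elements or none.
        self.root = self.document.createElement("document")
        self.document.appendChild(self.root)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        element = self.document.createElement(tag)
        for name, value in attrs:
            # Valueless attributes (e.g. <input disabled>) arrive as None.
            element.setAttribute(name, value if value is not None else "")
        self.stack[-1].appendChild(element)
        if tag not in VOID_TAGS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly closing
        # anything still open inside it. Stray end tags are ignored.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].tagName == tag:
                del self.stack[depth:]
                return

    def handle_data(self, data):
        # Skip whitespace-only runs to keep the tree small.
        if data.strip():
            self.stack[-1].appendChild(self.document.createTextNode(data))


def parse_html(markup):
    """Parse possibly ill-formed HTML and return a minidom Document."""
    builder = TolerantDomBuilder()
    builder.feed(markup)
    builder.close()
    return builder.document


if __name__ == "__main__":
    # Unclosed <p>, unclosed <b>, a stray </i>, and no </html>.
    doc = parse_html("<html><body><p>one<p><b>two</i></body>")
    print(doc.documentElement.toxml())
```

The returned object is an ordinary DOM Document, so the usual calls such as
getElementsByTagName work on it, and its serialization is well-formed XML
that other tools can read back in for analysis or comparison.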


Tom P



Copyright 2001 XML.org. This site is hosted by OASIS