From: Dave Pawson <davep@dpawson.co.uk>
To: Jack Bush <netbeansfan@yahoo.com.au>
Cc: xml-dev@lists.xml.org
Sent: Saturday, 1 November, 2008 1:06:13 AM
Subject: Re: [xml-dev] How to convert html file to xml format using Tagsoup
Jack Bush wrote:
> Hi All,
> I would like to find out how to convert an HTML file to XML using TagSoup in Java 6.
> I have read through all the documents from http://tagsoup.info and other Google searches but could not find much useful material. Most of it is an overview rather than an explanation of how to use each of the classes provided by the TagSoup package.
>
> I have downloaded both tagsoup-1.2.jar and tagsoup-1.2-src.zip, but they only include instructions for using tagsoup-1.2.jar as a stand-alone command. I am looking for examples of how to do the conversion in Java instead.
Would XSLT be any good? Use TagSoup as the parser for your HTML, then
treat the result as well-formed XML:
java -cp /myjava/saxon9.jar:/myjava/xercesImpl.jar:/myjava/tagsoup.jar net.sf.saxon.Transform -x:org.ccil.cowan.tagsoup.Parser -t -o $3 $1 $2 $4 $5 $6
A bit easier than hacking at the DOM.
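If you would rather stay inside Java 6 than drive Saxon from the command line, the same idea (TagSoup as the SAX parser, a standard XML tool on the other end) should also work with the JDK's built-in JAXP identity transformer. This is a minimal sketch, assuming tagsoup-1.2.jar is on the runtime classpath. The class name HtmlToXml and the helper toXml are hypothetical names for illustration. TagSoup's org.ccil.cowan.tagsoup.Parser is loaded by class name, much as Saxon's -x: option does, so the file compiles without the jar.

```java
import java.io.File;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class HtmlToXml {

    // TagSoup's SAX parser: the same class handed to Saxon via -x: in the command above.
    static final String TAGSOUP_PARSER = "org.ccil.cowan.tagsoup.Parser";

    /**
     * Hypothetical helper: feeds the SAX events produced by `reader` through an
     * identity transform, so whatever the reader reports is serialized as XML.
     */
    static void toXml(XMLReader reader, InputSource in, Result out) throws Exception {
        // newTransformer() with no stylesheet gives an identity (copy) transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.INDENT, "yes");
        identity.transform(new SAXSource(reader, in), out);
    }

    public static void main(String[] args) throws Exception {
        // Assumes tagsoup-1.2.jar is on the classpath at run time.
        XMLReader tagsoup = (XMLReader) Class.forName(TAGSOUP_PARSER)
                .getDeclaredConstructor().newInstance();
        // args[0] = input HTML file, args[1] = output XML file.
        InputSource html = new InputSource(new File(args[0]).toURI().toString());
        toXml(tagsoup, html, new StreamResult(new File(args[1])));
    }
}
```

Run it as, e.g., java -cp tagsoup-1.2.jar:. HtmlToXml page.html page.xml. Passing a stylesheet Source to TransformerFactory.newTransformer(...) instead of calling it with no arguments gives you the XSLT route from plain Java.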
HTH
-- Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk