Re: [xml-dev] How to convert html file to xml format using Tagsoup

Jack Bush wrote:
> Hi All,
>  
> I would like to find out how to convert an HTML file to XML using
> TagSoup in Java 6.
>
> I have read through all the documents at http://tagsoup.info and other
> Google search results but could not find much useful material. Most of
> this material is an overview rather than an explanation of how to
> use each class provided by the TagSoup package.
>
> I have downloaded both tagsoup-1.2.jar and tagsoup-1.2-src.zip, but
> they only come with instructions for using tagsoup-1.2.jar as a
> stand-alone command. However, I am looking for examples of how to do
> the conversion in Java instead.

Is XSLT any good to you? Use TagSoup as the parser for your HTML, then
treat the result as well-formed XML:

java -cp /myjava/saxon9.jar:/myjava/xercesImpl.jar:/myjava/tagsoup.jar \
   net.sf.saxon.Transform -x:org.ccil.cowan.tagsoup.Parser -t -o $3 $1 $2 $4 $5 $6


A bit easier than hacking at the DOM.
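
If the conversion has to happen inside a Java program rather than from the
command line, the same idea can probably be sketched with the JAXP identity
transform: TagSoup's org.ccil.cowan.tagsoup.Parser is a SAX XMLReader, so it
can feed a SAXSource. The class name HtmlToXml and the helpers toXml and
newTagSoupReader are made up for illustration, and the parser is loaded by
name so that tagsoup.jar is only needed at run time. Treat this as a rough
sketch, not a tested recipe.

```java
import java.io.FileReader;
import java.io.Reader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class; the name is not part of TagSoup.
class HtmlToXml {

    // Identity transform: SAX events from 'reader' go in, serialized XML comes out.
    static String toXml(XMLReader reader, Reader html) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(reader, new InputSource(html)),
                           new StreamResult(out));
        return out.toString();
    }

    // Loads the TagSoup parser by class name; requires tagsoup.jar on the classpath.
    static XMLReader newTagSoupReader() throws Exception {
        return (XMLReader) Class.forName("org.ccil.cowan.tagsoup.Parser")
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // FileReader uses the platform default encoding; adjust if the HTML differs.
        Reader html = new FileReader(args[0]);
        try {
            System.out.println(toXml(newTagSoupReader(), html));
        } finally {
            html.close();
        }
    }
}
```

Run it with tagsoup.jar on the classpath and the HTML file as the only
argument. To apply a stylesheet instead of just copying, newTransformer()
can be given a StreamSource pointing at the XSLT file. Note that TagSoup
puts the elements in the XHTML namespace by default, so XPath expressions
in a stylesheet need a prefix bound to that namespace.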




HTH

-- 
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

