XML.org
OASIS Mailing List Archives

 



Re: [xml-dev] How to convert html file to xml format using Tagsoup

Hi Dave,
 
Thanks for your response to this question.
 
I'd prefer not to get into XSLT just for some simple parsing needs. Just using XPath in JDOM should be sufficient in this case.
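For the XPath route, here is a minimal sketch under stated assumptions. It swaps JDOM for the JDK's own DOM and javax.xml.xpath APIs, so TagSoup is the only third-party jar involved. TagSoup's org.ccil.cowan.tagsoup.Parser is a SAX XMLReader, so it can be handed to a JAXP identity Transformer that builds a DOM tree, which XPath then queries. The class and helper names (HtmlXPath, newReader, select) are hypothetical. If tagsoup-1.2.jar is not on the classpath, the helper falls back to the JDK's SAX parser, which only accepts well-formed input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class HtmlXPath {

    // Hypothetical helper: use TagSoup's SAX parser if tagsoup-1.2.jar is on the
    // classpath; otherwise fall back to the JDK's parser (well-formed input only).
    static XMLReader newReader() throws Exception {
        try {
            Class<?> tagsoup = Class.forName("org.ccil.cowan.tagsoup.Parser");
            return (XMLReader) tagsoup.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        }
    }

    // Parses the HTML into an in-memory DOM, then returns the string value of
    // every node the XPath expression selects, in document order.
    public static List<String> select(String html, String expr) throws Exception {
        SAXSource source = new SAXSource(newReader(), new InputSource(new StringReader(html)));
        DOMResult dom = new DOMResult();
        TransformerFactory.newInstance().newTransformer().transform(source, dom);

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, dom.getNode(), XPathConstants.NODESET);
        List<String> values = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><a href='a.html'>A</a><a href='b.html'>B</a></body></html>";
        // local-name() keeps the query independent of whether elements are
        // reported in the XHTML namespace (TagSoup does this by default)
        System.out.println(select(html, "//*[local-name()='a']/@href"));
    }
}
```

Matching on local-name() is a deliberate choice: TagSoup normally places elements in the XHTML namespace, so a plain //a would select nothing unless a NamespaceContext is registered with the XPath object.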
 
Thanks anyway,
 
Jack


From: Dave Pawson <davep@dpawson.co.uk>
To: Jack Bush <netbeansfan@yahoo.com.au>
Cc: xml-dev@lists.xml.org
Sent: Saturday, 1 November, 2008 1:06:13 AM
Subject: Re: [xml-dev] How to convert html file to xml format using Tagsoup

Jack Bush wrote:
> Hi All,
>  I would like to find out how to convert an HTML file to XML format using TagSoup in Java 6.
>  I have read through all the documents from http://tagsoup.info and other Google searches but could not find much useful material. Most of this material is an overview rather than an explanation of how to use each of the classes provided by the TagSoup package.
>  I have downloaded both tagsoup-1.2.jar and tagsoup-1.2-src.zip, but they only include instructions for using tagsoup-1.2.jar as a stand-alone command. However, I am looking for some examples of how to do the conversion in Java instead.
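One way to do the conversion from Java, sketched here as an assumption rather than TagSoup's documented recipe: because org.ccil.cowan.tagsoup.Parser is a SAX XMLReader, wrap it in a SAXSource and run it through a JAXP identity Transformer, which serializes the parser's events as XML. The class and helper names (HtmlToXml, newReader, convert) are hypothetical, and the fallback to the JDK's SAX parser (well-formed input only) is there just so the sketch runs without tagsoup-1.2.jar.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class HtmlToXml {

    // Hypothetical helper: TagSoup's parser if it is on the classpath,
    // otherwise the JDK's SAX parser (which rejects malformed HTML).
    static XMLReader newReader() throws Exception {
        try {
            Class<?> tagsoup = Class.forName("org.ccil.cowan.tagsoup.Parser");
            return (XMLReader) tagsoup.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        }
    }

    // Parses the HTML with the chosen XMLReader and writes the resulting
    // SAX events back out as an XML string (identity transform).
    public static String convert(String html) throws Exception {
        SAXSource source = new SAXSource(newReader(), new InputSource(new StringReader(html)));
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("<html><body><p>Hello</p></body></html>"));
    }
}
```

To convert a file instead of a string, the InputSource can be built from a file URL or an InputStream, and the StreamResult can wrap a java.io.File.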

Is XSLT any good? Use TagSoup as the parser for your HTML, then
treat the result as well-formed (wf) XML.

java -cp /myjava/saxon9.jar:/myjava/xercesImpl.jar:/myjava/tagsoup.jar  net.sf.saxon.Transform  -x:org.ccil.cowan.tagsoup.Parser -t -o $3 $1 $2  $4 $5 $6
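The command line above uses Saxon's -x: option to name TagSoup as the source parser. Roughly the same thing can be done through the standard JAXP API by wrapping TagSoup's parser in a SAXSource before calling the stylesheet; the sketch below assumes that approach. The class, helper names, and example stylesheet (which just prints the page title) are hypothetical, and the transformer used is whichever TransformerFactory JAXP selects on your classpath. Without tagsoup-1.2.jar it falls back to the JDK's SAX parser, which only accepts well-formed input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class HtmlXslt {

    // Hypothetical example stylesheet: outputs the text of the first <title>.
    // local-name() is used so it matches whether or not the parser puts the
    // elements in the XHTML namespace.
    public static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select=\"//*[local-name()='title']\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: TagSoup if available, else the JDK's SAX parser.
    static XMLReader newReader() throws Exception {
        try {
            Class<?> tagsoup = Class.forName("org.ccil.cowan.tagsoup.Parser");
            return (XMLReader) tagsoup.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        }
    }

    // Compiles the stylesheet, then runs it over the HTML as parsed by newReader().
    public static String transform(String html, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        SAXSource source = new SAXSource(newReader(), new InputSource(new StringReader(html)));
        StringWriter out = new StringWriter();
        t.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Demo</title></head><body></body></html>";
        System.out.println(transform(html, XSL));
    }
}
```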


A bit easier than hacking at the DOM.




HTH

-- Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk






Copyright 1993-2007 XML.org. This site is hosted by OASIS