
RE: HTML --> XHTML --> XML Conversion (HTML Scraping)



What about HTML Tidy?

http://www.w3.org/People/Raggett/tidy/

tidy -asxml file.html > file.xml
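Once Tidy has produced well-formed XHTML, any XML parser can pull the data out. Below is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample markup, the table_to_xml helper, and the records/record element names are all made up for illustration; they assume the data of interest sits in a single table whose first row holds headers that are usable as XML element names. Real pages will need their own selection logic.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so lookups must be qualified.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample standing in for Tidy's output (file.xml).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Price list</title></head>
<body>
<table>
<tr><th>Item</th><th>Price</th></tr>
<tr><td>Widget</td><td>9.99</td></tr>
<tr><td>Gadget</td><td>24.50</td></tr>
</table>
</body>
</html>"""


def cell_text(cell):
    """Return all text inside a cell, ignoring inline markup such as <b>."""
    return "".join(cell.itertext()).strip()


def table_to_xml(xhtml_text):
    """Turn the first table in an XHTML document into <records> XML.

    Assumes the first row contains <th> headers whose lowercased text
    is a valid XML element name (an assumption of this sketch).
    """
    root = ET.fromstring(xhtml_text)
    table = root.find(".//x:table", NS)
    rows = table.findall(".//x:tr", NS)
    headers = [cell_text(th).lower() for th in rows[0].findall("x:th", NS)]

    records = ET.Element("records")
    for tr in rows[1:]:
        values = [cell_text(td) for td in tr.findall("x:td", NS)]
        record = ET.SubElement(records, "record")
        for name, value in zip(headers, values):
            ET.SubElement(record, name).text = value
    return records


if __name__ == "__main__":
    print(ET.tostring(table_to_xml(SAMPLE), encoding="unicode"))
```

To run this against real Tidy output, read file.xml and pass its contents to table_to_xml in place of SAMPLE. The same extraction could also be written as an XSLT stylesheet applied to the XHTML.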

-Mike

-----Original Message-----
From: Gul Imran [mailto:gimran@nortelnetworks.com]
Sent: Friday, March 23, 2001 8:12 AM
To: 'xml-dev@lists.xml.org'
Subject: HTML --> XHTML --> XML Conversion (HTML Scraping)


I'm wondering if anyone has any info on HTML scraping, i.e. how to extract
XML data from HTML.  My idea is to convert the HTML to XHTML first and
then go from there.  I have found a couple of software vendors that claim to
provide HTML -> XML + XSL conversion or direct HTML to WAP/VoiceXML
conversion.  One is www.percussion.com and the other is www.spyglassmobile.com.
Are there any other tools out there that can perform HTML scraping for XML
data generation?  Also, what logic is best for data extraction once I have
an XHTML DOM?


______________________________________
Gul Imran
Sr. Software Engineer, R & D
e-Business Customer Interaction Solutions
Nortel Networks
mailto:gimran@nortelnetworks.com
"What the tech is going on?"