   Re: HTML to XML

  • From: "Ingo Macherius" <macherius@darmstadt.gmd.de>
  • To: jmtang@us.ibm.com, xml-dev@ic.ac.uk
  • Date: Thu, 16 Jul 1998 20:02:00 +0200

jmtang@us.ibm.com <jmtang@us.ibm.com> wrote at 15 Jul 98, 14:57:

> What I am working on is to
> retrieve some information, e.g. news articles, from the web and feed the
> content to an application client which only knows application-specific XML.

So are we.

> Therefore, I need to read the HTML file, analyze it, strip away all
> presentation tags/contents, and convert the real content to the
> application-specific XML.

So what you want is not HTML->XML but "arbitrary, somewhat regular 
string" to XML. Buggy HTML is just that: a somewhat regular string. 
You want to extract information and map it to a schema, e.g. an XML 
DTD. Right?
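
To make the "somewhat regular string to XML" idea concrete, here is a minimal sketch in Java. It is not how Jedi works; it is only an illustration using java.util.regex. The page layout (headline in <h1>, text in <p>, closing </p> often missing) and the target element names <article>, <headline> and <para> are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: treats HTML as a loosely regular string, not as a parse tree.
final class NewsExtractor {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // Assumed layout: the headline sits in the first <h1> element.
    private static final Pattern HEADLINE =
        Pattern.compile("<h1(?:\\s[^>]*)?>(.*?)</h1>", FLAGS);

    // A paragraph ends at </p>, at the next <p>, or at end of input,
    // so a missing </p> (common in sloppy HTML) does not break extraction.
    private static final Pattern PARAGRAPH =
        Pattern.compile("<p(?:\\s[^>]*)?>(.*?)(?=</p>|<p[\\s>]|$)", FLAGS);

    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Strip presentation tags, decode a few common entities, normalise whitespace.
    static String clean(String fragment) {
        String text = TAG.matcher(fragment).replaceAll(" ");
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&"); // last, so "&amp;lt;" decodes to "&lt;"
        return text.replaceAll("\\s+", " ").trim();
    }

    // Escape the characters that are not allowed in XML character data.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Map the extracted pieces onto the (hypothetical) application-specific XML.
    static String toXml(String html) {
        StringBuilder xml = new StringBuilder("<article>\n");
        Matcher h = HEADLINE.matcher(html);
        if (h.find()) {
            xml.append("  <headline>").append(escapeXml(clean(h.group(1))))
               .append("</headline>\n");
        }
        Matcher p = PARAGRAPH.matcher(html);
        while (p.find()) {
            String text = clean(p.group(1));
            if (!text.isEmpty()) {
                xml.append("  <para>").append(escapeXml(text)).append("</para>\n");
            }
        }
        return xml.append("</article>\n").toString();
    }
}
```

Calling NewsExtractor.toXml(page) on a fetched page returns the XML as a string. Patterns like these are tied to one site's layout and need adjusting when that layout changes, which is why the extraction rules are best kept separate from the rest of the application.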

> Instead of writing my own HTML parser, I hope I
> can find some tool to help me or hopefully one of the XML parser can do the
> job.  I prefer Java, but I'll consider anything available.

Please have a look at
	http://www.darmstadt.gmd.de/oasys/projects/jedi/jedie.html
	http://www.darmstadt.gmd.de/oasys/projects/jedi/jp.pdf

The online demos already use Jedi 2.0, which is currently being 
prepared for (*.class only) distribution. Expect 2.0 for download in 
about 7 days. The current downloadable version is 0.1, which is about 
2 years old.

	++im
--
Ingo Macherius//Dolivostrasse 15//D-64293 Darmstadt//+49-6151-869-882
GMD-IPSI German National Research Center for Information Technology
mailto:macherius@gmd.de http://www.darmstadt.gmd.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Zappa)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 
