- From: "Ingo Macherius" <email@example.com>
- To: firstname.lastname@example.org, email@example.com
- Date: Thu, 16 Jul 1998 20:02:00 +0200
firstname.lastname@example.org <email@example.com> wrote at 15 Jul 98, 14:57:
> What I am working on is to
> retrieve some information, e.g. news articles, from the web and feed the
> content to an application client which only knows application-specific XML.
So are we.
> Therefore, I need to read the HTML file, analyze it, strip away all
> presentation tags/contents, and convert the real content to the
> application-specific XML.
So you don't want HTML->XML but "arbitrary, somewhat regular string"
to XML. Buggy HTML is just that: a somewhat regular string. You want
to extract information and map it to a schema, e.g. an XML DTD. Right?
> Instead of writing my own HTML parser, I hope I
> can find some tool to help me or hopefully one of the XML parser can do the
> job. I prefer Java, but I'll consider anything available.
Please have a look at
The online demos already use Jedi 2.0, which is currently being
prepared for (*.class only) distribution. Expect 2.0 for download in
about 7 days. The current downloadable version is 0.1, which is about
2 years old.
Ingo Macherius//Dolivostrasse 15//D-64293 Darmstadt//+49-6151-869-882
GMD-IPSI German National Research Center for Information Technology
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:email@example.com)