   Re: Need a tool that converts HTML into well-formed XML, and nothing more

  • From: tpassin@home.com
  • To: xml-dev@lists.xml.org
  • Date: Thu, 21 Sep 2000 23:09:16 -0400

Dylan Walsh asked -

> Hi. I am working on preparing examples of HTML, created by professional
> designers, for use with XSL transformations. Obviously the markup needs to
> be made well-formed for this purpose. I am familiar with Tidy from the
> however that utility goes beyond what we need, as it makes changes to
> conform to their standards. This unfortunately results in visual anomalies
> in the output.
> I understand the arguments behind XHTML etc., however the markup we are
> given is designed to look good with older browsers, on different
> platforms. Is there any software out there that converts HTML to be
> well-formed XML, but does not make changes beyond that, e.g. to obey the
> HTML or XHTML standards?

Here's one more response (although Chris Maden's suggestion of sx is
probably the best): Python has a module called "sgmllib" which by default
produces ESIS output from an HTML file.  You'd have to convert the ESIS to
XML yourself, but all the whitespace would be preserved.  Python also has an
"htmllib" module built on sgmllib.  You'd have to modify one of the
formatters to output XML, but that wouldn't be too hard.
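
To make the idea concrete, here is a rough sketch of the "parse events in,
XML out" approach.  It is not the sgmllib/htmllib code described above: those
modules were removed in Python 3, so this sketch swaps in the standard
html.parser.HTMLParser instead.  The names XmlEcho, html_to_xml and VOID are
made up for illustration.  Assumptions worth knowing about: the parser
lowercases tag and attribute names, whitespace inside tags is normalized,
DOCTYPE declarations are dropped, and an unclosed element such as <p> is only
closed when an enclosing end tag or the end of input is reached.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that HTML never closes; written out as empty-element tags.
# (Illustrative subset, not a complete list.)
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class XmlEcho(HTMLParser):
    """Echo parser events back out as well-formed XML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments, joined at the end
        self.open_tags = []  # stack of elements still awaiting an end tag

    def _start(self, tag, attrs, empty):
        parts = [tag]
        for name, value in attrs:
            # A valueless attribute such as "nowrap" gets its own name as value.
            parts.append("%s=%s" % (name, quoteattr(name if value is None else value)))
        self.out.append("<%s%s>" % (" ".join(parts), "/" if empty else ""))

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self._start(tag, attrs, True)
        else:
            self._start(tag, attrs, False)
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.open_tags:
            return  # stray end tag: drop it
        # Close anything left open inside this element, then the element itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        # Text is passed through unchanged apart from escaping &, < and >.
        self.out.append(escape(data))

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())
        return "".join(self.out)


def html_to_xml(text):
    parser = XmlEcho()
    parser.feed(text)
    return parser.result()
```

Calling html_to_xml() on a fragment of designer HTML returns a string in which
attributes are quoted, empty elements are self-closed and open elements are
closed.  It makes no attempt to enforce the HTML or XHTML content models,
which is the point of the original question.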


Tom Passin

