Re: Need a tool that converts HTML into well-formed XML, and nothing more

  • From: tpassin@home.com
  • To: xml-dev@lists.xml.org
  • Date: Thu, 21 Sep 2000 23:09:16 -0400

Dylan Walsh asked -

> Hi. I am working on preparing examples of HTML, created by professional web
> designers, for use with XSL transformations. Obviously the markup needs to
> be made well-formed for this purpose. I am familiar with Tidy from the W3C,
> however that utility goes beyond what we need, as it makes changes to
> conform to their standards. This unfortunately results in visual anomalies
> in the output.
>
> I understand the arguments behind XHTML etc., however the markup we are
> given is designed to look good with older browsers, on different
> platforms. Is there any software out there that converts HTML to be
> well-formed XML, but does not make changes beyond that, e.g. to obey the
> HTML or XHTML standards?

Here's one more response (although Chris Maden's suggestion of sx is
probably the best): Python has a module called "sgmllib" which by default
produces ESIS output from an HTML file.  You'd have to convert the ESIS to
XML yourself, but all the whitespace would be preserved.  Python also has an
"htmllib" module built on top of sgmllib.  With that one you'd have to modify
one of the formatters to output XML, but it wouldn't be too hard.
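
A rough sketch of that event-driven approach follows. Note the assumptions:
sgmllib and htmllib were removed in Python 3, so this uses the standard
html.parser module as a stand-in, and the names WellFormedWriter,
to_wellformed and VOID are made up for illustration. It only quotes
attributes, escapes text, and balances tags; it applies no HTML nesting rules,
and it drops comments and the doctype.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that HTML never closes; written out as empty-element tags.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class WellFormedWriter(HTMLParser):
    """Hypothetical helper: re-emits parser events as well-formed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments, joined at the end
        self.open_tags = []  # stack of elements still waiting for an end tag

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "nowrap") get their own name as the value.
        attr_text = "".join(
            " %s=%s" % (name, quoteattr(value if value is not None else name))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s/>" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore end tags for void elements and for elements never opened.
        if tag in VOID or tag not in self.open_tags:
            return
        # Close everything opened since the matching start tag, then the tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        # Whitespace passes through untouched; only &, < and > are escaped.
        self.out.append(escape(data))


def to_wellformed(html):
    writer = WellFormedWriter()
    writer.feed(html)
    writer.close()
    # Close whatever the source left open at end of input.
    while writer.open_tags:
        writer.out.append("</%s>" % writer.open_tags.pop())
    return "".join(writer.out)


if __name__ == "__main__":
    print(to_wellformed("<p align=center>Hi<br>there <b>bold &amp; more"))
```

Because no HTML content model is applied, an unclosed <li> followed by another
<li> ends up nested rather than as a sibling. That keeps the tool to
"well-formed and nothing more", but the result may still need a single root
element wrapped around it before an XML parser will accept it.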

Cheers,

Tom Passin





 
