- From: Matt Sergeant <matt@sergeant.org>
- To: "Christopher R. Maden" <crism@lexica.net>
- Date: Thu, 21 Sep 2000 22:56:13 +0100 (BST)
On Thu, 21 Sep 2000, Christopher R. Maden wrote:
> At 15:20 21-09-2000 +0100, Dylan Walsh wrote:
> >Hi. I am working on preparing examples of HTML, created by professional web
> >designers, for use with XSL transformations. Obviously the markup needs to
> >be made well-formed for this purpose. I am familiar with Tidy from the W3C,
> >however that utility goes beyond what we need, as it makes changes to
> >conform to their standards. This unfortunately results in visual anomalies
> >in the output.
> >
> >I understand the arguments behind XHTML etc., however the markup we are
> >given is designed to work and look good with older browsers, on different
> >platforms. Is there any software out there that converts HTML to be
> >well-formed XML, but does not make changes beyond that, e.g. to obey the
> >HTML or XHTML standards?
>
> Is the HTML valid SGML (i.e., complies with one of the HTML DTDs)? If so,
> you can use James Clark's sx (<URL:http://www.jclark.com/sp/>), which will
> do a straight SGML-to-XML translation with no knowledge of HTML's semantics.
>
> If the HTML is tag soup, then you may be SOL; try Perl with the
> HTML::Parser module, or something.
If HTML::Parser can cope with the malformed HTML, you can use XML::PYX,
which ships with a pyxhtml script, so you can run (file names here are
placeholders):

    pyxhtml input.html | pyxw > output.xml

pyxhtml parses the HTML with HTML::TreeBuilder, which is itself built on
HTML::Parser, and emits PYX; pyxw turns that PYX back into XML.
I believe the Python PYX tools have a similar facility too.
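
For anyone who wants to see the idea rather than the tools, here is a
rough sketch in Python of the same two-step pipeline: tag soup to PYX
events, then PYX events to XML. It is not the XML::PYX or Python PYX code,
just an illustration using the standard library's html.parser. All names
(SoupToPyx, pyx_to_xml, soup_to_xml) are made up for this example, and it
assumes a single root element and no repeated attribute names on a tag.

```python
import re
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never take an end tag.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


def pyx_escape(text):
    # PYX is line-oriented, so backslashes and newlines must be escaped.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def pyx_unescape(text):
    # Reverse pyx_escape: "\n" becomes a newline, "\\" a single backslash.
    return re.sub(r"\\(.)", lambda m: "\n" if m.group(1) == "n" else m.group(1), text)


class SoupToPyx(HTMLParser):
    """Hypothetical helper: turn tag soup into balanced PYX event lines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []  # PYX output, one event per line
        self.stack = []  # names of elements that are currently open

    def handle_starttag(self, tag, attrs):
        self.lines.append("(" + tag)
        for name, value in attrs:
            # A bare attribute (e.g. <td nowrap>) gets its own name as value.
            self.lines.append("A%s %s" % (name, pyx_escape(name if value is None else value)))
        if tag in VOID:
            self.lines.append(")" + tag)  # close void elements immediately
        else:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching start: drop it
        # Close everything opened after the matching start tag, then the tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.lines.append(")" + open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.lines.append("-" + pyx_escape(data))

    def close(self):
        super().close()  # flushes any buffered text first
        while self.stack:  # then close whatever is still open
            self.lines.append(")" + self.stack.pop())


def pyx_to_xml(lines):
    """Serialise PYX event lines as XML text."""
    out = []
    pending = None  # start tag still collecting attributes
    for line in lines:
        kind, rest = line[0], line[1:]
        if kind == "A":
            name, _, value = rest.partition(" ")
            pending += " %s=%s" % (name, quoteattr(pyx_unescape(value)))
            continue
        if pending is not None:  # any non-attribute event ends the start tag
            out.append(pending + ">")
            pending = None
        if kind == "(":
            pending = "<" + rest
        elif kind == ")":
            out.append("</%s>" % rest)
        elif kind == "-":
            out.append(escape(pyx_unescape(rest)))
    return "".join(out)


def soup_to_xml(html):
    parser = SoupToPyx()
    parser.feed(html)
    parser.close()
    return pyx_to_xml(parser.lines)
```

Calling soup_to_xml() on a string of HTML returns the balanced markup as a
string. Note that this sketch only balances tags and quotes attributes; it
does not reorder or add elements the way Tidy does, and it silently drops
comments and the DOCTYPE.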
--
<Matt/>
Fastnet Software Ltd. High Performance Web Specialists
Providing mod_perl, XML, Sybase and Oracle solutions
Email for training and consultancy availability.
http://sergeant.org | AxKit: http://axkit.org