OASIS Mailing List Archives


Re: Need a tool that converts HTML into well-formed XML, and nothing more

  • From: Frank Boumphrey <bckman@ix.netcom.com>
  • To: Matt Sergeant <matt@sergeant.org>, "Christopher R. Maden" <crism@lexica.net>
  • Date: Thu, 21 Sep 2000 18:49:06 -0400

> >conform to their standards. This unfortunately results in visual anomalies
> >in the output.

I would be interested to hear what problems you encounter. I have used Tidy quite
exclusively and have never had a problem. As far as I can see, apart from
adding a namespace declaration and a public identifier, it does convert the
original HTML to 'pure' XML. And you can always write a simple script to
strip the namespace declaration and the identifier.

Frank
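
The "simple script" Frank mentions might look something like the minimal Python sketch below. It assumes Tidy's XML output contains exactly one `<!DOCTYPE ...>` declaration with no internal subset, and that the namespace it adds is the XHTML default namespace, `xmlns="http://www.w3.org/1999/xhtml"`, on the root element; the function name `strip_tidy_additions` is hypothetical. Regular expressions are tolerable here only because both additions have a fixed, predictable form; everything else in the file passes through untouched.

```python
import re

# Assumption: one DOCTYPE with no internal subset, so the first '>'
# after '<!DOCTYPE' closes it. Trailing whitespace is removed with it.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>\s*", re.IGNORECASE)

# Assumption: the namespace Tidy adds is the XHTML default namespace.
XHTML_NS_RE = re.compile(r'\s+xmlns="http://www\.w3\.org/1999/xhtml"')


def strip_tidy_additions(xml_text: str) -> str:
    """Hypothetical helper: drop the DOCTYPE (with its public identifier)
    and the XHTML namespace declaration, leaving other markup as-is."""
    without_doctype = DOCTYPE_RE.sub("", xml_text, count=1)
    return XHTML_NS_RE.sub("", without_doctype, count=1)


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
        '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<body><p>Hello</p></body>\n"
        "</html>\n"
    )
    print(strip_tidy_additions(sample))
```

Run on the sample, the result keeps the XML declaration and the body markup but no longer carries the DOCTYPE or the `xmlns` attribute.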
----- Original Message -----
From: "Matt Sergeant" <matt@sergeant.org>
To: "Christopher R. Maden" <crism@lexica.net>
Cc: <xml-dev@lists.xml.org>
Sent: Thursday, September 21, 2000 5:56 PM
Subject: Re: Need a tool that converts HTML into well-formed XML, and nothing more.


> On Thu, 21 Sep 2000, Christopher R. Maden wrote:
>
> > At 15:20 21-09-2000 +0100, Dylan Walsh wrote:
> > >Hi. I am working on preparing examples of HTML, created by professional web
> > >designers, for use with XSL transformations. Obviously the markup needs to
> > >be made well-formed for this purpose. I am familiar with Tidy from the W3C;
> > >however, that utility goes beyond what we need, as it makes changes to
> > >conform to their standards. This unfortunately results in visual anomalies
> > >in the output.
> > >
> > >I understand the arguments behind XHTML etc.; however, the markup we are
> > >given is designed to look good with older browsers, on different
> > >platforms. Is there any software out there that converts HTML to be
> > >well-formed XML, but does not make changes beyond that, e.g. to obey the
> > >HTML or XHTML standards?
> >
> > Is the HTML valid SGML (i.e., does it conform to one of the HTML DTDs)? If so,
> > you can use James Clark's sx (<URL:http://www.jclark.com/sp/>), which will
> > do a straight SGML-to-XML translation with no knowledge of HTML's semantics.
> >
> > If the HTML is tag soup, then you may be SOL; try Perl with the
> > HTML::Parser module, or something.
>
> If HTML::Parser will work on the malformed HTML, you can use XML::PYX,
> which comes with a pyxhtml script, so you can go:
>
> pyxhtml <file> | pyxw > <newfile>
>
> pyxhtml uses HTML::Parser (actually HTML::TreeBuilder, which is itself
> built on HTML::Parser).
>
> I believe the Python PYX tools have a similar facility too.
>
> --
> <Matt/>
>
> Fastnet Software Ltd. High Performance Web Specialists
> Providing mod_perl, XML, Sybase and Oracle solutions
> Email for training and consultancy availability.
> http://sergeant.org | AxKit: http://axkit.org
>
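
Christopher's HTML::Parser route and Matt's pyxhtml both read the markup as a stream of start-tag, end-tag and text events and write those events back out. As a rough sketch of that approach, here is a minimal Python version that swaps in the standard library's `html.parser.HTMLParser` for Perl's HTML::Parser; the class `WellFormer`, the helper `to_xml` and the void-element list are assumptions made for illustration. It changes only what XML well-formedness demands: it quotes and escapes attribute values, escapes text, writes always-empty elements such as `<br>` as `<br/>`, drops end tags that match nothing, and closes elements left open. It does not apply HTML's implied-end-tag rules, so a second `<p>` nests inside the first rather than closing it.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: elements HTML treats as always empty; they are written as
# empty-element tags instead of being left open.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class WellFormer(HTMLParser):
    """Hypothetical converter: re-serialises parser events as XML, changing
    only what well-formedness requires."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; etc. become characters
        self.out = []
        self.stack = []  # names of elements that are currently open

    def _attrs(self, attrs):
        seen, parts = set(), []
        for name, value in attrs:
            if name in seen:  # XML forbids repeated attributes; keep the first
                continue
            seen.add(name)
            if value is None:  # minimised attribute such as <td nowrap>
                value = name
            parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Already self-closed in the source, e.g. <br/>.
        self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching start tag: drop it
        # Close anything still open inside this element, innermost first.
        while self.stack:
            name = self.stack.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text
        while self.stack:  # close elements left open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def to_xml(html_text):
    parser = WellFormer()
    parser.feed(html_text)
    return parser.result()
```

For example, `to_xml('<p>Fish & chips<br>')` returns `'<p>Fish &amp; chips<br/></p>'`. Comments, the DOCTYPE and processing instructions fall through to `HTMLParser`'s default handlers, which ignore them, so this sketch drops them; it also assumes the input has a single root element (typically `<html>`) and tag and attribute names that are already legal XML names.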

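Matt's pipeline passes the document through PYX, a line-oriented notation in which the first character of each line says what it carries: `(` a start tag, `)` an end tag, `A` an attribute (name, a space, then the value), `-` character data and `?` a processing instruction. To show roughly what the `pyxw` end of the pipe has to do, here is a minimal Python sketch that rebuilds XML from PYX lines. The function name `pyx_to_xml` is hypothetical, and it assumes the only escape in character data is `\n` for a newline, so a literal backslash followed by `n` in the original text would be misread.

```python
from xml.sax.saxutils import escape, quoteattr


def pyx_to_xml(pyx_lines):
    """Hypothetical pyxw-style writer: rebuild XML text from PYX lines."""
    out = []
    pending_start = None  # start tag still collecting its A lines
    attrs = []

    def flush_start():
        # Emit the pending start tag together with its attributes.
        nonlocal pending_start, attrs
        if pending_start is not None:
            attr_text = "".join(f" {n}={quoteattr(v)}" for n, v in attrs)
            out.append(f"<{pending_start}{attr_text}>")
            pending_start, attrs = None, []

    for line in pyx_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        kind, rest = line[0], line[1:]
        if kind == "A":
            name, _, value = rest.partition(" ")
            attrs.append((name, value))
            continue
        flush_start()  # any non-attribute line completes the start tag
        if kind == "(":
            pending_start = rest
        elif kind == ")":
            out.append(f"</{rest}>")
        elif kind == "-":
            # Assumption: newlines in text are encoded as a literal "\n".
            out.append(escape(rest.replace("\\n", "\n")))
        elif kind == "?":
            out.append(f"<?{rest}?>")
    flush_start()
    return "".join(out)
```

For example, `pyx_to_xml(["(p", "Aclass intro", "-Hi", ")p"])` returns `'<p class="intro">Hi</p>'`. Because each PYX line stands alone, a start tag's attributes are only complete once the next non-`A` line arrives, which is why the sketch holds the tag in `pending_start` until then.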

Copyright 2001 XML.org. This site is hosted by OASIS