Re: [xml-dev] Parsing bad HTML
- From: David Carlisle <davidc@nag.co.uk>
- To: pjmaip@yahoo.com
- Date: Thu, 13 Nov 2008 21:53:21 GMT
If you want a pure XSLT solution, this seems to work:
bad.xsl
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:d="data:,dpc"
exclude-result-prefixes="d">
<xsl:import href="http://www.dcarlisle.demon.co.uk/htmlparse.xsl"/>
<xsl:template match="/">
<xsl:sequence select="d:htmlparse(.)"/>
</xsl:template>
</xsl:stylesheet>
bad.xml
<bad><![CDATA[
<p>
<sub>123</sub>4567<eight<img src="file.gif" alt="<b>hello</b>">
</p>
]]></bad>
$ saxon9 bad.xml bad.xsl
<?xml version="1.0" encoding="UTF-8"?>
<p xmlns="http://www.w3.org/1999/xhtml">
<sub>123</sub>4567&lt;eight<img src="file.gif" alt="&lt;b&gt;hello&lt;/b&gt;"/>
</p>
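If the bad HTML lives in a file of its own rather than inside a CDATA section, a variant along the following lines might work. This is only a sketch, not something from the stylesheet above: the file name bad.html, the parameter name input and the template name main are assumptions made for illustration. It relies on the standard XSLT 2.0 function unparsed-text() to read the file as a string and passes that string to the same d:htmlparse() function.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:d="data:,dpc"
                exclude-result-prefixes="d">

  <xsl:import href="http://www.dcarlisle.demon.co.uk/htmlparse.xsl"/>

  <!-- Hypothetical parameter: URI of the raw HTML file.
       A relative URI is resolved against the stylesheet's base URI. -->
  <xsl:param name="input" select="'bad.html'"/>

  <!-- Named entry point (assumed name), so no source XML document is needed -->
  <xsl:template name="main">
    <!-- Read the file as plain text, then parse the string as HTML -->
    <xsl:sequence select="d:htmlparse(unparsed-text($input))"/>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon 9 this would be started from the named template, for example with the -it:main and -xsl: options, instead of supplying a source document.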
David