Re: [xml-dev] Parsing bad HTML



If you want a pure XSLT solution, this seems to work:

bad.xsl

<xsl:stylesheet  version="2.0" 
		 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
		 xmlns:d="data:,dpc"
		 exclude-result-prefixes="d">
 

<xsl:import href="http://www.dcarlisle.demon.co.uk/htmlparse.xsl"/>
<xsl:template match="/">
<xsl:sequence select="d:htmlparse(.)"/>
</xsl:template>

</xsl:stylesheet>


bad.xml


<bad><![CDATA[


 <p>
 <sub>123</sub>4567<eight<img src="file.gif" alt="<b>hello</b>">
 </p>


]]></bad>


$ saxon9 bad.xml bad.xsl
<?xml version="1.0" encoding="UTF-8"?>


 <p xmlns="http://www.w3.org/1999/xhtml">
 <sub>123</sub>4567&lt;eight<img src="file.gif" alt="&lt;b&gt;hello&lt;/b&gt;"/>
 </p>
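
The CDATA wrapper in bad.xml is only there to hand the markup to the stylesheet as a string. A minimal sketch of a variant that reads the HTML straight from a file with the standard XSLT 2.0 unparsed-text() function, assuming the same one-argument d:htmlparse() call as above. The file name bad.html, the parameter name html-uri and the template name main are made up for illustration:

```xml
<xsl:stylesheet version="2.0"
		 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
		 xmlns:d="data:,dpc"
		 exclude-result-prefixes="d">

<xsl:import href="http://www.dcarlisle.demon.co.uk/htmlparse.xsl"/>

<!-- hypothetical parameter: location of the raw (non-XML) HTML file,
     resolved relative to this stylesheet -->
<xsl:param name="html-uri" select="'bad.html'"/>

<!-- hypothetical named entry point, so no source XML document is needed -->
<xsl:template name="main">
<!-- unparsed-text() returns the file content as a string,
     which is what d:htmlparse() is given in bad.xsl too -->
<xsl:sequence select="d:htmlparse(unparsed-text($html-uri))"/>
</xsl:template>

</xsl:stylesheet>
```

With Saxon 9 this could be started from the named template with something like -it:main in place of a source document; the exact command depends on how your saxon9 wrapper passes options.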




David
