Hi All,
I would like to extract only the employee data from the following mixed-content XML document, which was converted from XHTML:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html (View Source for full doctype...)>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
  <head>...</head>
  <body .......>
    <div id="container">
      <div id="header">...</div>
      <div id="postmark" />
      <div id="content">
        <div id="as1">...</div>
        .......
        <h1>Employee Detail</h1>
        <p>
          <strong>Firstname:</strong>
          <a >John</a>
        </p>
        <p>
          <strong>Surname:</strong>
          <a >Smith</a>
        </p>
        <p>
          <strong>Date of birth:</strong>
          12/01/1932
        </p>
        <p>
          <strong>Sex:</strong>
          M
        </p>
        <p>
          <strong>Interest:</strong>
          <br />
          Tennis
          <br />
          Movie
          <br />
          .....
I am using the following XSLT stylesheet:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="ns">
  <xsl:template match="/">
    <area>
      <xsl:if test="/ns:html/ns:body/ns:div[@id='content']/ns:p/ns:strong">
        <xsl:choose>
          <xsl:when test="contains(.,'Firstname:')">
            <firstname><xsl:value-of select="."/></firstname>
          </xsl:when>
          <xsl:otherwise>
            <firstname>Unknown</firstname>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:if>
    </area>
  </xsl:template>
</xsl:stylesheet>
The XSLT returned all the correct employee data, but it also included a lot of irrelevant data from the parent nodes. In other words, it returned the contents of ns:html, ns:body, ns:div[@id='content'], ns:p and ns:strong, which I don't want. Is there a way to exclude all this ancestor data and pick up only the data in /ns:html/ns:body/ns:div[@id='content']/ns:p/ns:strong cleanly?
Many thanks,
Jack
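
One possible direction, offered as a minimal sketch rather than a definitive fix: inside the match="/" template the context node "." is the document root, so contains(., ...) and <xsl:value-of select="."/> both operate on the text of the entire document, which would explain the extra ancestor content. The sketch below instead selects the specific ns:p element first and then reads only its child nodes. It assumes that the ns:p elements are direct children of div[@id='content'] somewhere under ns:body (hence the "//"), that the first name sits in an ns:a element, and that the date is a bare text node after ns:strong, as in the sample. The dateofbirth element name is made up for illustration.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="ns">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Assumption: the content div may be nested (e.g. inside "container"),
         so it is located with // rather than a fixed path from ns:body. -->
    <xsl:variable name="content" select="//ns:div[@id='content']"/>

    <!-- Pick the p elements by the label held in their strong child. -->
    <xsl:variable name="first"
        select="$content/ns:p[contains(ns:strong, 'Firstname:')]"/>
    <xsl:variable name="dob"
        select="$content/ns:p[contains(ns:strong, 'Date of birth:')]"/>

    <area>
      <firstname>
        <xsl:choose>
          <xsl:when test="$first">
            <!-- Read only the a child, not the whole document. -->
            <xsl:value-of select="normalize-space($first/ns:a)"/>
          </xsl:when>
          <xsl:otherwise>Unknown</xsl:otherwise>
        </xsl:choose>
      </firstname>

      <!-- "dateofbirth" is a hypothetical element name. -->
      <dateofbirth>
        <xsl:choose>
          <xsl:when test="$dob">
            <!-- The value is mixed content: take the first
                 non-whitespace text node directly under p. -->
            <xsl:value-of
                select="normalize-space($dob/text()[normalize-space()])"/>
          </xsl:when>
          <xsl:otherwise>Unknown</xsl:otherwise>
        </xsl:choose>
      </dateofbirth>
    </area>
  </xsl:template>

</xsl:stylesheet>
```

The remaining fields (Surname, Sex, Interest) would presumably follow the same pattern, each with its own label test on ns:strong.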