I sympathize with your customer, and with you as well.
You will note that XDM (http://www.w3.org/TR/2007/REC-xpath-datamodel-20070123/#Node), on which XSLT 2.0, XQuery and XPath 2.0 are based, has no CDATA node. XDM cannot preserve the DOCTYPE declaration either.
XDM has attitude: it represents a more simplified processing model for XML than the Infoset. I like simple, but is XDM simpler than it should be?
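To make the point concrete, the sketch below uses Python's xml.etree.ElementTree. It is not an XDM implementation, but like XDM and the XPath 1.0 data model it has no node type for CDATA sections. Two inputs that differ only in CDATA versus escaped markup produce identical trees, so the serializer cannot tell them apart and writes both the same way.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in how "<" and "&" are written.
with_cdata = "<script><![CDATA[if (a < b && c) {}]]></script>"
escaped = "<script>if (a &lt; b &amp;&amp; c) {}</script>"

a = ET.fromstring(with_cdata)
b = ET.fromstring(escaped)

# After parsing, both trees hold exactly the same character data;
# nothing in the tree records that one input used a CDATA section.
print(a.text == b.text)  # True
print(a.text)            # if (a < b && c) {}

# Re-serializing therefore yields the same escaped form for both inputs.
print(ET.tostring(a, encoding="unicode"))  # <script>if (a &lt; b &amp;&amp; c) {}</script>
print(ET.tostring(b, encoding="unicode"))  # <script>if (a &lt; b &amp;&amp; c) {}</script>
```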
Round-tripping of serialized forms is a current issue for XML and XDM. Tools such as SAX, XSLT, XPath/XQuery and XML databases all have to agree on the data model. Right now, in a pipeline, the lowest common denominator carries the day: for example, SAX can report DOCTYPE declarations, but XSLT cannot pass them through, so they get dropped. SAX's LexicalHandler likewise supports CDATA start and end events.
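A minimal Python sketch of the SAX side, assuming CPython's default expat-based xml.sax reader, which accepts a lexical handler through the property_lexical_handler property. The LexicalRecorder and TextRecorder classes are hypothetical and duck-typed; they simply log the DOCTYPE and CDATA boundary events that a tree-based data model would discard.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class LexicalRecorder:
    """Hypothetical lexical handler: SAX calls these methods for events
    that the XPath/XDM data models do not keep (comments, DOCTYPE, CDATA)."""

    def __init__(self):
        self.events = []

    def comment(self, content):
        self.events.append(("comment", content))

    def startDTD(self, name, public_id, system_id):
        self.events.append(("startDTD", name))

    def endDTD(self):
        self.events.append(("endDTD",))

    def startCDATA(self):
        self.events.append(("startCDATA",))

    def endCDATA(self):
        self.events.append(("endCDATA",))


class TextRecorder(ContentHandler):
    """Logs character data into the same event list, so CDATA
    boundaries show up interleaved with the text they enclose."""

    def __init__(self, lexical):
        super().__init__()
        self.lexical = lexical

    def characters(self, content):
        self.lexical.events.append(("text", content))


doc = b"<!DOCTYPE doc><doc>a <![CDATA[<b>]]> c</doc>"

lexical = LexicalRecorder()
parser = xml.sax.make_parser()
parser.setContentHandler(TextRecorder(lexical))
parser.setProperty(property_lexical_handler, lexical)
parser.parse(io.BytesIO(doc))

for event in lexical.events:
    print(event)
```

Text may arrive in several characters() chunks, so any comparison should join the text events rather than rely on how expat splits them.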
Jim
From: mike@xml-solutions.com [mailto:mike@xml-solutions.com] On Behalf Of michael odling-smee
Sent: Wednesday, September 09, 2009 4:34 AM
To: Bjoern Hoehrmann
Cc: XML Developers List
Subject: Re: [xml-dev] CDATA headache
Bjoern,
Thanks for the link to the XPath specification; I had not thought to look there!
Section 5.7 (http://www.w3.org/TR/xpath#section-Text-Nodes) should hopefully be enough to convince the customer.
Kind regards,
Michael
On Wed, Sep 9, 2009 at 12:12 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
* michael odling-smee wrote:
>Now I am well aware that these are entirely equivalent from an XML
>standpoint; however, the customer's point of view is: if they are equivalent,
>why has the parser altered the way the characters are escaped? On this front
>it is unlikely that links to sites such as
>http://www.dpawson.co.uk/xsl/sect2/cdata.html#d3164e447 will be enough to
>convince them that our processor is behaving correctly; they need a formal
>specification. I have searched the XML specification (to no avail) and was
>wondering whether anyone could point me to the relevant place which
>specifies that this is expected behaviour of the parser/XSLT processor.
As far as XSLT 1.0 output goes, this is covered by the XPath 1.0 data model defined at http://www.w3.org/TR/xpath#data-model, in which there are no CDATA sections; without proprietary extensions the difference between the two forms cannot be represented and thus cannot be retained. You can use the cdata-section-elements attribute of the xsl:output element to have the serializer add such sections to elements with certain names.
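The cdata-section-elements mechanism is a serializer decision, not something carried in the tree. The hypothetical serialize helper below imitates that idea in Python over an ElementTree tree; it assumes no namespaces and writes no XML declaration, and it is not how any particular XSLT processor implements xsl:output. The one subtlety it does handle: a "]]>" inside the text must be split across two CDATA sections.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def cdata_section(text):
    # "]]>" may not appear inside a CDATA section, so end the section
    # after "]]" and start a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def serialize(elem, cdata_elements):
    """Hypothetical serializer: text directly inside an element whose tag is
    in cdata_elements is written as CDATA; all other text is escaped."""

    def write_text(text):
        return cdata_section(text) if elem.tag in cdata_elements else escape(text)

    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in elem.attrib.items())
    parts = [f"<{elem.tag}{attrs}>"]
    if elem.text:
        parts.append(write_text(elem.text))
    for child in elem:
        parts.append(serialize(child, cdata_elements))
        if child.tail:
            # A child's tail is text inside elem, so elem's rule applies.
            parts.append(write_text(child.tail))
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


doc = ET.fromstring("<page><title>A &amp; B</title><script>if (a &lt; b) {}</script></page>")
print(serialize(doc, {"script"}))
# <page><title>A &amp; B</title><script><![CDATA[if (a < b) {}]]></script></page>
```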
This is the result of simplification: more work would be required to retain these insignificant differences, just as you do not usually treat the difference between example='value' and example="value", or the difference between example='&#246;', example='&#xF6;', and example='ö', as something
worth retaining. To retain the differences you could, for example, preprocess the documents, replacing CDATA sections with, e.g., elements, and then transform those elements back into CDATA sections later on.
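One way to sketch that preprocessing approach in Python, under stated assumptions: a SAX pass (again relying on the expat reader's lexical handler) wraps each CDATA section's text in a hypothetical <cdata> marker element, the marked document goes through whatever processing is needed (here an ElementTree parse and re-serialize stands in for the XSLT step), and a final pass turns the markers back into CDATA sections. The marker name, and the assumption that markers come back as <cdata>...</cdata> holding only escaped text, are illustrative choices; this simple version also drops comments and the DOCTYPE.

```python
import io
import re
import xml.sax
import xml.etree.ElementTree as ET
from xml.sax.handler import property_lexical_handler
from xml.sax.saxutils import XMLGenerator, unescape

MARKER = "cdata"  # hypothetical element name; must not clash with real markup


class CDataToElement(XMLGenerator):
    """Copies the document, wrapping each CDATA section's text in <cdata>."""

    def startCDATA(self):
        self.startElement(MARKER, {})

    def endCDATA(self):
        self.endElement(MARKER)

    # Remaining lexical-handler callbacks: comments and DOCTYPE are dropped.
    def comment(self, content):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def cdata_to_elements(data: bytes) -> str:
    out = io.StringIO()
    handler = CDataToElement(out, encoding="utf-8")
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setProperty(property_lexical_handler, handler)
    parser.parse(io.BytesIO(data))
    return out.getvalue()


# Assumes markers survive as <cdata>...</cdata> (no attributes, no
# self-closing form) with only &amp;, &lt; and &gt; escapes inside.
_MARKED = re.compile(rf"<{MARKER}>(.*?)</{MARKER}>", re.DOTALL)


def elements_to_cdata(text: str) -> str:
    def to_section(match):
        content = unescape(match.group(1))
        return "<![CDATA[" + content.replace("]]>", "]]]]><![CDATA[>") + "]]>"

    return _MARKED.sub(to_section, text)


original = b"<doc><code><![CDATA[if (a < b && c) {}]]></code> and x &lt; y</doc>"
marked = cdata_to_elements(original)
# Stand-in for the real transformation: any step that keeps <cdata> intact.
transformed = ET.tostring(ET.fromstring(marked), encoding="unicode")
restored = elements_to_cdata(transformed)
print(restored)
# <doc><code><![CDATA[if (a < b && c) {}]]></code> and x &lt; y</doc>
```

Only the sections that were CDATA in the input come back as CDATA; the ordinary escaped text ("x &lt; y") stays escaped.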
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/