RE: [xml-dev] CDATA headache

I sympathize with your customer and you as well.

You will note that XDM (http://www.w3.org/TR/2007/REC-xpath-datamodel-20070123/#Node), on which XSLT 2.0, XQuery and XPath 2.0 are based, has no CDATA node.

XDM cannot preserve the document type declaration (DOCTYPE) either.

XDM has attitude: it represents a more simplified processing model for XML than the Infoset does. I like simple, but is XDM simpler than it should be?
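The same simplification shows up in most tree APIs, not just XDM. As a minimal illustration (using Python's ElementTree as a stand-in, not as an XDM implementation), an escaped and a CDATA-wrapped version of the same content parse to identical text, so re-serializing can only produce one form:

```python
import xml.etree.ElementTree as ET

# The same character content, written once with escaping and once as CDATA.
escaped = "<code>if (a &lt; b) return;</code>"
cdata = "<code><![CDATA[if (a < b) return;]]></code>"

t1 = ET.fromstring(escaped)
t2 = ET.fromstring(cdata)

# Both parse to an element whose text is the same plain string;
# the tree keeps no record of which form the input used.
print(t1.text == t2.text)                    # True

# Re-serializing therefore yields the escaped form for both inputs.
print(ET.tostring(t2, encoding="unicode"))   # <code>if (a &lt; b) return;</code>
```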

Round-tripping of serialized forms is a current issue for XML and XDM. Tools like SAX, XSLT, XPath/XQuery and XML databases all have to agree on the data model. Right now, in a pipeline, it is the lowest common denominator that carries the day: for example, SAX can report DOCTYPE declarations, but XSLT cannot pass them through, so they get dropped. The SAX LexicalHandler also reports the start and end of CDATA sections (startCDATA and endCDATA), information that is likewise lost once the data reaches XSLT.
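A minimal sketch of what the lexical layer exposes, using Python's xml.sax (the same idea applies to Java's org.xml.sax.ext.LexicalHandler). The class CDataRecorder and the helper record_cdata are hypothetical names. characters() receives the same text for both forms; only the startCDATA()/endCDATA() callbacks reveal which form the document used:

```python
import io
import xml.sax
from xml.sax import handler


class CDataRecorder(handler.ContentHandler):
    """Content handler that is also registered as the SAX lexical handler."""

    def __init__(self):
        super().__init__()
        self._in_cdata = False
        self.cdata_sections = []  # text of each CDATA section, in order
        self.all_text = []        # every text chunk, CDATA or not

    def characters(self, content):
        self.all_text.append(content)
        if self._in_cdata:
            self.cdata_sections[-1] += content

    # --- lexical handler callbacks ---
    def startCDATA(self):
        self._in_cdata = True
        self.cdata_sections.append("")

    def endCDATA(self):
        self._in_cdata = False

    # Remaining lexical callbacks: no-ops here, but the expat-based
    # reader may call them, so they must exist.
    def comment(self, content): pass
    def startDTD(self, name, public_id, system_id): pass
    def endDTD(self): pass
    def startEntity(self, name): pass
    def endEntity(self, name): pass


def record_cdata(xml_bytes):
    recorder = CDataRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(recorder)
    parser.setProperty(handler.property_lexical_handler, recorder)
    parser.parse(io.BytesIO(xml_bytes))
    return recorder


rec = record_cdata(b"<r><a>&lt;b&gt;</a><c><![CDATA[<b>]]></c></r>")
print(rec.cdata_sections)       # ['<b>']
print("".join(rec.all_text))    # <b><b>
```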

Jim

From: mike@xml-solutions.com [mailto:mike@xml-solutions.com] On Behalf Of michael odling-smee
Sent: Wednesday, September 09, 2009 4:34 AM
To: Bjoern Hoehrmann
Cc: XML Developers List
Subject: Re: [xml-dev] CDATA headache

Bjoern,

Thanks for the link to the XPath specification - had not thought to look there! Section 5.7 (http://www.w3.org/TR/xpath#section-Text-Nodes) should hopefully be enough to convince the customer.

Kind regards,

Michael

On Wed, Sep 9, 2009 at 12:12 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

* michael odling-smee wrote:
>Now I am well aware that these are entirely equivalent from an XML
>standpoint - however the customer point of view is if they are equivalent
>why has the parser altered the way the characters are escaped? On this front
>it is unlikely that links to sites such as
>http://www.dpawson.co.uk/xsl/sect2/cdata.html#d3164e447 will be enough to
>convince them that our processor is behaving correctly - they need a formal
>specification. I have searched the XML specification (to no avail) and was
>wondering whether anyone could point me to the relevant place which
>specifies that this is expected behaviour of the parser/xslt processor.

As far as XSLT 1.0 output goes, this is covered by the XPath 1.0 data
model defined at http://www.w3.org/TR/xpath#data-model, in which there
are no CDATA sections; without proprietary extensions the difference
between the two forms cannot be represented and thus cannot be retained.
You can use the cdata-section-elements attribute of the xsl:output
element to have the serializer write the text content of elements with
certain names as CDATA sections.
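A rough sketch of what a serializer does when xsl:output lists an element in cdata-section-elements: text node children of the named elements are written as CDATA sections, and any ']]>' in the text is split across two sections. This is not how any particular XSLT processor implements it; serialize and cdata_wrap are hypothetical helpers working on an ElementTree tree, and namespaced tags are not handled:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def cdata_wrap(text):
    """Wrap text in a CDATA section. ']]>' cannot appear inside one,
    so it is split across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def serialize(elem, cdata_elements):
    """Serialize an ElementTree element (no namespaces assumed), writing
    the text children of elements named in cdata_elements as CDATA."""
    as_cdata = elem.tag in cdata_elements

    def text_out(text):
        if not text:
            return ""
        return cdata_wrap(text) if as_cdata else escape(text)

    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    body = text_out(elem.text)
    for child in elem:
        body += serialize(child, cdata_elements)
        body += text_out(child.tail)  # a child's tail is text in *this* element
    return f"<{elem.tag}{attrs}>{body}</{elem.tag}>"


root = ET.fromstring("<r><a>&lt;b&gt;</a><c>&lt;b&gt;</c></r>")
print(serialize(root, {"c"}))
# <r><a>&lt;b&gt;</a><c><![CDATA[<b>]]></c></r>
```

The decision is made per element name, not per original CDATA section, so text that was escaped in the input comes out as CDATA if its parent element is listed.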

This is the result of simplification: more work is required to retain
the insignificant differences, just as you do not usually treat the
difference between example='value' and example="value", or the difference
between example='ö', example='&#246;', and example='&#xF6;', as something
worth retaining. To retain the differences you could, for example,
pre-process the documents, replacing CDATA sections with elements, and
then transform those elements back into CDATA sections later on.
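A minimal sketch of that pre-/post-processing round trip using Python's xml.sax. The marker element name cdata-section and all function names are hypothetical; pick a name (or namespace) your vocabulary does not already use. Comments and the DOCTYPE are not carried through in this sketch, and whatever tree-based step runs in between (for example an XSLT transform) must leave the marker elements intact:

```python
import io
import xml.sax
from xml.sax import handler
from xml.sax.saxutils import XMLGenerator

MARKER = "cdata-section"  # hypothetical marker element name


class CDataToMarker(XMLGenerator):
    """Pre-processing: copy the document, turning each CDATA section
    into a <cdata-section> element. Also used as the lexical handler."""

    def startCDATA(self):
        self.startElement(MARKER, {})

    def endCDATA(self):
        self.endElement(MARKER)

    # Other lexical callbacks the expat-based reader may invoke; ignored,
    # so comments and the DOCTYPE are dropped.
    def comment(self, content): pass
    def startDTD(self, name, public_id, system_id): pass
    def endDTD(self): pass
    def startEntity(self, name): pass
    def endEntity(self, name): pass


class MarkerToCData(XMLGenerator):
    """Post-processing: write each <cdata-section> element back out as a
    CDATA section, splitting any ']]>' across two sections."""

    def __init__(self, out, encoding="utf-8"):
        super().__init__(out, encoding)
        self._out = out
        self._buf = None  # text chunks collected while inside a marker

    def startElement(self, name, attrs):
        if name == MARKER:
            self._buf = []
        else:
            super().startElement(name, attrs)

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)  # expat may deliver text in pieces
        else:
            super().characters(content)

    def endElement(self, name):
        if name == MARKER:
            text = "".join(self._buf).replace("]]>", "]]]]><![CDATA[>")
            self._out.write("<![CDATA[" + text + "]]>")
            self._buf = None
        else:
            super().endElement(name)


def _transform(xml_text, generator, lexical):
    parser = xml.sax.make_parser()
    parser.setContentHandler(generator)
    if lexical:
        parser.setProperty(handler.property_lexical_handler, generator)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))


def cdata_to_markers(xml_text):
    out = io.StringIO()
    _transform(xml_text, CDataToMarker(out, "utf-8"), lexical=True)
    return out.getvalue()


def markers_to_cdata(xml_text):
    out = io.StringIO()
    _transform(xml_text, MarkerToCData(out), lexical=False)
    return out.getvalue()


original = "<r><c><![CDATA[<b> & ]]></c></r>"
marked = cdata_to_markers(original)
# ... an XSLT or other tree-based step would process `marked` here ...
restored = markers_to_cdata(marked)
print(restored)  # XML declaration, then <r><c><![CDATA[<b> & ]]></c></r>
```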
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/