Re: [xml-dev] XML CDATA sections ... the good, the bad, and the ugly


From: David Carlisle <d.p.carlisle@gmail.com>
To: "Costello, Roger L." <costello@mitre.org>
Date: Fri, 26 Jun 2015 14:01:08 +0100

On 26 June 2015 at 13:52, Costello, Roger L. <costello@mitre.org> wrote:
> Hi Folks,
>
>
>
> The CDATA section is a mechanism for disabling the normal interpretation of
> XML syntax. For example, the title element in this XML document contains a
> script element:
>
>
>
> <title>
>     <script>...</script>
> </title>
>
>
>
> The title element in this XML document contains a string:
>
>
>
> <title>
>     <![CDATA[<script>...</script>]]>
> </title>
>
>
>
> Ordinarily <script> would be interpreted as a start tag but since <script>
> is embedded within a CDATA section the normal interpretation is disabled and
> <script> is treated simply as a string.
>
>
>
> There is nothing that XSLT programs can do about CDATA sections. That is
> because the XML parser removes the CDATA wrapper and creates a text node for
> its content. The CDATA wrapper is gone by the time the XSLT program gets the
> XML; all that remains is a text node.
>
>
>
> Nonetheless, some XSLT processors can be configured to instruct their XML
> parser to remove the CDATA sections. The approach for doing this varies with
> the XSLT processor. This article describes the approach used by one XSLT
> processor: http://sourceforge.net/p/saxon/mailman/message/34240016/.
>
>
>
> Any errors in this description? Anything you would add or delete?
>
>
>
> /Roger
>
>



No real errors, but I'd add a strong warning: if you need to do this,
then you probably should not be using an XML syntax.

Clearly it is possible to infer different meaning between

<x><![CDATA[zzz]]></x>

and

<x>zzz</x>

Just as it is between

<x    >zzz</x>

and

<x>zzz</x>

However, XML processors are supposed to treat these as the same input.
If they start to treat them differently, producers of XML lose
confidence in the format: choices that are supposed to be purely
cosmetic in the output may have unknown effects on future, unknown
readers of the document.
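
As a rough illustration of "the same input", here is a minimal sketch
using Python's standard xml.etree.ElementTree module (my choice of
tool for the example, not something from Roger's description). It
parses the three spellings above and compares what the application
actually sees. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same content: a CDATA section, plain text,
# and plain text with extra whitespace inside the start tag.
with_cdata = ET.fromstring("<x><![CDATA[zzz]]></x>")
plain = ET.fromstring("<x>zzz</x>")
spaced = ET.fromstring("<x    >zzz</x>")

# The parser hands back the same text content in every case; the
# CDATA wrapper and the extra whitespace are not part of the tree.
texts = [with_cdata.text, plain.text, spaced.text]

# Serializing the trees again gives identical output, so the original
# spelling cannot be recovered from the parsed document.
serialized = [
    ET.tostring(elem, encoding="unicode")
    for elem in (with_cdata, plain, spaced)
]

print(texts)
print(serialized)
```

With this parser, at least, nothing downstream can tell the three
documents apart, which is the behaviour producers rely on.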

David

