Re: [xml-dev] Tradeoffs of XML encoding by enclosing all content in CDATA blocks

From: Manos Batsis <manos_lists@geekologue.com>
To: "Karr, David" <david.karr@wamu.net>, XML Developers List <xml-dev@lists.xml.org>
Date: Mon, 29 Sep 2008 18:25:17 +0300

Karr, David wrote:
> I pointed out to a client that they're seeing failures parsing XML 
> because some of the element content that they're producing contains 
> characters illegal in XML content, like "&" (unencoded).  They 
> acknowledged that should be fixed, but they also said they could instead 
> enclose all content with CDATA blocks.  That seems bizarre to me, but 
> I'm not sure I can immediately come up with all the cogent arguments 
> against that.  Can someone summarize specifically why you should NOT do 
> that?

One thing that comes to mind is that switching to CDATA may break some
existing code.

For example, although XPath exposes only one text node as the content of 
the following element:

<foo> text 1 <![CDATA[ text 2]]> text 3</foo>

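AFAIK, DOM will expose the children of <foo> as at least three nodes:
a text node (" text 1 "), a CDATA section (" text 2") and another text
node (" text 3"). Code that relies on text normalization to merge
adjacent text will not merge the CDATA section either (not sure whether
this differs in DOM Level 2 and later).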
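To make the difference concrete, here is a minimal sketch, assuming Python 3 and only its standard library (xml.dom.minidom for the DOM view, xml.etree.ElementTree for comparison). child_summary and text_content are hypothetical helpers written just for this illustration. It parses the <foo> example above, lists the DOM children, calls minidom's normalize(), and joins text and CDATA for this flat example. Other DOM implementations and parser settings may behave differently, so treat it as one data point rather than a statement about every DOM.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom

DOC = "<foo> text 1 <![CDATA[ text 2]]> text 3</foo>"

def child_summary(element):
    """Hypothetical helper: (nodeName, data) for each direct child of a DOM element."""
    return [(child.nodeName, child.data) for child in element.childNodes]

def text_content(element):
    """Hypothetical helper: join Text and CDATASection children (flat content only)."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )

foo = minidom.parseString(DOC).documentElement
print(child_summary(foo))
# [('#text', ' text 1 '), ('#cdata-section', ' text 2'), ('#text', ' text 3')]

foo.normalize()                   # merges adjacent Text nodes only; CDATA stays separate
print(len(foo.childNodes))        # 3
print(repr(foo.firstChild.data))  # ' text 1 '  (naive code sees only part of the text)
print(repr(text_content(foo)))    # ' text 1  text 2 text 3'

# ElementTree does not keep CDATA sections, so it reports a single string.
print(repr(ET.fromstring(DOC).text))  # ' text 1  text 2 text 3'
```

Code that reads only firstChild, or assumes one text child per element, silently loses " text 2" and " text 3" once CDATA sections appear.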

It should be easier and safer to just add some code that escapes the
three problem characters (&, < and >) to their entity references
(&amp;, &lt;, &gt;).
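A minimal sketch of that escaping, assuming Python 3 and only its standard library. escape_text is a hypothetical helper name; the standard xml.sax.saxutils.escape performs the same three replacements by default, so in practice you could just call that. The one subtlety is ordering: "&" has to be replaced first, otherwise the "&" inside "&lt;" and "&gt;" would be escaped a second time.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

def escape_text(value: str) -> str:
    """Hypothetical helper: replace &, < and > with entity references for element content.

    '&' is handled first so the '&' in '&lt;' / '&gt;' is not escaped twice.
    """
    return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

raw = "Smith & Sons <est. 1900>"
fragment = "<name>" + escape_text(raw) + "</name>"
print(fragment)  # <name>Smith &amp; Sons &lt;est. 1900&gt;</name>

# Same result as the standard library helper.
print(escape_text(raw) == escape(raw))  # True

# Round trip: a parser gives back the original string as a single text node.
print(minidom.parseString(fragment).documentElement.firstChild.data == raw)  # True
```

Unlike the CDATA approach, the parsed document keeps one ordinary text node per element, so existing DOM code is unaffected.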

hth,

Manos

References:
- Tradeoffs of XML encoding by enclosing all content in CDATA blocks
  - From: "Karr, David" <david.karr@wamu.net>
