xml-dev - Re: CDATA by any other name... (was The raw and the cooked)

From: John Cowan <cowan@locke.ccil.org>
To: XML Dev <xml-dev@ic.ac.uk>
Date: Tue, 03 Nov 1998 11:39:18 -0500

Rick Jelliffe wrote:

> A CDATA marked section is not only a way to prevent delimiter recognition.
> It is also a way to declare that the characters in that section are limited
> to ones available in the direct document encoding of the originating system.

True.  However, since the standard encodings of XML include all the
characters there are (and if they don't include yours, just you
wait, 'Enry 'Iggins), that isn't as much of an issue.

> (SGML has a CDATA keyword you can use instead of content models: XML was
> felt not to need it because you could use <![CDATA[, however that perhaps
> shows the mind of the XML WG at that time, in that they were down-playing
> the need for schemas.)

CDATA elements are eeeeeevil.  They terminate at any ETAGO followed by
a name-start character, and they make it impossible to change your
mind later, if you decide you need an entity or two.  See the excellent
articles at.  They were rightly discarded from XML.
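
To make the ETAGO rule concrete, here is a rough Python sketch (my own illustration, not taken from any SGML parser) of where the content of an element with declared content CDATA would end. It assumes the reference concrete syntax, where the name-start characters are just the ASCII letters; the function name cdata_element_content is made up for the purpose.

```python
import re

# Assumption: reference concrete syntax, so name-start characters are
# only the ASCII letters A-Z and a-z.
ETAGO_IN_CONTEXT = re.compile(r"</[A-Za-z]")


def cdata_element_content(text: str) -> str:
    """Return the part of `text` an SGML parser would take as the content
    of a CDATA-declared element: everything before the first ETAGO ("</")
    that is immediately followed by a name-start character."""
    m = ETAGO_IN_CONTEXT.search(text)
    return text if m is None else text[: m.start()]


# The intended end tag is </code>, but the content is cut short at "</b".
print(cdata_element_content('print("</b>")</code>'))      # print("
# "</" followed by a space is not recognized, so it is harmless.
print(cdata_element_content("1 </ 2 is harmless</code>"))  # 1 </ 2 is harmless
```

The first call shows the trap: a string literal that merely mentions an end tag silently ends the element.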

> For example, I cannot see why a smart editor could not use the CDATA section
> to confine editing to whatever the repertoire of the character set of the
> encoding attribute of the XML header says.

IMHO, a *smart* editor would realize that a CDATA section cannot cope,
and would terminate it around the problem character.  For example,
an attempt to insert a dagger (U+2020) into a CDATA section within
an 8859-1 document would produce this:

	... ]]>&#x2020;<![CDATA[ ...

This specific case comes up when people using code page 1252 try to exploit
the non-8859-1 characters it supplies.
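
The splitting described above can be sketched in Python roughly as follows. The function name cdata_for_encoding and the use of hex character references are illustrative assumptions, and the sketch assumes the text contains no literal "]]>", which would need a split of its own.

```python
def cdata_for_encoding(text: str, encoding: str = "iso-8859-1") -> str:
    """Wrap `text` in CDATA sections, closing the section around any
    character the target encoding cannot represent and emitting a
    character reference for it instead."""
    if "]]>" in text:
        raise ValueError('text contains "]]>", which cannot appear in a CDATA section')
    out = []
    run = []  # encodable characters waiting to go into a CDATA section
    for ch in text:
        try:
            ch.encode(encoding)
            run.append(ch)
        except UnicodeEncodeError:
            # Close the current section (if any) and emit a hex reference.
            if run:
                out.append("<![CDATA[" + "".join(run) + "]]>")
                run = []
            out.append("&#x%X;" % ord(ch))
    if run:
        out.append("<![CDATA[" + "".join(run) + "]]>")
    return "".join(out)


print(cdata_for_encoding("x < y \u2020 z"))
# <![CDATA[x < y ]]>&#x2020;<![CDATA[ z]]>
```

Passing "cp1252" as the encoding yields a single section, since code page 1252 does have the dagger; that is exactly the mismatch that bites people who label such documents as 8859-1.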

> In the case of editing the XML
> specification, for example, when there is a CDATA marked section being
> edited, and the editor types "<", a smart editor should know not to replace
> it with "&lt;" or expect it to be a STAGO.

XED indeed has this property, although it just feeps if you attempt
to type a character that would cause "]]>" to appear in a CDATA
section, rather than splitting the section (which admittedly would
be painful to undo after a Backspace character).
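
A keystroke check along these lines might look like the following Python sketch. It is hypothetical (it is not XED's code), and the function name type_char is made up: inside a CDATA section "<" goes in literally, and any keystroke that would create "]]>" is refused rather than splitting the section. Outside CDATA it escapes the character with the standard library's xml.sax.saxutils.escape.

```python
from typing import Optional
from xml.sax.saxutils import escape


def type_char(text: str, pos: int, ch: str, in_cdata: bool) -> Optional[str]:
    """Return the new raw text after typing `ch` at offset `pos`,
    or None to signal a beep (keystroke refused)."""
    if not in_cdata:
        # Ordinary character data: escape() turns &, < and > into
        # &amp;, &lt; and &gt;.
        return text[:pos] + escape(ch) + text[pos:]
    # Inside a CDATA section nothing is escaped: "<" stays "<".
    new_text = text[:pos] + ch + text[pos:]
    if "]]>" in new_text:
        # The section would end early; refuse instead of splitting it.
        return None
    return new_text


print(type_char("if a ", 5, "<", in_cdata=True))   # if a <
print(type_char("if a ", 5, "<", in_cdata=False))  # if a &lt;
print(type_char("x]]", 3, ">", in_cdata=True))     # None
```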

> It would be nice
> if W3C allowed this, but the less that a PI can be treated (by XLL or DOM or
> SAX or whatever) as a kind of element,

The current XPointer draft allows PIs to be referred to on equal terms with
elements (except for not having a GI or attributes or sub-elements).

The DOM has a ProcessingInstruction node, though pseudo-attribute parsing
is not performed.
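
For what it's worth, Python's xml.dom.minidom exposes the same ProcessingInstruction interface: a target plus an unparsed data string. Anyone who wants pseudo-attributes has to parse the data themselves; the regex-based pseudo_attributes helper below is a hypothetical, simplified sketch that ignores character and entity references inside values.

```python
import re
from xml.dom import Node, minidom

# Hypothetical pseudo-attribute syntax: name = "value" or name = 'value'.
PSEUDO_ATTR = re.compile(r"""([A-Za-z_:][\w.:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def pseudo_attributes(data: str) -> dict:
    """Split a PI's data string into name/value pairs."""
    return {
        m.group(1): m.group(2) if m.group(2) is not None else m.group(3)
        for m in PSEUDO_ATTR.finditer(data)
    }


doc = minidom.parseString('<?xml-stylesheet href="a.css" type="text/css"?><doc/>')
pi = doc.firstChild
assert pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE
print(pi.target)                   # xml-stylesheet
print(pi.data)                     # href="a.css" type="text/css"
print(pseudo_attributes(pi.data))  # {'href': 'a.css', 'type': 'text/css'}
```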

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/

Follow-Ups:
- RE: CDATA by any other name... (was The raw and the cooked)
  - From: "Rick Jelliffe" <ricko@allette.com.au>
- Re: CDATA by any other name... (was The raw and the cooked)
  - From: John Cowan <cowan@locke.ccil.org>
- Re: CDATA by any other name... (was The raw and the cooked)
  - From: "Lauren Wood" <lauren@sqwest.bc.ca>

References:
- RE: CDATA by any other name... (was The raw and the cooked)
  - From: "Rick Jelliffe" <ricko@allette.com.au>