
Re: CDATA sections in W3C XML Infoset

From: Bob Kline <bkline@rksystems.com>
To: XML-Dev Mailing List <xml-dev@lists.xml.org>
Date: Fri, 30 Mar 2001 10:54:59 -0500 (EST)

John Cowan writes:

> Richard Lanyon scripsit:
> 
> > Could someone explain to me why CDATA section start/end markers were
> > taken out of the W3C Infoset?
> 
> Essentially because we (but I am NOT speaking officially for the
> Core WG here) don't think they carry any real information.  There is
> no difference between CDATA sections and careful use of &lt; and
> &amp;.

No?  We have quite a bit of code in our XML repository which uses XML
commands over sockets for its client-server interface to the rest of the
world.  Most of the commands embed an XML document being stored in or
retrieved from the repository.  The embedded documents are wrapped in
CDATA sections.  The logic for extracting a document from an incoming
client command is essentially:

   Find the element containing the CDATA section.
   Find the CDATA child of the element.
   Hand the value of the CDATA section to the parser.
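
As a rough illustration of those three steps, here is a minimal sketch
using Python's xml.dom.minidom, which keeps CDATA sections as distinct
nodes.  The element names (Command, Document, Memo) and the function
name are made up for the example and are not the actual command
vocabulary of our repository.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def extract_embedded(command_xml, wrapper_tag):
    """Return the text of the first CDATA child of <wrapper_tag>, or None."""
    dom = minidom.parseString(command_xml)
    # Find the element containing the CDATA section.
    wrappers = dom.getElementsByTagName(wrapper_tag)
    if not wrappers:
        return None
    # Find the CDATA child of the element.
    for child in wrappers[0].childNodes:
        if child.nodeType == Node.CDATA_SECTION_NODE:
            # This string is what would be handed to the parser.
            return child.data
    return None


# Hypothetical command as the client sent it.
original = "<Command><Document><![CDATA[<Memo>draft</Memo>]]></Document></Command>"
# The same command after some process replaced the CDATA section
# with an escaped text node.
rewritten = "<Command><Document>&lt;Memo&gt;draft&lt;/Memo&gt;</Document></Command>"

found = extract_embedded(original, "Document")
missing = extract_embedded(rewritten, "Document")
```

With the first command the function finds a CDATA child; with the
second there is no CDATA_SECTION_NODE under the wrapper element, so
the lookup comes back empty.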

That doesn't work if some process in the pipeline replaces the CDATA
section with an escaped text node.  You may be silently adding some
mental qualifications to your phrase "no difference", but from where I
stand, if it breaks software that we've written to a W3C interface (the
DOM), there's a difference, any amount of sophistry notwithstanding.  I
see that there are tools out there already (e.g., Lars Marius Garshol's
roundtrip.py [1]) which make the same assumption you are making.  This
is lovely news.

Before you even think about suggesting how easy it would be to restore
the angle brackets in the embedded document, let me point out that any
&lt; and &gt; which are not delimiters for the element tags in the
embedded document must not be "restored" to < and >, and I submit that
in some cases it is impossible to tell which ones those were.
Therefore information has been lost.

Before you suggest that the embedded document should not have been
wrapped in a CDATA section in the first place, let me say that:

  1. not doing so would make it impossible to validate the commands
     and their responses against their DTDs; and
  2. we have a requirement to be able to store a document in progress
     even in the case in which it is not well-formed (some of the
     documents in the repository will be imported from outside sources
     and we must capture the original documents whatever their state).

[1] http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/python/dist/src/Demo/xml/roundtrip.py?cvsroot=python

-- 
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com
