
Re: CDATA sections in W3C XML Infoset

From: Bob Kline <bkline@rksystems.com>
To: Tim Bray <tbray@textuality.com>
Date: Fri, 30 Mar 2001 19:43:40 -0500 (EST)

On Fri, 30 Mar 2001, Tim Bray wrote:

> At 10:54 AM 30/03/01 -0500, Bob Kline wrote:
> >
> >   Find the element containing the CDATA section.
> >   Find the CDATA child of the element.
> >   Hand the value of the CDATA section to the parser.
> 
> This seems really questionable.  Using CDATA sections to embed
> other XML that's known not to contain any seems just fine.
> On the other hand, relying on that CDATA section to tell you
> where it is smells bad.  Wouldn't it have been immensely
> better to do
> 
> <mydoc>...
>   <included-doc><![CDATA[ <something-else /> .. 
>   ]]></included-doc> ...
> </mydoc>
> 
> Then the CDATA does what it's supposed to, simplify
> escaping, and the tags do what they're supposed to,
> provide semantic markup saying what pieces of text
> really are.  -Tim

You're absolutely right, and that's exactly what we're doing.  Our
command sets look like this:

<CdrCommandSet>
 <SessionId>...</SessionId>
 <CdrCommand>
  <CdrAddDoc>
   <CdrDoc Type='Protocol'>
    <CdrDocCtl>... [some control elements] ...</CdrDocCtl>
    <CdrDocXml><![CDATA[
     <Protocol> ... [rest of the embedded document] ...
     </Protocol>
    ]]></CdrDocXml>
   </CdrDoc>
  </CdrAddDoc>
 </CdrCommand>
 ... [more commands] ...
</CdrCommandSet>

So the step described as "Find the element containing the CDATA section"
was referring to the location of a specific element (the CdrDocXml
element at a known position in the hierarchy of a CdrCommand element),
not a blind stumbling around looking for any CDATA section we happened
to run into.  I can see why you might have come to a different
conclusion, though, given the less than precise wording.  If I hadn't
been trying to keep the description of each step down to a single line I
might have said "Find the specific element which we know will contain a
single child node for the CDATA section for the embedded document."  
Sorry for any confusion my sloppy wording may have caused.
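
The three steps can be sketched roughly as follows.  This is only an
illustration in Python using the standard library's xml.dom.minidom
(which reports CDATA sections as separate nodes), not the actual client
code.  The helper names, the SessionId value, and the contents of
CdrDocCtl and Protocol are placeholders invented for the example.

```python
from xml.dom import Node, minidom

# Sample command set; all element contents are invented placeholders.
COMMAND_SET = """<CdrCommandSet>
 <SessionId>placeholder-session</SessionId>
 <CdrCommand>
  <CdrAddDoc>
   <CdrDoc Type='Protocol'>
    <CdrDocCtl><DocTitle>Sample</DocTitle></CdrDocCtl>
    <CdrDocXml><![CDATA[
     <Protocol><Title>Sample &amp; test</Title></Protocol>
    ]]></CdrDocXml>
   </CdrDoc>
  </CdrAddDoc>
 </CdrCommand>
</CdrCommandSet>"""


def child_element(parent, name):
    """Return the first direct child element of `parent` named `name`."""
    for node in parent.childNodes:
        if node.nodeType == Node.ELEMENT_NODE and node.tagName == name:
            return node
    raise ValueError("no <%s> child in <%s>" % (name, parent.tagName))


def embedded_documents(command_set_xml):
    """Parse each document embedded in a CdrAddDoc command."""
    outer = minidom.parseString(command_set_xml)
    docs = []
    for command in outer.documentElement.childNodes:
        if command.nodeType != Node.ELEMENT_NODE:
            continue
        if command.tagName != "CdrCommand":
            continue
        # Step 1: walk to the one element known to hold the CDATA section.
        add_doc = child_element(command, "CdrAddDoc")
        cdr_doc = child_element(add_doc, "CdrDoc")
        holder = child_element(cdr_doc, "CdrDocXml")
        # Step 2: find the CDATA child of that element.
        sections = [n for n in holder.childNodes
                    if n.nodeType == Node.CDATA_SECTION_NODE]
        if len(sections) != 1:
            raise ValueError("expected exactly one CDATA section")
        # Step 3: hand the value of the CDATA section to the parser.
        docs.append(minidom.parseString(sections[0].data.strip()))
    return docs
```

The lookup is driven entirely by element names at fixed positions in the
hierarchy; the CDATA node type is consulted only once the CdrDocXml
element has been located.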

The approach we're using gives us a number of benefits:

  * We can validate the command set against a DTD.
  * A human can easily examine a document embedded in a command set
    when troubleshooting is required (much harder to do with lots
    of escaping going on).
  * We avoid the additional overhead and complexity of extra
    re-encoding and decoding steps.
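
To make the second and third points concrete, here is a small sketch
(Python, standard library only) of the same embedded fragment carried
both ways.  The fragment and the variable names are invented for the
example.

```python
from xml.sax.saxutils import escape

# Hypothetical embedded document, already well-formed XML on its own.
fragment = "<Protocol><Title>A &amp; B</Title></Protocol>"

# Wrapped in a CDATA section: the fragment travels unchanged.
as_cdata = "<CdrDocXml><![CDATA[" + fragment + "]]></CdrDocXml>"

# Escaped instead: every markup character is re-encoded, and the
# receiver must decode it again before parsing the embedded document.
as_escaped = "<CdrDocXml>" + escape(fragment) + "</CdrDocXml>"

print(as_cdata)
print(as_escaped)
```

The escaped form is what a human would have to read when
troubleshooting.  The CDATA form does assume the embedded document
never contains the sequence "]]>", which would end the section early.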

It's clear from this thread that not everyone gives these advantages the
same weight that we do.  I guess we'll just have to warn writers of any
client software for this system to steer clear of any packages which, in
the name of "Infoset compliance" (to use the phrase of one of the
contributors to the thread), violate the assumptions we began with about
the ability to get back CDATA sections where we expect them.  And we'll
cross our fingers that the DOM doesn't get changed to match what's
happened to the Infoset.

-- 
Bob Kline
mailto:bkline@rksystems.com
http://www.rksystems.com

Follow-Ups:
- Re: CDATA sections in W3C XML Infoset
  - From: Charles Reitzel <creitzel@mediaone.net>

References:
- Re: CDATA sections in W3C XML Infoset
  - From: Tim Bray <tbray@textuality.com>
