Re: CDATA sections in W3C XML Infoset
- From: John Cowan <jcowan@reutershealth.com>
- To: Bob Kline <bkline@rksystems.com>, "xml-dev@xml.org" <xml-dev@xml.org>
- Date: Fri, 30 Mar 2001 11:37:10 -0500
Bob Kline wrote:
> No? We have quite a bit of code in our XML repository which uses XML
> commands over sockets for its client-server interface to the rest of the
> world. Most of the commands embed an XML document being stored in or
> retrieved from the repository. The embedded documents are wrapped in
> CDATA sections.
And when the embedded document already contains a CDATA section? Bzzzzt,
not well-formed.
> The logic for extracting a document from an incoming
> client command is essentially:
>
> Find the element containing the CDATA section.
> Find the CDATA child of the element.
> Hand the value of the CDATA section to the parser.
I admit this is an easy DOM-based hack. But it shouldn't be
*that* much harder to know what element you are looking for,
pull out a Text child (initially there should be only one,
or you can normalize), and do the conversion below.
> Before you even think about suggesting how easy it would be to restore
> the angle brackets in the embedded document, let me point out that the
> &lt; and &gt; which are not delimiters for the element tags in the
> embedded document cannot be "restored" to < and >, and I submit that it
> is impossible in some cases to distinguish which those were. Therefore
> information has been lost.
Not so if you encode properly. By changing every "&" in the embedded
document to "&amp;" and every "<" to "&lt;" (conceptually in that order),
you get this result:
Original     Embedding
<            &lt;
&            &amp;
&lt;         &amp;lt;
&amp;        &amp;amp;
&amp;lt;     &amp;amp;lt;
Etc. etc. No information is lost: change every "&lt;" to "<" and
every "&amp;" to "&" (conceptually in that order) and the exact
original is restored. In this encoding, ">" characters need not
be changed.
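
As a sketch of the mapping in Python (the function names embed and
restore are mine, purely for illustration), the two directions are just
a pair of string replacements applied in the stated order:

```python
def embed(text):
    """Escape text for use as element content: "&" first, then "<"."""
    return text.replace("&", "&amp;").replace("<", "&lt;")


def restore(text):
    """Undo embed(): "&lt;" first, then "&amp;"."""
    return text.replace("&lt;", "<").replace("&amp;", "&")


# Round trip over a few samples, including already-escaped input.
samples = ["<", "&", "&lt;", "&amp;", "&amp;lt;", "<a>1 &lt; 2 & 3 > 2</a>"]
round_trip_ok = all(restore(embed(s)) == s for s in samples)
```

The order matters in both directions: escaping "<" before "&" would
re-escape the ampersand just introduced by "&lt;", and restoring "&amp;"
before "&lt;" would turn an embedded "&amp;lt;" into "<" instead of
"&lt;".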
> Before you suggest that the embedded document should not have been
> wrapped in a CDATA section in the first place, let me say that:
[points snipped]
These points basically say that your embedded documents are text,
not necessarily XML. The safe way to encode text in an XML document
is to use the mapping above.
--
There is / one art || John Cowan <jcowan@reutershealth.com>
no more / no less || http://www.reutershealth.com
to do / all things || http://www.ccil.org/~cowan
with art- / lessness \\ -- Piet Hein