OASIS Mailing List Archives

Re: Using CDATA sections for element content??

First of all, you _can_ have ']]' in a CDATA section as long
as the next character isn't '>'.  See production 21 in
"Extensible Markup Language (XML) 1.0 (Second Edition)":

    [21]    CDEnd   ::=  ']]>'

    well-formed:      <![CDATA[ [2*[6+1]]=28 ]]>
    not well-formed:  <![CDATA[ [2*[6+1]]>27 ]]>
    well-formed:      <![CDATA[ [2*[6+1]] > 27 ]]>
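
If you want to check cases like these against a real parser,
here is a minimal sketch in Python.  It assumes the standard
library's xml.etree.ElementTree module; the <foo> wrapper
element and the helper name is_well_formed are mine, for
illustration only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The three CDATA sections from above, each wrapped in a <foo> element
# (the wrapper is an assumption; a CDATA section needs a parent element).
samples = [
    "<foo><![CDATA[ [2*[6+1]]=28 ]]></foo>",
    "<foo><![CDATA[ [2*[6+1]]>27 ]]></foo>",
    "<foo><![CDATA[ [2*[6+1]] > 27 ]]></foo>",
]

for sample in samples:
    print(is_well_formed(sample), sample)
```

In the second sample the section ends at the first ']]>',
which leaves '27 ]]>' behind as ordinary character data, and
']]>' is not allowed there.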

As to your comment that you're not using external entities,
but may use internal entities, it doesn't really matter
inside a CDATA section.  It's the entity _references_ inside
your PCDATA that you should be concerned about.

If you simply wrap the PCDATA inside a CDATA section you are
changing the PCDATA to CDATA, whether that's what you want
to do or not.  Any entity references inside the CDATA
section will not be expanded; they get "flattened" into
literal text and come right through your parser to your
application.  For example, using the letter "x" instead of
a real "times" symbol, if you have

    <foo>14 &times; 2 = 28</foo>

it will come out as: 14 x 2 = 28, but

    <foo><![CDATA[14 &times; 2 = 28]]></foo>

will come out as: 14 &times; 2 = 28
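
The difference is easy to see with a parser.  A rough sketch
in Python, assuming xml.etree.ElementTree and an internal
DTD subset that I made up for the purpose (it declares
"times" as the letter "x", matching the stand-in used above):

```python
import xml.etree.ElementTree as ET

# Hypothetical internal DTD subset: declares "times" as the letter "x",
# mirroring the stand-in used in the text above.
DOCTYPE = '<!DOCTYPE foo [<!ENTITY times "x">]>'

as_pcdata = DOCTYPE + "<foo>14 &times; 2 = 28</foo>"
as_cdata = DOCTYPE + "<foo><![CDATA[14 &times; 2 = 28]]></foo>"

pcdata_text = ET.fromstring(as_pcdata).text  # reference is expanded
cdata_text = ET.fromstring(as_cdata).text    # reference stays literal

print(pcdata_text)
print(cdata_text)
```

The first print shows the expanded text; the second shows
the reference exactly as it was typed, ampersand and all.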

To avoid this problem you would need to convert all of the
character entity references to "real" characters.  If
you're using Unicode it shouldn't be too much of a problem,
but if you're using single-byte encodings you'll need to
keep track of the "font" (really the character encoding) as
well, since sometimes the same byte value means different
characters in different fonts.
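
One possible way to do that conversion, sketched in Python. 
This assumes your references use the HTML named-entity set
(which is all that html.unescape knows about); the helper
name to_real_characters is hypothetical.  The last lines
show why a single-byte encoding has to travel with the data.

```python
import html


def to_real_characters(text: str) -> str:
    """Replace character entity references with the characters they name.

    Assumes the references use the HTML named-entity set (e.g. &times;),
    which is what html.unescape knows about.
    """
    return html.unescape(text)


converted = to_real_characters("14 &times; 2 = 28")
print(converted)

# The same single byte decodes differently depending on the encoding,
# which is why a single-byte encoding has to be tracked alongside the data.
byte = b"\xd7"
print(byte.decode("latin-1"))  # multiplication sign, U+00D7
print(byte.decode("cp1251"))   # Cyrillic capital letter Che, U+0427
```

Note that html.unescape also turns &lt; and &amp; into "<"
and "&", which is harmless inside a CDATA section but not
in ordinary PCDATA.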

And, as Bob Kline wrote, you can't nest CDATA sections.
Because you cannot nest CDATA sections, in elements which
contain child elements you can only wrap the PCDATA
portions.  Otherwise you'll turn the start and end tags of
the nested elements into CDATA.
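
Both effects can be sketched in Python, again assuming
xml.etree.ElementTree.  The element names <p> and <b> and
the helper name parses are hypothetical, chosen only for
illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, for illustration only.
mixed = ET.fromstring("<p>one <b>two</b> three</p>")
wrapped = ET.fromstring("<p><![CDATA[one <b>two</b> three]]></p>")

print(len(mixed), repr(mixed.text))      # one child element, <b>
print(len(wrapped), repr(wrapped.text))  # no children; the tags are text


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An attempt at nesting: the outer section ends at the first "]]>".
nested = "<p><![CDATA[a <![CDATA[b]]> c]]></p>"
print(parses(nested))
```

Wrapping the whole content of <p> swallows the <b> element,
and the nesting attempt is rejected because the text left
over after the first ']]>' still contains a ']]>'.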

Overall, unless there's something really special or 
strange about how you're processing the XML stream, you 
probably shouldn't wrap all of your current PCDATA in 
CDATA sections. 

/s/ Ernest G. Allen