   Re: [xml-dev] CDATA vs. Escaped characters

----- Original Message -----
From: "Mike Brown" <mike@skew.org>
To: <xml-dev@lists.xml.org>
Cc: <NBEYER@cerner.com>
Sent: Wednesday, March 20, 2002 7:58 PM
Subject: Re: [xml-dev] CDATA vs. Escaped characters

> 4. Simple byte counts might be an issue. Using a CDATA section can cut
> the space required to store or transmit those portions of a document that
> would otherwise be riddled with numerous escaped characters. On the other
> hand, many small, unnecessary CDATA sections can add needless bulk to
> the size of the document.
>    - Mike

There is also a good ol' technology that would work as an alternative here
if a large percentage of the content would otherwise have to be escaped:
Base64 encoding.  Yes, the content would inflate by about 33% (4 characters
to store every 3 bytes).  No, your parser wouldn't decode it for you
automatically (unless you wrote the parser).  However, the resulting byte
count can be considerably lower than the inflation from escaped characters,
since every escaped "<" or ">" takes 4 bytes (&lt; or &gt;) and every
escaped "&" takes 5 (&amp;).  Processing might also be a bit easier, since
the encoded content contains no entity references to break it up: it would
typically arrive as a single DOM node or SAX event, and the block can be
quickly decoded.  Of course, it becomes necessary to denote in the XML
whether the content is or is not encoded, or to have it understood that the
content is always to be encoded.  As with any handling of character data in
XML, there are trade-offs...
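
To put rough numbers on that trade-off, here is a minimal Python sketch that
counts bytes for the same text stored escaped, in a single CDATA section, and
as Base64.  The element name "payload" and the encoding="base64" attribute are
just my assumptions for illustration (nothing in XML defines them); they stand
in for whatever convention you pick to denote that the content is encoded.

```python
import base64
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def encoded_sizes(text: str) -> dict:
    """Return the UTF-8 byte count of `text` under each representation."""
    raw = text.encode("utf-8")
    return {
        "raw": len(raw),
        # escape() rewrites "&", "<" and ">" as &amp;, &lt; and &gt;.
        "escaped": len(escape(text).encode("utf-8")),
        # One CDATA section costs 12 extra bytes: "<![CDATA[" (9) + "]]>" (3).
        # Assumes the text does not itself contain "]]>".
        "cdata": len(raw) + 12,
        "base64": len(base64.b64encode(raw)),
    }


def wrap_base64(tag: str, text: str) -> str:
    """Serialize `text` as Base64 inside <tag encoding="base64"> (hypothetical marker)."""
    elem = ET.Element(tag, {"encoding": "base64"})
    elem.text = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def read_content(xml: str) -> str:
    """Return the element's text, decoding it only if it is marked as Base64."""
    elem = ET.fromstring(xml)
    if elem.get("encoding") == "base64":
        return base64.b64decode(elem.text or "").decode("utf-8")
    return elem.text or ""


markup_heavy = "<<<&&&>>>" * 10  # 90 characters, every one needs escaping
sizes = encoded_sizes(markup_heavy)
print(sizes)
print(encoded_sizes("Tom & Jerry"))  # only one character needs escaping
print(read_content(wrap_base64("payload", markup_heavy)) == markup_heavy)
```

For the markup-heavy string, Base64 comes out far smaller than escaping,
though a single CDATA section is smaller still.  For "Tom & Jerry", Base64 is
slightly larger than escaping, so it only pays off when a large share of the
characters would need escaping.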

My suggestion is to add <![BASE64[ ]]> to XML. :)

Seairth Jacobs

