Re: [xml-dev] Tradeoffs of XML encoding by enclosing all content in CDATA blocks
- From: "Andrew Welch" <andrew.j.welch@gmail.com>
- To: "Karr, David" <david.karr@wamu.net>
- Date: Mon, 29 Sep 2008 16:40:37 +0100
2008/9/29 Karr, David <david.karr@wamu.net>:
> I pointed out to a client that they're seeing failures parsing XML because
> some of the element content that they're producing contains characters
> illegal in XML content, like "&" (unencoded). They acknowledged that should
> be fixed, but they also said they could instead enclose all content with
> CDATA blocks. That seems bizarre to me, but I'm not sure I can immediately
> come up with all the cogent arguments against that. Can someone summarize
> specifically why you should NOT do that?
You often get this problem when people write XML as a string rather
than using a proper XML Writer...
For example:
xmlStr = "<foo>" + someVal + "</foo>";
write(xmlStr);
There are several problems with this approach, one being that ampersands
won't be escaped properly.
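To make the failure concrete, here is a minimal sketch in Python (my choice of language, not necessarily what the client uses). The helper names build_by_concat and is_well_formed are made up for illustration; the parser is the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET


def build_by_concat(some_val):
    # The string-concatenation approach: the value is pasted in unescaped.
    return "<foo>" + some_val + "</foo>"


def is_well_formed(xml_str):
    # Returns True if the standard-library parser accepts the string.
    try:
        ET.fromstring(xml_str)
        return True
    except ET.ParseError:
        return False


# A value without markup characters happens to work...
print(is_well_formed(build_by_concat("Fish and Chips")))
# ...but a bare ampersand makes the document not well-formed.
print(is_well_formed(build_by_concat("Fish & Chips")))
```

The point is that the concatenation only works by accident, for values that contain no markup characters.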
The answer they usually go for is to replace all occurrences of & with
&amp;, but then character and entity references that were already
escaped get double-escaped to &amp;amp;.
That leaves the string &amp;amp; in the result, which appears as
"&amp;" in the browser, so they attach a post-processing step to
convert "&amp;amp;" back to "&amp;"
...and so on and so on. (You also see these pre- and post-processing
steps used to get around encoding issues.)
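The double-escaping trap can be sketched like this, again in Python and with a hypothetical naive_escape helper standing in for whatever search-and-replace they bolt on. It assumes the input may already contain an escaped ampersand.

```python
import xml.etree.ElementTree as ET


def naive_escape(value):
    # Blindly replaces every ampersand, including ones that already
    # start an entity reference.
    return value.replace("&", "&amp;")


def parsed_text(value):
    # Wraps the value in <foo> by concatenation and returns what a
    # parser (and so the end user) actually sees as element content.
    return ET.fromstring("<foo>" + value + "</foo>").text


raw = "Fish & Chips"
once = naive_escape(raw)    # correct: one level of escaping
twice = naive_escape(once)  # the blind replace applied to escaped input

print(once, "->", parsed_text(once))
print(twice, "->", parsed_text(twice))
```

Once the blind replace has run on input that was already escaped, the reader sees a literal entity reference in the text, and that is where the post-processing hacks start.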
The root cause of all of this is that someone wrote XML as a string
rather than using an XML Writer. So I would suggest finding out how
they create the XML, and going from there.
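For comparison, a rough sketch of the same element produced through a writer. This uses Python's standard XMLGenerator purely as an example of the kind of API I mean; write_foo is a made-up name, and whatever platform they are on will have its own equivalent.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator


def write_foo(some_val):
    # The writer, not the caller, is responsible for escaping
    # markup characters in character data.
    out = io.StringIO()
    writer = XMLGenerator(out, encoding="utf-8")
    writer.startElement("foo", {})
    writer.characters(some_val)
    writer.endElement("foo")
    return out.getvalue()


xml_str = write_foo("Fish & Chips")
print(xml_str)

# Round trip: the parser hands back the original value unchanged.
round_tripped = ET.fromstring(xml_str).text
print(round_tripped)
```

The calling code never touches an entity reference, so there is nothing to escape twice and nothing to undo afterwards.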
--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/