Re: [xml-dev] What to escape when serializing XML
- From: richard@inf.ed.ac.uk (Richard Tobin)
- To: xml-dev@lists.xml.org
- Date: Wed, 3 Jan 2007 12:22:44 +0000 (GMT)
In article <200701021413.20468.frans.englich@telia.com> you write:
>These paragraphs gives good hints to the complexity in this, but it's
>not very exact("Specifically, CR, NEL ...").
I'm not sure what you find inexact about it. It lists the three
characters that must be escaped in text to avoid their being
normalised when re-read, and the five that must be escaped in
attributes for the same reason.
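
To illustrate the normalisation being avoided, here is a small sketch using Python's standard xml.etree.ElementTree (which sits on expat, an XML 1.0 parser, so it can only show the CR, TAB and NL cases, not NEL or LSEP). The element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Literal CR in text, literal TAB and NL in an attribute value.
raw = ET.fromstring('<a b="x\ty\nz">p\rq</a>')

# The same characters written as numeric character references.
ref = ET.fromstring('<a b="x&#9;y&#10;z">p&#13;q</a>')

raw_attr, raw_text = raw.get("b"), raw.text
ref_attr, ref_text = ref.get("b"), ref.text

print(repr(raw_attr), repr(raw_text))  # literal characters were normalised
print(repr(ref_attr), repr(ref_text))  # character references survived
```

The literal TAB and NL in the attribute come back as spaces and the literal CR in the text comes back as NL, whereas the versions written as character references are returned unchanged.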
If you're serialising as XML 1.0 you don't need to bother escaping NEL
and LSEP (because they don't get normalised when read). But as the
text you quoted notes, a 1.0 external entity included in a 1.1
document is parsed as XML 1.1, so if your output might be used as an
external entity in that way - rather than as a complete XML document -
you will need to escape them. You might as well escape them anyway.
I'll try to summarise:
U+0001-U+001F except CR, TAB, NL:
    Can't occur in XML 1.0. Can occur in XML 1.1 and must be escaped.
CR (U+000D):
    Always escape.
NL (U+000A), TAB (U+0009):
    Escape in attribute values.
NEL (U+0085), LSEP (U+2028):
    Always escape (only essential if serialising as XML 1.1).
U+007F-U+009F except NEL:
    Always escape (only essential if serialising as XML 1.1).
less-than, ampersand:
    Always escape.
greater-than:
    Escape in text if it immediately follows two close-square-brackets, as
    that sequence is only allowed as the end of a CDATA marked section.
single-quote, double-quote:
    Escape in attribute values quoted with the same kind of quote.
I think it's safe to always escape all of these, but always escaping
NL would make things unreadable.
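
As a rough sketch of how the summary above might look in code, assuming Python and hexadecimal character references: the names escape_text, escape_attr and the version parameter are hypothetical, not from any library. It leaves NL and TAB alone in text so the output stays readable, and it refuses control characters that XML 1.0 cannot represent at all.

```python
import re

# CR, NEL, LSEP, and the C0/C1 controls plus DEL, but not TAB or NL.
_TEXT_REFS = re.compile("[\x01-\x08\x0b-\x1f\x7f-\x9f\u2028]")
# In attribute values TAB and NL are escaped as well.
_ATTR_REFS = re.compile("[\x01-\x1f\x7f-\x9f\u2028]")
# C0 controls other than TAB, NL, CR: not allowed in XML 1.0 in any form.
_C0_CONTROLS = re.compile("[\x01-\x08\x0b\x0c\x0e-\x1f]")


def _char_ref(match):
    """Turn a single matched character into a hex character reference."""
    return "&#x%X;" % ord(match.group(0))


def _check(s, version):
    # U+0000 is not allowed in either version, even as a reference.
    if "\x00" in s:
        raise ValueError("U+0000 cannot appear in XML")
    if version == "1.0" and _C0_CONTROLS.search(s):
        raise ValueError("C0 control character cannot appear in XML 1.0")


def escape_text(s, version="1.1"):
    """Escape a string for use as element content."""
    _check(s, version)
    s = s.replace("&", "&amp;").replace("<", "&lt;")
    # '>' only needs escaping when it would complete ']]>'.
    s = s.replace("]]>", "]]&gt;")
    return _TEXT_REFS.sub(_char_ref, s)


def escape_attr(s, quote='"', version="1.1"):
    """Escape a string for an attribute value delimited by `quote`."""
    _check(s, version)
    s = s.replace("&", "&amp;").replace("<", "&lt;")
    s = s.replace(quote, "&quot;" if quote == '"' else "&apos;")
    return _ATTR_REFS.sub(_char_ref, s)


print(escape_text("x]]>y & <z\r"))
print(escape_attr('say "hi"\tnow'))
```

The ampersand has to be replaced first, otherwise the ampersands introduced by the later replacements would themselves be escaped.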
-- Richard