On Wed, 2002-10-16 at 06:56, Elliotte Rusty Harold wrote:
> C0 control characters such as form feed, vertical tab, BEL, and DC1
> through DC4 (whatever those are) are now allowed in XML text. However, they
> must be escaped as character references. They cannot be included literally in
> data. Nulls, thankfully, are still forbidden.
I don't understand why. If you're allowing all sorts of control
characters, as long as they're encoded, what difference would it make to allow a
null? Either the things stay safely encoded, in which case null is no
different than the other controls, or they don't, in which case null is
no different than the other controls.
> The C1 control characters such as BPH, IND, NBH, and PU1 are no longer
> allowed as literals in XML text. They too must now be escaped as character
I like this, in some ways. If controls are going to be allowed at all,
then they should be handled *somehow*, and encoding seems to be the
choice of the moment. I at least like the idea that C1 is to be treated
with the same disdain that C0 gets.
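
For reference, the split between literal, reference-only, and forbidden
characters can be sketched as below. This is a minimal sketch, assuming the
Char and RestrictedChar productions as they appear in the published XML 1.1
Recommendation; the draft under discussion in this thread may have differed in
detail. The function name classify_xml11 is hypothetical, chosen only for
illustration.

```python
def classify_xml11(cp: int) -> str:
    """Classify a code point per the XML 1.1 Char / RestrictedChar productions.

    Returns "literal" (may appear directly in text), "reference-only"
    (must be written as a character reference), or "forbidden".
    """
    # Null is excluded from Char entirely, so not even a reference is allowed.
    if cp == 0x0:
        return "forbidden"
    # RestrictedChar: most C0 controls, DEL, and the C1 controls except NEL (#x85).
    restricted = (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )
    if restricted:
        return "reference-only"
    # Char: everything else except surrogates, #xFFFE, and #xFFFF.
    is_char = (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
    return "literal" if is_char else "forbidden"
```

Under these assumptions, form feed (#xC), BEL (#x7), and DC1 (#x11) come out
as reference-only, NEL (#x85) stays literal, and null is the lone forbidden
control.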
> references. For the first time this means that some well-formed XML 1.0
> documents are not well-formed XML 1.1 documents. The exception, of course, is
> IBM's holy grail of NEL, which will be allowed in literal XML text, just to
> make life difficult for every text editor on the planet except those from IBM
Here, I get confused. I went and looked at the 1.1 spec. There's a
change to the discussion of line endings, which suggests that #xD #x85,
#x85, and #x2028 get normalized to #xA, just like #xD #xA or a #xD
followed by anything else.
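
The normalization described above can be sketched as a single substitution.
This is a minimal sketch, assuming the rules as stated in the line-ending
section of the published XML 1.1 Recommendation (the two-character sequences
are matched before a lone #xD); the function name normalize_line_endings is
hypothetical.

```python
import re

# Two-character sequences come first in the alternation so that "\r\n" and
# "\r\x85" collapse to one line feed instead of two.
_LINE_END = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_line_endings(text: str) -> str:
    """Translate every XML 1.1 line-end sequence to a single #xA."""
    return _LINE_END.sub("\n", text)
```

For example, normalize_line_endings("a\r\nb\r\x85c\x85d\u2028e\rf") gives
"a\nb\nc\nd\ne\nf". After this step no literal #xD, #x85, or #x2028 is left in
the text that reaches the parser proper.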
However, the production for S is not changed, so although these things
participate in line endings, they aren't space characters. Is that
If the answer is "it doesn't matter, line end processing happens before
checking for space," then the S production still ought to be changed
(for clarity), to remove #xD, which can no more appear in that situation
than any of the new characters. But it makes more sense to me that anything
considered to be part of a line ending ought to be listed in S, which
would become: #x9 #xA #xD #x20 #x85 #x2028. I don't understand the
But the whole thing seems to be nearly as weird as the Namespaces 1.1
rec, which seems to think that because the only way to have no namespace
is to allow undeclaration of the default namespace, named prefixes
also ought to be undeclarable. Pure hobgoblin: foolish consistency.
Amelia A. Lewis firstname.lastname@example.org email@example.com
The law, in its majestic equality, forbids the rich as well as the poor
to sleep under bridges, to beg in the streets, and to steal bread.
-- Anatole France, "Le Lys Rouge"