Re: Gag me with a blunt …
- From: Tim Bray <firstname.lastname@example.org>
- To: email@example.com
- Date: Sun, 18 Mar 2001 17:49:37 -0800
At 06:00 PM 16/03/01 -0500, John Cowan wrote:
>XML 1.0 chose not to implement non-ASCII whitespace characters from Unicode 2.0.
To be honest, I don't think we ever articulated the principle,
or proceeded from it.
However, the discussion over Ideographic Space, U+3000, was long
and agonized, so the decision to omit it from the production for
"S" was not taken lightly at all. It's hard to see how you could
let in U+0085 without letting in ideospace and a bunch more
characters that have in some respect the characteristics of
white-space-ness. Someone sent me a note offline giving a long
list of such items, and it's pretty clear that letting in U+0085
could start us down a slippery slope.
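For reference, production [3] in XML 1.0 is S ::= (#x20 | #x9 | #xD | #xA)+. A minimal sketch of that closed set follows, next to a few characters that have white-space-ness but fall outside it. The names XML_S and is_xml_whitespace are made up for illustration and come from no particular processor.

```python
import unicodedata

# The four characters admitted by the XML 1.0 "S" production:
# space, tab, carriage return, line feed.
XML_S = frozenset("\x20\x09\x0d\x0a")


def is_xml_whitespace(text):
    """Return True if text is non-empty and made only of S characters."""
    return len(text) > 0 and all(ch in XML_S for ch in text)


if __name__ == "__main__":
    # U+0085 (NEL), U+00A0 (no-break space) and U+3000 (ideographic space)
    # all look like whitespace in some respect, yet none of them is in S.
    for cp in (0x20, 0x09, 0x85, 0xA0, 0x3000):
        ch = chr(cp)
        print(f"U+{cp:04X} category={unicodedata.category(ch)} "
              f"in S: {ch in XML_S}")
```

Keeping the set this small is what lets whitespace recognition stay entirely within ASCII.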
Note (although, as far as I know, no processor other than Lark
ever did this) that if you want to build a DFA-based XML
processor, you can use the trick of recognizing all the syntax
characters with a 7-bit state table; only a remarkably small
amount of clever sidestepping is then required to deal with all
the non-ASCII characters.
-Tim
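
A rough sketch of the 7-bit table idea, under heavy assumptions: it is a toy scanner that only understands character data and bare start tags such as <name>, and it is not a reconstruction of Lark. The state names, the tag_names function, and the letters-only rule for non-ASCII name characters are all invented for illustration; a real processor would check the NameChar productions instead. The point is only that the table has 128 columns and everything at or above U+0080 goes through one small side function.

```python
import string
import unicodedata

# Hypothetical states for the toy scanner.
TEXT, LT, NAME, AFTER_NAME, ERROR = range(5)

XML_S = " \t\r\n"                           # the XML 1.0 "S" characters
NAME_START = string.ascii_letters + "_:"    # ASCII name-start characters
NAME_CHARS = NAME_START + string.digits + "-."


def build_table():
    """Build a 5 x 128 transition table covering only 7-bit input."""
    table = [[ERROR] * 128 for _ in range(5)]
    for cp in range(128):
        table[TEXT][cp] = TEXT              # character data stays in TEXT
    table[TEXT][ord("<")] = LT
    for ch in NAME_START:
        table[LT][ord(ch)] = NAME
    for ch in NAME_CHARS:
        table[NAME][ord(ch)] = NAME
    for ch in XML_S:
        table[NAME][ord(ch)] = AFTER_NAME
        table[AFTER_NAME][ord(ch)] = AFTER_NAME
    table[NAME][ord(">")] = TEXT
    table[AFTER_NAME][ord(">")] = TEXT
    return table


TABLE = build_table()


def non_ascii_next(state, ch):
    """The sidestep: non-ASCII is never syntax, only data or name content."""
    if state == TEXT:
        return TEXT
    # Crude stand-in for the real name-character rules: letters only.
    if state in (LT, NAME) and unicodedata.category(ch).startswith("L"):
        return NAME
    return ERROR


def tag_names(text):
    """Return the names of the start tags in text; raise on bad input."""
    state, names, current = TEXT, [], []
    for pos, ch in enumerate(text):
        cp = ord(ch)
        nxt = TABLE[state][cp] if cp < 128 else non_ascii_next(state, ch)
        if nxt == ERROR:
            raise ValueError(f"unexpected U+{cp:04X} at offset {pos}")
        if nxt == NAME:
            current.append(ch)
        elif state in (NAME, AFTER_NAME) and nxt == TEXT:
            names.append("".join(current))  # '>' closed the tag
            current = []
        state = nxt
    return names
```

With this sketch, tag_names("<a>x<b >") yields the two names a and b, while U+3000 or U+0085 inside a tag raises ValueError, because neither is in S and neither is a letter. Both remain ordinary character data outside tags.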