Re: Gag me with a blunt …
- From: Jesús Quiroga <jquiroga@pobox.com>
- To: Tim Bray <tbray@textuality.com>
- Date: Fri, 16 Mar 2001 23:13:38 +0100
At 14:10 16/03/01, you wrote:
>Hmmm, I wonder if current perl includes U+0085 in what
>matches \s? Etc.....
All Unicode separator characters are expected to be matched by \s eventually
when searching UTF-8 strings (Camel III, p. 168).
Perl 5.6.0 doesn't include U+0085 in \s yet.
>Also, unlike (almost?) all the other XML errata, changing this
>would actively break pretty well every deployed piece of XML
>software in the world. -Tim
This is not an error in the XML 1.0 spec, IMHO. Apparently, U+0085 was
assigned in Unicode 3.0, and XML 1.0 is based on Unicode 2.0.
XML 1.0 could not possibly comply in 1998 with a standard published in 2000.
The difficult question is whether every change in Unicode should trigger an
immediate XML revision. IBM thinks it should.
Unfortunately, if U+0085 is included as whitespace in the XML spec, it won't be
XML 1.0 anymore.
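
To make the gap concrete, here is a minimal sketch in Python (not Perl, so it
says nothing about what Perl 5.6.0 does). It compares the whitespace set from
the XML 1.0 production S ::= (#x20 | #x9 | #xD | #xA)+ with what a
Unicode-aware regex engine treats as \s. The names is_xml10_space and
regex_matches_space are made up for illustration.

```python
import re

# XML 1.0, production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
XML10_WHITESPACE = frozenset("\u0020\u0009\u000D\u000A")

NEL = "\u0085"  # U+0085, the character under discussion


def is_xml10_space(ch: str) -> bool:
    """True if ch is one of the four whitespace characters XML 1.0 allows in S."""
    return ch in XML10_WHITESPACE


def regex_matches_space(ch: str) -> bool:
    """True if Python's Unicode-aware \\s matches the single character ch."""
    return re.fullmatch(r"\s", ch) is not None


if __name__ == "__main__":
    for ch in (" ", "\t", "\r", "\n", NEL):
        print(f"U+{ord(ch):04X}",
              "xml10:", is_xml10_space(ch),
              "regex \\s:", regex_matches_space(ch))
```

Any character where the two columns disagree is one that a Unicode-aware tool
would treat as whitespace but a conforming XML 1.0 parser would not.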