Re: Gag me with a blunt …
- From: Tim Bray <tbray@textuality.com>
- To: James Clark <jjc@jclark.com>, xml-dev@lists.xml.org
- Date: Fri, 16 Mar 2001 05:10:57 -0800
At 01:46 PM 16/03/01 +0700, James Clark wrote:
>> Has anyone seen this thing?
>> http://www.w3.org/TR/newline
>> I have a horrid suspicion that it's actually correct.
>
>I'm not convinced. The XML spec says that Unicode character #x85 is not
>a whitespace character. It appears from the Note that EBCDIC text
>files on IBM mainframes represent newline by a byte with code 0x85. The
>solution appears obvious to me: the EBCDIC encoding table used by the
>XML parser should map byte 0x85 to Unicode character 0xA.
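A minimal sketch of the decoder-side fix proposed above, in Python. It assumes Python's built-in `cp037` codec (one common EBCDIC code page) as the parser's base table. The names `XML_EBCDIC_TABLE` and `decode_ebcdic_for_xml` are hypothetical. The sketch does not hard-code a byte value. Instead, it patches whichever table entry the codec decodes to U+0085 so that it yields U+000A, which means the parser never sees NEL at all.

```python
# Hypothetical sketch: a byte -> character decoding table for EBCDIC input
# to an XML parser, based on Python's built-in cp037 codec (an assumption;
# real mainframe data may use a different EBCDIC code page).

# cp037 is a single-byte code page, so decoding bytes 0..255 yields a
# 256-character string whose index i is the character for byte i.
BASE_TABLE = bytes(range(256)).decode("cp037")

# Patch the table: the entry that cp037 decodes to U+0085 (NEL) now
# decodes to U+000A (LF), a newline the XML spec already treats as S.
XML_EBCDIC_TABLE = BASE_TABLE.replace("\u0085", "\u000a")


def decode_ebcdic_for_xml(data: bytes) -> str:
    """Decode EBCDIC bytes using the patched table (hypothetical helper)."""
    # Iterating over a bytes object yields ints, usable as table indices.
    return "".join(XML_EBCDIC_TABLE[b] for b in data)


# Build a sample whose newline is whatever byte cp037 maps to U+0085.
nel_byte = "\u0085".encode("cp037")
sample = "<a>".encode("cp037") + nel_byte + "</a>".encode("cp037")

print(repr(sample.decode("cp037")))         # '<a>\x85</a>'  (stock codec keeps NEL)
print(repr(decode_ebcdic_for_xml(sample)))  # '<a>\n</a>'    (patched table yields LF)
```

Because the remapping happens inside the decoder, everything downstream (the S production, end-of-line handling, any regex run over the decoded text) sees an ordinary LF.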
This feels much better. And upon reflection, the thought of
XML files which have been through a mainframe starting to
percolate around the system with U+0085 embedded inside
start tags makes me nervous; I can see a lot of people
sitting in front of Windows and Unix boxes looking baffled
because their existing programs broke in response to a
human-invisible stimulus.
Hmmm, I wonder if current Perl includes U+0085 in what
matches \s? Etc.
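The thread doesn't settle the Perl question. The underlying mismatch is easy to demonstrate in Python 3, used here purely as a stand-in (this sketch claims nothing about Perl's behavior). The XML 1.0 `S` production admits only #x20, #x9, #xD and #xA. Python's Unicode-aware `\s` and `str.split()` both treat U+0085 as whitespace. The helper names are hypothetical.

```python
import re

# XML 1.0 production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
XML_S = re.compile(r"[\x20\x09\x0D\x0A]+")

# Python 3's \s on str patterns is Unicode-aware and includes U+0085 (NEL).
GENERIC_WS = re.compile(r"\s+")


def is_xml_whitespace(s: str) -> bool:
    """True if s matches the XML S production exactly (hypothetical helper)."""
    return XML_S.fullmatch(s) is not None


def is_generic_whitespace(s: str) -> bool:
    """True if s is entirely whitespace under Python's \\s (hypothetical helper)."""
    return GENERIC_WS.fullmatch(s) is not None


nel = "\u0085"
print(is_xml_whitespace(nel))      # False: NEL is not XML whitespace
print(is_generic_whitespace(nel))  # True: \s matches NEL
print("a\u0085b".split())          # ['a', 'b']: split() also breaks on NEL
```

So a program that tokenizes markup with a generic whitespace class and one that follows the XML grammar can disagree about the same invisible character.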
Also, unlike (almost?) all the other XML errata, changing this
would actively break pretty well every deployed piece of XML
software in the world.
-Tim