Re: Gag me with a blunt …
- From: James Clark <email@example.com>
- To: firstname.lastname@example.org
- Date: Fri, 16 Mar 2001 13:46:26 +0700
Tim Bray wrote:
> Has anyone seen this thing?
> I have a horrid suspicion that it's actually correct.
I'm not convinced. The XML spec says that Unicode character #x85 is not
a whitespace character. It appears from the Note that EBCDIC text
files on IBM mainframes represent newline by a byte with code 0x85. The
solution appears obvious to me: the EBCDIC encoding table used by the
XML parser should map byte 0x85 to Unicode character #xA. Appendix A in
the Note even has an example of doing this already in another context:
    File Creation Method          Line Ending Generated on OS/390
    \n printf output:             [NEL]
    OS/390 C or Java program
\n in a Java string represents Unicode character #xA. This table is
saying that on output a Unicode character #xA is encoded by the byte
0x85. Why can't XML apply the same trick on input?
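
A minimal sketch of that input-side trick, under some loud assumptions: Python's built-in cp037 codec stands in for whatever EBCDIC code page the mainframe really uses, and the helper name decode_ebcdic_for_xml is hypothetical. Rather than hard-coding a byte value, the sketch lets the codec decode the code page's newline byte to NEL (#x85) and then hands the parser #xA instead, which has the same effect as putting the mapping in the encoding table itself.

```python
NEL = "\u0085"  # Unicode NEXT LINE, which XML 1.0 does not treat as whitespace
LF = "\n"       # Unicode #xA, which XML 1.0 does treat as a line end


def decode_ebcdic_for_xml(data: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes and present the newline to the parser as #xA.

    Hypothetical helper: the codec name is an assumption, not something
    the Note or the XML spec prescribes.
    """
    text = data.decode(codec)
    # Input-side counterpart of the output rule in the Note's table:
    # whatever the code page's newline byte decoded to, the parser sees #xA.
    return text.replace(NEL, LF)


# Build a sample the way the table describes output: every #xA in the
# string is written out as the code page's newline byte.
sample = "<doc>\n<a/>\n</doc>".replace(LF, NEL).encode("cp037")

decoded = decode_ebcdic_for_xml(sample)
```

With this in place the parser never sees #x85 between tags, so no change to the XML whitespace rules is needed. The cost is that a NEL intended as literal character data would also arrive as #xA.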