OASIS Mailing List Archives

Re: Gag me with a blunt …



Tim Bray wrote:

> Has anyone seen this thing?
> 
>   http://www.w3.org/TR/newline
> 
> I have a horrid suspicion that it's actually correct.

I'm not convinced.  The XML spec says that Unicode character #x85 is not
a whitespace character.  It appears from the Note that EBCDIC text
files on IBM mainframes represent newline by a byte with code 0x85. The
solution appears obvious to me: the EBCDIC encoding table used by the
XML parser should map byte 0x85 to Unicode character 0xA.  Appendix A in
the Note even has an example of doing this already in another context:

File Creation Method            Line Ending Generated on OS/390
...
\n printf output:               [NEL]
OS/390 C or Java program
                                                                     
\n in a Java string represents Unicode character #xA.  This table is
saying that on output a Unicode character 0xA is encoded by the byte
0x85. Why can't XML apply the same trick on input?
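
A minimal sketch of that idea in Python, not taken from the Note or from
any real parser.  It assumes Python's built-in cp037 codec as a stand-in
for the mainframe's EBCDIC table; the names XML_TABLE and decode_for_xml
are hypothetical.  Note that in cp037 the newline byte is 0x15, which the
stock table decodes to Unicode #x85 (NEL); the sketch looks the byte up
from the codec rather than hard-coding it.

```python
# Stand-in EBCDIC table: Python's cp037 codec (an assumption, chosen
# only because it ships with the standard library).
BASE_TABLE = bytes(range(256)).decode("cp037")

NEL = "\u0085"  # Unicode NEXT LINE, which XML 1.0 does not treat as whitespace
LF = "\n"       # Unicode #xA

# Parser-side decoding table: identical to the stock table, except that
# the byte which normally decodes to NEL now decodes to #xA.
XML_TABLE = BASE_TABLE.replace(NEL, LF)


def decode_for_xml(data: bytes) -> str:
    """Decode EBCDIC bytes with the adjusted table (hypothetical helper)."""
    return "".join(XML_TABLE[b] for b in data)


# A two-line document as an EBCDIC tool might write it, with NEL as the
# line ending.
sample = ("<a>" + NEL + "</a>").encode("cp037")
decoded = decode_for_xml(sample)
```

With the stock table the decoded text would contain #x85 between the
tags; with the adjusted table the parser sees an ordinary #xA there, so
the XML spec itself needs no change.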

James