
Re: Gag me with a blunt &#x85;

From: James Clark <jjc@jclark.com>
To: xml-dev@lists.xml.org
Date: Fri, 16 Mar 2001 13:46:26 +0700

Tim Bray wrote:

> Has anyone seen this thing?
> 
>   http://www.w3.org/TR/newline
> 
> I have a horrid suspicion that it's actually correct.

I'm not convinced.  The XML spec says that Unicode character #x85 is not
a whitespace character.  It appears from the Note that EBCDIC text
files on IBM mainframes represent newline by a byte with code 0x85. The
solution appears obvious to me: the EBCDIC encoding table used by the
XML parser should map byte 0x85 to Unicode character 0xA.  Appendix A in
the Note even has an example of doing this already in another context:

File Creation Method            Line Ending Generated on OS/390
...
\n printf output:               [NEL]
OS/390 C or Java program

\n in a Java string represents Unicode character #xA.  This table is
saying that on output a Unicode character 0xA is encoded by the byte
0x85. Why can't XML apply the same trick on input?
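
A minimal sketch of that trick in Python, offered as an illustration rather
than as anything the Note or the XML spec prescribes. It assumes Python's
built-in cp037 codec stands in for the mainframe's EBCDIC table, and it
works on the decoded NEL character (U+0085) instead of hard-coding a byte
value. The function names are hypothetical.

```python
NEL = "\u0085"  # Unicode NEXT LINE, the line ending the Note describes
LF = "\n"       # Unicode #xA, the line ending XML expects


def decode_ebcdic(data: bytes, encoding: str = "cp037") -> str:
    """Decode EBCDIC bytes, folding NEL into #xA on input.

    Assumption: the codec named by `encoding` decodes the EBCDIC
    newline byte to U+0085, as Python's cp037 codec does.
    """
    return data.decode(encoding).replace(NEL, LF)


def encode_ebcdic(text: str, encoding: str = "cp037") -> bytes:
    """Encode text as EBCDIC, writing #xA as NEL on output.

    This mirrors the printf row quoted from Appendix A of the Note.
    """
    return text.replace(LF, NEL).encode(encoding)


if __name__ == "__main__":
    raw = encode_ebcdic("<doc>\n</doc>\n")
    print(repr(decode_ebcdic(raw)))
```

With the fold done in the decoding layer, the parser proper never sees #x85
and needs no change to its notion of whitespace.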

James

Follow-Ups:
- Re: Gag me with a blunt 
  - From: Tim Bray <tbray@textuality.com>
- Re: Gag me with a blunt 
  - From: Richard Tobin <richard@cogsci.ed.ac.uk>

References:
- Gag me with a blunt 
  - From: Tim Bray <tbray@textuality.com>
