Re: Gag me with a blunt …

HUGHES,MARK (Non-HP-FtCollins,ex1) wrote:

>> From: John Cowan [mailto:cowan@mercury.ccil.org]
>> Absolutely right.  The ASCII/Unicode analogue of 0x15 is 0x0a (LINE
>> FEED), and the ASCII/Unicode analogue of 0x25 is 0x85 (NEW LINE).  So
>> when there is an 0x25 in EBCDIC data, it is correctly converted to
>> 0x85.
>   So, why did they choose not to use 0x0D (CR) for 0x25/0x85, since
> that's the semantically-closest character?  Do they also have a
> CR-equivalent character that isn't being mentioned here,

Yes.  EBCDIC has distinct CR, LF, and NEL characters.  But so do the
8-bit extensions of ASCII -- the 0x80-0x9F (C1) control characters,
NEL at 0x85 among them, predate Unicode and were not introduced by it.

In any event, it is not clear that CR is closer to NEL than LF is.
CR, LF, CR+LF, and NEL are all used by different environments.
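
As a rough sketch of what treating all four conventions alike could
look like (the function name is made up for illustration, and counting
NEL as a line end is the proposal under discussion, not something
XML 1.0 does today):

```python
import re

# The four line-end conventions named above: CR+LF, CR, LF, NEL (U+0085).
# CR+LF comes first in the alternation so it counts as one line end, not two.
_LINE_END = re.compile("\r\n|\r|\n|\x85")


def normalize_line_ends(text: str) -> str:
    """Replace every recognized line end with a single LF (hypothetical helper)."""
    return _LINE_END.sub("\n", text)
```

With this, text written under any one of the conventions comes out
with the same LF-only line ends.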

> and is useful
> information lost by converting 0x25 to 0x0D?

Yes: roundtrippability.  EBCDIC CR corresponds to ASCII CR, and ditto
for LF and NEL, so mapping 0x25 to 0x0D would make two EBCDIC characters
collapse into one.  But the *conventions of use* of these characters
differ among the various systems.
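
A small sketch of the roundtrip property, assuming Python's built-in
"cp037" codec as a stand-in for an EBCDIC code page (my choice for
illustration; which byte pairs with which character depends on the
conversion table in use, so the code only checks that the three stay
distinct and convert back unchanged):

```python
# Assumption: Python's built-in "cp037" codec stands in for EBCDIC here.
EBCDIC_CODEC = "cp037"

# The three EBCDIC control bytes discussed in this thread.
EBCDIC_CONTROLS = bytes([0x0D, 0x15, 0x25])


def roundtrips(data: bytes, codec: str = EBCDIC_CODEC) -> bool:
    """True if decoding to Unicode and encoding back reproduces the bytes."""
    return data.decode(codec).encode(codec) == data


def decoded_controls(codec: str = EBCDIC_CODEC) -> set:
    """Set of Unicode characters the three control bytes decode to."""
    return set(EBCDIC_CONTROLS.decode(codec))
```

If two of the bytes were mapped to the same character, the set would
shrink to two members and the conversion back could not tell them apart.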

>   No matter how big they are, one company's platform-specific problems
> should not be used to drive the rest of the industry.  I should think
> that would be self-evident.

Why should Apple's CR-only line ending, which is unique to the Mac,
be accepted (as it is), and IBM's NEL-only line ending, which is
shared by various big-iron systems, not be?

>   Now, if IBM wants to submit NEL and other Unicode 3.0 whitespace
> support as a change for XML 1.1 or further, more power to 'em.  But
> changing XML 1.0 for their vanity is not a Thing Which Should Happen.

The proposed change would create 1.0.1, and it is not a matter of
vanity, but interoperability.

