
Re: Gag me with a blunt &#x85;

From: Edwin.Fine@CommerceQuest.com
To: xml-dev@lists.xml.org
Date: Fri, 16 Mar 2001 12:19:37 -0500

All,

Maybe I am way off base here, but my experience with XML on OS/390 has shown that there are actually two EBCDIC "newline" characters: 0x15 (EBCDIC NEWLINE) and 0x25 (EBCDIC LF, or linefeed). The IBM C++ compiler outputs 0x15 when you print "\n", but each record of a mainframe text dataset appears to be terminated with 0x25. I have never seen 0x85. (Incidentally, our mainframes use CP500, international EBCDIC, and CP37, US EBCDIC.)
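The two-terminator situation described above can be illustrated with a small sketch. It assumes a single-byte EBCDIC code page for which Python ships a codec (cp037 or cp500); the helper name decode_ebcdic_text is hypothetical and not part of any library. It simply folds both byte values into one line feed before the text reaches a parser.

```python
# Hypothetical helper: collapse both EBCDIC line terminators to a single LF.
# Assumes a single-byte EBCDIC code page, so a blind byte replace is safe.
EBCDIC_NL = 0x15  # EBCDIC NEWLINE
EBCDIC_LF = 0x25  # EBCDIC LINEFEED


def decode_ebcdic_text(data: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes, treating 0x15 and 0x25 as the same newline."""
    # Rewrite 0x15 as 0x25 at the byte level so both decode identically.
    unified = data.replace(bytes([EBCDIC_NL]), bytes([EBCDIC_LF]))
    text = unified.decode(codec)
    # Fold any U+0085 the codec may still produce into U+000A.
    return text.replace("\x85", "\n")


# One line ended by 0x15 (compiler-style), one by 0x25 (dataset-style).
sample = "a".encode("cp037") + b"\x15" + "b".encode("cp037") + b"\x25"
print(repr(decode_ebcdic_text(sample)))
```

Whether folding is appropriate depends on the data: it discards the distinction between the two terminators, which is the point here but may not suit every application.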

It gets worse, because there seems to be no agreement about the EBCDIC codes used for '[' and ']' characters. Some installations use 0xAD/0xBD, and some use 0xBA/0xBB respectively. This has caused us no end of trouble when trying to parse CDATA sections using an expat parser compiled natively (i.e. as an EBCDIC application) on OS/390. We finally settled on doing binary transfers on FTP and turning off translations on MQSeries, and performing our own translation to well-known values before parsing.
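A minimal sketch of that kind of pre-parse translation, for illustration only. It assumes the incoming data uses 0xAD/0xBD for the brackets and that the parser expects whatever Python's cp037 codec uses; normalize_brackets and BRACKET_TABLE are hypothetical names, and this is not the translation code referred to in this message. The mapping is lossy if 0xAD or 0xBD legitimately carry other characters in the data.

```python
# Hypothetical pre-parse step: fold one bracket convention into another.
# Assumption: input uses 0xAD/0xBD for '[' and ']'; target is the cp037 codec.
CODEC = "cp037"
OPEN_BYTE = "[".encode(CODEC)   # '[' as this codec encodes it
CLOSE_BYTE = "]".encode(CODEC)  # ']' as this codec encodes it

# Byte-for-byte translation table: 0xAD -> '[', 0xBD -> ']'.
BRACKET_TABLE = bytes.maketrans(b"\xad\xbd", OPEN_BYTE + CLOSE_BYTE)


def normalize_brackets(data: bytes) -> bytes:
    """Rewrite the 0xAD/0xBD bracket variant to the codec's own values."""
    return data.translate(BRACKET_TABLE)


# Build a CDATA section the way a 0xAD/0xBD installation might emit it.
variant = (
    "<!".encode(CODEC) + b"\xad" + "CDATA".encode(CODEC) + b"\xad"
    + "x".encode(CODEC) + b"\xbd\xbd" + ">".encode(CODEC)
)
print(normalize_brackets(variant).decode(CODEC))  # prints <![CDATA[x]]>
```

The translation has to happen on the raw bytes, before decoding, which is why a binary transfer matters: once an intermediate hop has applied its own table, the original byte values are gone.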

I would be interested to know if anyone else has had these experiences. The bottom line seems to be that XML was designed for the ASCII/Unicode world and does not fit very well into the EBCDIC mainframe world.

Regards,

Edwin Fine
CommerceQuest, Inc.
(Direct) 813-639-6508
(Fax) 813-639-6900
(Main) 813-639-6300
Edwin.Fine@CommerceQuest.com

Rob Lugt <roblugt@elcel.com>

03/16/2001 09:07 AM

To: Tim Bray <tbray@textuality.com>, James Clark <jjc@jclark.com>, xml-dev@lists.xml.org
cc:
Subject: Re: Gag me with a blunt &#x85;

James Clark wrote:
> >I'm not convinced. The XML spec says that Unicode character #x85 is not
> >a whitespace character. It appears from the Note that EBCDIC text
> >files on IBM mainframes represent newline by a byte with code 0x85. The
> >solution appears obvious to me: the EBCDIC encoding table used by the
> >XML parser should map byte 0x85 to Unicode character 0xA.

The note from IBM argues that the XML spec is wrong not to designate
#x85 as white space, so the wording of the XML spec here doesn't seem like
a good reason not to be convinced. The fix to the EBCDIC coding table may
work, but it appears to me to be something of a hack, because the
original software intends to create Unicode U+0085 characters. I would
prefer XML parsers to be able to use standard encoding tables, perhaps
from generic libraries, rather than having to create a special XML flavour.
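To make the disagreement concrete, here is a small sketch contrasting the XML 1.0 white space production (S: #x20, #x9, #xD, #xA) with how a general-purpose Unicode-aware library treats U+0085. The names XML_10_WHITESPACE and is_xml10_whitespace are hypothetical, introduced only for this illustration; Python's built-in string methods stand in for "generic libraries".

```python
# XML 1.0 production S: space, tab, carriage return, line feed.
XML_10_WHITESPACE = frozenset("\x20\x09\x0d\x0a")

NEL = "\u0085"  # Unicode NEXT LINE


def is_xml10_whitespace(ch: str) -> bool:
    """True if ch is white space according to XML 1.0 production S."""
    return ch in XML_10_WHITESPACE


# XML 1.0 does not count U+0085 as white space ...
print(is_xml10_whitespace(NEL))      # prints False
# ... while Python's Unicode-aware string methods do treat it as such.
print(NEL.isspace())                 # prints True
print(("a" + NEL + "b").splitlines())  # prints ['a', 'b']
```

A parser built on generic decoding tables therefore sees U+0085 as an ordinary character, even though the surrounding library calls it a line break.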

Tim Bray replied:

> This feels much better. And upon reflection, the thought of
> XML files which have been through a mainframe starting to
> percolate around the system with U+0085 embedded inside
> start tags makes me nervous;
<snip/>
> Also, unlike (almost?) all the other XML errata, changing this
> would actively break pretty well every deployed piece of XML
> software in the world. -Tim

Arguably every deployed piece of XML software is already broken wrt files
containing U+0085. It appears to me that you (the editors) went to great
lengths to adopt the full Unicode specification rather than creating an XML
subset. If this was an oversight then I believe it makes sense to make good
the mistake and maintain full support for Unicode.

Regards
Rob Lugt
ElCel Technology

------------------------------------------------------------------
The xml-dev list is sponsored by XML.org, an initiative of OASIS
<http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To unsubscribe from this elist send a message with the single word
"unsubscribe" in the body to: xml-dev-request@lists.xml.org
