Re: XML Blueberry

On Fri, Jun 22, 2001 at 10:17:28AM +0100, David Carlisle wrote:
>] The major problem is that when
>] it appears in a tag, e.g. 
>] <t a1="1"
>]    a2="2"> 
>] (where there's a NEL after the "1") then the XML processor will kick
>] this out.  -Tim
>Do any files really use NEL that are encoded in utf-8 or utf-16 (or
>utf-8 subsets like ascii that don't need to be declared)?
>If all the files using NEL start 
><?xml version="1.0" encoding="some-flavour-of-ebcdic"?>
>Then can't NEL be mapped to #10 (or #13) in the non-normative support
>for the ebcdic related encodings?  This wouldn't require any change to XML.

At a guess, this is a new-software problem rather than an old-software
problem.  Remember that IBM is a *big* advocate of Java, across
platforms.  There's an outstanding chance that System.out.println() and
System.getProperty("line.separator") supply NEL, a perfectly valid (and,
until now, probably uncontroversial) choice of line ending.  It isn't
the NVT line ending, but Unix broke that first, and Apple broke it
differently, for much the same reason that IBM settled on NEL: why use
two characters to represent one thing?  (The network virtual terminal
uses CR/LF for backward compatibility with old teletypes, after all ...
all of these control characters are a pain.)
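
For anyone who wants to check what a given JVM actually supplies, here
is a minimal sketch.  The class name LineSeparatorProbe and the helper
describe() are my own inventions for illustration; I have not run this
on a mainframe, so what it prints there is exactly the open question.

```java
public class LineSeparatorProbe {
    // Render each UTF-16 code unit of s as a "U+XXXX" label,
    // separated by single spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The platform's line ending, as println() would write it.
        String sep = System.getProperty("line.separator");
        System.out.println("line.separator = " + describe(sep));
    }
}
```

A CR/LF platform should report two code units and a Unix-style platform
one; a platform that really does hand out NEL would report U+0085.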

Time for the shocker.  Why not just remove the concept of line endings
from XML?  Whitespace == whitespace (per the Unicode definition, which
would let something like Java's Character.isWhitespace() actually work),
rather than focussing exclusively on the IBM choice of NEL ... hmmm.
Well, no, I suppose not.  CR, LF, NEL, and TAB aren't space characters,
per Unicode (they are control characters); it might be possible to do
it by defining the S production to be Unicode space + Unicode layout
control, although that may be casting a slightly wider net.
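
To make the classification concrete, here is a small sketch that asks
Java how it categorises the characters in question.  The class name
WhitespaceProbe and its two helpers are hypothetical, written only for
this message; the Character methods and constants are the standard ones.

```java
public class WhitespaceProbe {
    // True when c is in Unicode general category Zs (space separator).
    static boolean isSpaceSeparator(char c) {
        return Character.getType(c) == Character.SPACE_SEPARATOR;
    }

    // True when c is in Unicode general category Cc (control).
    static boolean isControl(char c) {
        return Character.getType(c) == Character.CONTROL;
    }

    public static void main(String[] args) {
        // NEL is written as a cast to keep Unicode escapes out of the source.
        char[] chars = { ' ', '\t', '\n', '\r', (char) 0x85 };
        String[] names = { "SPACE", "TAB", "LF", "CR", "NEL" };
        for (int i = 0; i < chars.length; i++) {
            System.out.printf("%-5s Zs=%b Cc=%b isWhitespace=%b%n",
                    names[i],
                    isSpaceSeparator(chars[i]),
                    isControl(chars[i]),
                    Character.isWhitespace(chars[i]));
        }
    }
}
```

Only SPACE is a space separator; the other four are controls.  As
documented, Character.isWhitespace() accepts TAB, LF, and CR by explicit
exception but not NEL, which is the same gap in a different place.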

Note that I'm not a mainframe person, so I'm only guessing that the NEL
issue is new-software-related.  Seems reasonable, all things

Amelia A. Lewis          alicorn@mindspring.com          amyzing@talsever.com