OASIS Mailing List Archives

Re: Gag me with a blunt …

Rick Jelliffe scripsit:

> There is another relevant question: what is the use of characters that
> have different semantics from XML's #xA?

<fact>They reflect the fact that there are three separate line-ending control
characters, CR, LF, and NEL.  The first two were assigned to the C0
set in ASCII-based charsets, the last to the C1 set (0x80-0x9F).
(In EBCDIC there is only one group of control characters, 0x00-0x3F.)

Various systems have made various choices of line-end character.
The DEC operating systems, CP/M, MS-DOS, and Windows use CR plus LF.
Unix chose to use LF alone.  The Mac uses CR alone.  And OS/360
and its descendant operating systems use NEL alone.  The first three
choices are accommodated by XML, being converted to LF for convenience
in processing, except when appearing explicitly as character refs.
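
As a rough illustration of the conversion just described, here is a minimal
Python sketch. The function name normalize_xml10_line_ends is made up for
this example and is not taken from any XML library; it only shows CR LF and
lone CR being folded to LF, with NEL left alone.

```python
import re

# CR LF must be tried before lone CR so that a pair collapses to one LF.
_XML10_EOL = re.compile(r"\r\n|\r")

def normalize_xml10_line_ends(text: str) -> str:
    """Fold the DEC/DOS (CR LF) and Mac (CR) conventions to the Unix one (LF).

    Hypothetical helper for illustration only. NEL (U+0085) is deliberately
    not touched, matching the three choices described above.
    """
    return _XML10_EOL.sub("\n", text)

if __name__ == "__main__":
    sample = "dos\r\nmac\runix\nos360\x85end"
    print(repr(normalize_xml10_line_ends(sample)))
```

Because the substitution runs on the raw text, a character reference such as
&#xD; contains no literal CR and passes through unchanged, which is the
exception noted above.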

<opinion>The fourth choice should be accommodated too.  So should the
Unicode-specific choices, U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR.
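
To make the proposal concrete, here is a small Python sketch of what treating
all of these as line ends might look like. It is purely hypothetical: the
name normalize_extended_line_ends is invented for this example, and the
choice to fold every one of them to LF is an assumption, not something any
specification is claimed to require.

```python
import re

# Hypothetical extended set: CR LF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR.
# CR LF is listed first so that the pair yields a single LF.
_EXTENDED_EOL = re.compile("\r\n|\r|\u0085|\u2028|\u2029")

def normalize_extended_line_ends(text: str) -> str:
    """Fold every line-end convention mentioned in this message to LF.

    Illustration of the opinion above, under the stated assumptions.
    """
    return _EXTENDED_EOL.sub("\n", text)

if __name__ == "__main__":
    sample = "a\r\nb\rc\x85d\u2028e\u2029f"
    print(repr(normalize_extended_line_ends(sample)))
```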

> In other words, is NEL like TAB: an archaism that doesn't fit well with XML
> (or SGML in the absence of short references)?

Not unless line-ends in general are to be considered archaisms.

John Cowan                                   cowan@ccil.org
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter