OASIS Mailing List Archives

 




Re: XML referencing (was Re: Blueberry is not "closed")



What a difference a version makes -- CR and LF (NL) are listed in the
Unicode 3.0 book.

More information on line endings is in the Unicode 3.0 book: section 13.1,
"C0 Control Codes" (p. 314), and section 13.2, "Layout Controls" (p. 316).

But the best and most up-to-date information on line endings is on the web:

http://www.unicode.org/unicode/reports/tr13/tr13-8.html

(As I noted in a follow-up post that didn't get threaded in the archives.
Reading mostly online is faster, but my posts don't thread the same way
when I don't use the return button.)

What it says there (rough paraphrase) is that line endings in text
(including NEL) should generally be converted to U+2028 (LINE SEPARATOR) on
read, and converted to whatever the target system expects (including NEL) on
write.
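To make that read/write round trip concrete, here is a minimal Python sketch. It rests on my own assumptions, not on TR13's exact wording: it treats only CRLF, CR, LF and NEL (U+0085) as line endings, and it maps all of them to LINE SEPARATOR, ignoring the LS vs. PS (U+2029) distinction the report also discusses. The function names are hypothetical.

```python
import re

LS = "\u2028"   # LINE SEPARATOR
NEL = "\u0085"  # NEXT LINE

# Assumed set of line endings. CRLF is listed first so that a CR LF pair
# is matched as one line ending rather than two.
_LINE_ENDING = re.compile("\r\n|\r|\n|" + NEL)

def normalize_on_read(text: str) -> str:
    """Hypothetical helper: replace every recognized line ending with U+2028."""
    return _LINE_ENDING.sub(LS, text)

def convert_on_write(text: str, system_ending: str = "\n") -> str:
    """Hypothetical helper: replace U+2028 with the target system's line ending."""
    return text.replace(LS, system_ending)

if __name__ == "__main__":
    internal = normalize_on_read("one\r\ntwo\x85three\n")
    print(repr(internal))                           # 'one\u2028two\u2028three\u2028'
    print(repr(convert_on_write(internal, "\r\n")))  # 'one\r\ntwo\r\nthree\r\n'
```

Because the sketch works with a single internal separator, the output side needs only one replacement no matter which ending (LF, CRLF, or NEL) the target system uses.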

Oh, and yes, since you mention it, line boundary control characters are not
strictly about line endings. (I saw the em space in the list, but it didn't
register the first time. :-()
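For comparison, the Unicode general category of each character mentioned in this thread can be checked with Python's standard unicodedata module. This is only a sketch: the module reflects whatever Unicode version ships with the running Python, not Unicode 3.0. Still, it shows that CR, LF and NEL are plain controls (Cc), that LS and PS have their own separator categories, and that the em space is an ordinary space separator (Zs).

```python
import unicodedata

# Characters discussed in this thread.
chars = ["\r", "\n", "\u0085", "\u2028", "\u2029", "\u2003"]

for ch in chars:
    # Control characters have no character name in unicodedata,
    # so supply a placeholder instead of letting name() raise ValueError.
    name = unicodedata.name(ch, "<control>")
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {name}")

# Expected output:
# U+000D  Cc  <control>
# U+000A  Cc  <control>
# U+0085  Cc  <control>
# U+2028  Zl  LINE SEPARATOR
# U+2029  Zp  PARAGRAPH SEPARATOR
# U+2003  Zs  EM SPACE
```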

Richard Tobin contemplated:


> >Okay, from Unicode 3.0, p. 48 (sec. 3.9, "Special Character Properties"), I
> >see that NEL is _not_ in the UNICODE list of line boundary control
> >characters. This seems odd.
>
> I don't think "line boundary control character" means what you think it
> does.  In fact, I'm having trouble finding out what it means at all.
> A search of the Unicode site only lists it in an erratum.
>
> It appears (I'm looking at the Unicode 2.0 book) to mean "space
> character".  It's supposed to be described in chapter 6, but there
> the same characters are listed as space characters.
>
> In any case, it doesn't include NL or CR either.
>
> -- Richard
>