Re: XML Blueberry
- From: "K. Ari Krupnikov" <ari@cogsci.ed.ac.uk>
- To: xml-dev@lists.xml.org
- Date: Fri, 22 Jun 2001 15:30:18 +0100
Tim Bray wrote:
>
> So I think it would be appropriate, in this discussion,
> to have some people in the mainframe trenches give us
> a briefing on the scale and the difficulty of the problems
> they face, and for some of our i18n gurus to highlight
> the problems faced by an XML language designer who wants
> to use one of the newly-added languages.
CR, LF and NEL are not the only space characters in Unicode.
I can't say I'm an i18n expert, and it's been a while since I've touched
a mainframe terminal, but when I did, the software I wrote spoke Hebrew
to its users.
Now Hebrew is written right to left. Of course, Latin characters or
digits may appear in Hebrew documents, and those are written left to right.
An elaborate algorithm, the so-called bidi (bidirectional) algorithm,
determines whether a particular character should go to the right or to
the left of the preceding one. But there are cases where the algorithm
cannot infer the order the writer intended (punctuation or spaces sitting
between a Hebrew run and a Latin run, for example), and so Unicode
introduced special invisible, zero-width characters to settle them: the
right-to-left mark (U+200F) and the left-to-right mark (U+200E).
Why not add these two to the S production for the sake of Hebrew and
Arabic users? There's no end to what can be regarded as whitespace.
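To make the point concrete, here is a rough Python sketch of the XML 1.0 S production as a character-set test, next to a hypothetical extended set that adds NEL plus the two bidi marks. EXTENDED_S and is_s are made-up names for illustration; the extended set is not anything taken from the spec or from the Blueberry discussion.

```python
import unicodedata

# XML 1.0 production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
XML10_S = {"\u0020", "\u0009", "\u000D", "\u000A"}

# Hypothetical extension, for illustration only: NEL (U+0085) plus the
# bidi marks LRM (U+200E) and RLM (U+200F).
EXTENDED_S = XML10_S | {"\u0085", "\u200E", "\u200F"}


def is_s(text: str, allowed=XML10_S) -> bool:
    """True if text matches S: one or more characters drawn from `allowed`."""
    return len(text) > 0 and all(ch in allowed for ch in text)


if __name__ == "__main__":
    for ch in ["\u0020", "\u0085", "\u200E", "\u200F"]:
        # U+0085 is a control character with no Unicode name, hence the default.
        name = unicodedata.name(ch, "<unnamed>")
        print(
            f"U+{ord(ch):04X} {name:<20}"
            f" XML 1.0 S: {is_s(ch)!s:<5}"
            f" extended: {is_s(ch, EXTENDED_S)!s:<5}"
            f" isspace(): {ch.isspace()!s:<5}"
            f" bidi class: {unicodedata.bidirectional(ch)}"
        )
```

Running it gives three different answers to "is this whitespace?": XML 1.0's S, the extended set, and Python's own str.isspace(), which accepts NEL but rejects both bidi marks (they are format characters with bidi classes L and R).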
Ari.