> > It's good to remember that one reason they're a problem is that they
> > have become a storehouse for vendor-proprietary characters, with
> > as many different meanings as most C0 ones (U+0000..U+001F).
> > Blessing one vendor's solution may magnify the problems.
> Again, not really. The *bytes* 80 through 9F encode many different
> characters, including #x80 through #x9F. There is really only one
> set of uses in practice for the characters #x80 through #x9F.
Not if you go by what most systems do with those codes; what I've
seen in practice is that those codes will map to U+0080..U+009F.
Do you have some specification in mind for that "one set of uses"?
Some ISO-8859-1 spec addendum would be interesting, since
that's where those were defined (prior to importing to Unicode).
The rule of thumb is that one adds a high-order zero byte to
the ISO-8859-1 code points ("bytes") and gets Unicode.
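
To make the bytes-versus-characters distinction concrete, here is a minimal
sketch in Python. The helper names are made up for illustration, and
Windows-1252 (Python's "cp1252" codec) is only an assumed example of a vendor
code page; the thread does not name one.

```python
# Illustrative sketch only; helper names are hypothetical, not from any spec.

def latin1_to_code_points(data: bytes) -> list:
    """Decode as ISO-8859-1: each byte value becomes the same code point."""
    return [ord(ch) for ch in data.decode("latin-1")]


def widen_with_zero_byte(data: bytes) -> bytes:
    """Prefix each byte with 0x00, producing big-endian 16-bit units."""
    return b"".join(bytes([0, b]) for b in data)


# Three bytes from the disputed 0x80..0x9F range.
sample = bytes([0x80, 0x93, 0x9F])

# Read as ISO-8859-1, the bytes land on U+0080..U+009F unchanged.
print([f"U+{cp:04X}" for cp in latin1_to_code_points(sample)])

# The "high-order zero byte" rule gives the UTF-16BE form of that same text.
print(widen_with_zero_byte(sample) == sample.decode("latin-1").encode("utf-16-be"))

# Read as a vendor code page (assumed here: Windows-1252), the same bytes
# decode to entirely different characters.
print([f"U+{ord(ch):04X}" for ch in sample.decode("cp1252")])
```

The third print is where the disagreement lives: the bytes are identical, but
the characters depend on which encoding the receiver assumes.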