Re: XML Blueberry (non-ASCII name characters in Japan)
- From: Joel Rees <email@example.com>
- To: Rick Jelliffe <firstname.lastname@example.org>
- Date: Wed, 11 Jul 2001 13:13:52 +0900
Thanks for the virtual links, Rick.
600 fundamental components and 16 composition functions? That's not going to
help in developing an extensible character encoding that computers can use.
(Or, rather, that humans can use on computers.)
Crud. (Pardon me.)
I had hoped that the reduction of pin-yin to the keyboard was a good
indication that the 230 or so radicals I am familiar with from Japanese
could be further reduced to good effect. I know the ideographs have been
built with ad hoc rules, and that systematizations from one era have been
overwritten by those of the next, but I keep hoping.
Actually, I have some vague ideas about formalizing a dual encoding -- the
simple scalar encoding (thus, a single code point) would be used to
reference pre-composed/pre-rendered characters, but each character would
also have a standard vector encoding, a string of position-code:radical
pairs. To send a non-standard character with a document, it would be defined
in a document header in three parts: the vector encoding and an assignment
to an arbitrary scalar code drawn from a private use area, together with a
graphic description of the non-standard character as it should be composed.
And then we start mucking around with parsing problems, and it occurs to me
that we need a fourth part for the definition, a set of attributes for the
non-standard character, to tell the parser how to parse it. By this time I
get to feeling giddy, like I'm walking on a high-wire, and I give up. Well,
sometimes I get far enough to think about using
position:orientation:scaling:component 4-tuples, and to think that
several compositing schemes should be supported. And then my job keeps
calling me back.
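For what it's worth, here is a rough sketch of what one such header
definition might look like as a data structure, using the simpler
position-code:radical pairs. Everything in it is an assumption made up for
illustration: the names Placement and CharacterDefinition, the position
codes "left" and "right", the attribute keys, and the choice to restrict
the scalar code to the BMP private use area (U+E000 through U+F8FF). It is
not a worked-out format, just the four parts written down.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Basic Multilingual Plane private use area (U+E000..U+F8FF).
PUA_START, PUA_END = 0xE000, 0xF8FF


@dataclass(frozen=True)
class Placement:
    """One position-code:radical pair of the vector encoding (hypothetical)."""
    position: str  # made-up position code, e.g. "left" or "right"
    radical: str   # the component placed at that position


@dataclass
class CharacterDefinition:
    """Header entry defining one non-standard character (hypothetical layout)."""
    scalar: int               # arbitrary code point drawn from the private use area
    vector: List[Placement]   # the vector encoding
    glyph: str                # graphic description, treated as opaque here
    attributes: Dict[str, str] = field(default_factory=dict)  # hints for the parser

    def __post_init__(self) -> None:
        # Reject scalar codes outside the private use area.
        if not PUA_START <= self.scalar <= PUA_END:
            raise ValueError(f"U+{self.scalar:04X} is not in the private use area")
        # A character with no components has no vector encoding at all.
        if not self.vector:
            raise ValueError("vector encoding must name at least one component")

    def vector_string(self) -> str:
        """Serialize the vector encoding as space-separated position:radical pairs."""
        return " ".join(f"{p.position}:{p.radical}" for p in self.vector)


definition = CharacterDefinition(
    scalar=0xE000,
    vector=[Placement("left", "木"), Placement("right", "木")],
    glyph="(glyph outline data would go here)",
    attributes={"class": "letter"},
)
print(f"U+{definition.scalar:04X} = {definition.vector_string()}")
```

The 4-tuple version would just add orientation and scaling fields to
Placement; the parsing headaches stay the same.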
Color me a confused idealist.
PS: I have some friends who insist that the Japanese had Kanji before it was
(re-?) introduced from China. They use some historical oddities to argue
that Kanji should be considered a separate and independent writing system
from the Han characters. Isn't it wonderful to live in a world with lots of
friendly holes to fall into?
----- Original Message -----
From: "Rick Jelliffe" <email@example.com>
Cc: "www-xml-blueberry-comments" <firstname.lastname@example.org>
Sent: Tuesday, July 10, 2001 9:17 PM
Subject: Re: XML Blueberry (non-ASCII name characters in Japan)
> In Unicode 3.1 there are added special function characters for allowing
> characters to be composed positionally from parts. These are intended for
> very rare or new characters only.
> There have been several thousand years of research into what the
> components of Han ideographs are. It is only now that we have computers
> and large databases of characters that it is feasible to try out different
> alternatives. At Academia Sinica, for example, my friend Prof. C.C. Hsieh
> devised a system with about 600 components and I think 16 composition
> functions (side-by-side) which can represent about 98% of the Hanyu
> Unicode went with a simpler set of functions, but at the expense that the
> functions allow some ambiguity: there may be more than one way to
> the same character. This may be fine for text, but not good for names
> normalization and comparison is their destiny.
> (I don't think these function characters are suitable for use in names,
> Rick Jelliffe