OASIS Mailing List Archives



Inputting eastern ideographs (was Re: XML Blueberry, etc.)

Thomas B. Passin inquired:

> So, you CJK-obscure-coding unicode experts out there, what's the betting
> how the characters will get into people's text-producing programs?  Will
> people be typing these new characters into documents with abandon?

You just picked at a real burning issue here. At the risk of continuing a
side-topic, I'll describe the tip of the iceberg. (Burning icebergs? sorry
about that ;-)

Entering Kanji used to be similar to typing in character codes using the Alt
key and the numeric keypad (the "ten key" pad). You had a (physical) book of
about three thousand characters, and you had to look up the number to type in.

It's a bit easier now, but it is _not_ just like typing English. You type
out the pronunciation, then you hit the convert key (conveniently, the space
bar on most PCs now), then you have to sort through a list of characters
that match the pronunciation. The input methods use some AI, so you can type
in a whole sentence and have a better than 40% chance of the right characters
being the defaults shown in the lists. Good input method software reduces the amount
of hunting, but you still have to visually confirm. (Professional
secretaries get a third party IME called ATOK, and they have that one so
well memorized that they only have to look at the screen occasionally.)
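
To make the convert step concrete, here is a minimal sketch in Python. The
table, the function names, and the ordering of the candidates are all made up
for illustration; a real IME uses far larger dictionaries and looks at the
surrounding context to rank its candidates.

```python
# Hypothetical toy dictionary: reading (typed as kana) -> candidate spellings,
# listed from most to least likely. Not taken from any real IME.
CANDIDATES = {
    "かみ": ["紙", "神", "髪", "上"],
    "はし": ["橋", "箸", "端"],
}

def convert(reading):
    """Return the candidates for a reading, with the plain kana as a last resort."""
    return CANDIDATES.get(reading, []) + [reading]

def choose(reading, index=0):
    """Pick one candidate; index 0 is the default the user sees first."""
    return convert(reading)[index]
```

Pressing the convert key corresponds to calling convert(); the hunting and
visual confirmation described above is the user picking the index.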

The kana are assigned keys on the keyboard, so you can choose between typing
the pronunciation using kana or using romanizations. For various reasons,
like numerals not being directly available when using the kana keyboard,
people who use keyboards in their work tend to use the romanizations. This
description, of course, does not do the process justice.
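
For the romanization route, the software first has to turn the typed letters
into kana. A rough sketch of that step, assuming a greedy longest-match
strategy and a deliberately tiny table (both are my own simplifications, not
how any particular product does it):

```python
# Hypothetical, deliberately tiny romanization table (not a complete scheme).
ROMAJI_TO_KANA = {
    "ka": "か", "mi": "み", "ha": "は", "shi": "し", "si": "し",
    "a": "あ", "i": "い",
}

def romaji_to_kana(text):
    """Greedy longest-match conversion; letters with no match pass through."""
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)  # chunk may be shorter than size near the end
                break
        else:
            # No table entry starts here: keep the letter as typed.
            out.append(text[i])
            i += 1
    return "".join(out)
```

The kana that come out of this step are what then gets handed to the
kana-to-kanji conversion.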

The internal tables for the pronunciation methods usually support only the
three thousand or so most common characters. You have to use an alternate
method for the rest. There are two alternate methods: (1) list all
characters in order on the screen for the user to hunt and peck
from, or (2) let the user write the character with the mouse and then use
character recognition software to put a short list on the screen to hunt and
peck from.

You can get used to it, but Japanese people tend to be significantly less
comfortable with keyboards than Americans.

I have not seen the Chinese methods, but I have heard that the most popular
methods there assign a series of fundamental component characters to keys on
the keyboard. You type in a list of the components of the character, and you
get a list of possible matches on the screen. (Sort of reminds one of
auto-completion in the URL blank on web browsers.)
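
Since I have only heard about these methods, the following Python sketch is
just my guess at the general idea: the user types components, and the software
lists every character whose component sequence starts with what has been
typed so far. The table and the function name are hypothetical, and the
decompositions are illustrative rather than those of any real input method.

```python
# Hypothetical component table: sequence of components -> character.
COMPONENT_TABLE = {
    ("木",): "木",
    ("木", "木"): "林",
    ("木", "木", "木"): "森",
    ("日", "月"): "明",
}

def matches(typed):
    """Return characters whose component sequence begins with the typed components."""
    typed = tuple(typed)
    return [
        char
        for components, char in COMPONENT_TABLE.items()
        if components[:len(typed)] == typed
    ]
```

Each additional component narrows the list, which is where the resemblance to
auto-completion comes from.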

There are about two to three hundred recognized component characters,
<english-academic>radicals</english-academic>, listed in the various
dictionaries, but they only use a subset of those on the keyboard. Most of
the radicals on the keyboard consist of just one or two strokes, but
<assumption>essentially all the characters</assumption> can be built from
this subset.

I have heard that you can actually buy software to input Japanese by the
radical method popular in China, and, conversely, to enter Chinese by a
pronunciation method similar to the method popular in Japan.

The block added to UNICODE 3.1 is not yet available without special
software. (And then it is not yet available by the UNICODE codes. New stuff,
you know.)
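
One thing that may contribute to the need for special software, assuming the
block in question is CJK Unified Ideographs Extension B (the block starting at
U+20000): these code points do not fit in a single 16-bit unit. A small Python
sketch of the check, with helper names of my own choosing:

```python
# Assumption: "the block" means CJK Unified Ideographs Extension B,
# whose block range is U+20000 through U+2A6DF.
EXT_B_START, EXT_B_END = 0x20000, 0x2A6DF

def in_extension_b(char):
    """True if the single character lies in the Extension B block."""
    return EXT_B_START <= ord(char) <= EXT_B_END

def utf16_units(char):
    """Number of 16-bit code units the character needs in UTF-16."""
    return len(char.encode("utf-16-be")) // 2
```

A character from this block takes two UTF-16 code units (a surrogate pair),
where the common characters take one.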

Most of the characters of the extension will (eventually) be pushed into the
group of characters accessed by the auxiliary methods. But some of the ones
that were mistakenly unified in UNICODE should actually be available by the
main pronunciation method.

I am guessing, because the method used in China is based on radicals from
the outset, that input for the rare characters will be a seamless extension
of the main method.

Adding all these characters to the system fonts is a separate issue.


Joel Rees
programmer -- rees@mediafusion.co.jp
To be a tree supporting all information,
  giving root to the chaos
    and branches to the trivia,
      information breathing anew --
        This is the aim of Yggdrasill.
============================XML as Best Solution===
Media Fusion Co. ,Ltd.  株式会社メディアフュージョン
Amagasaki  TEL 81-6-6415-2560    FAX 81-6-6415-2556
    Tokyo TEL 81-3-3516-2566   FAX 81-3-3516-2567