Re: [xml-dev] [About Unicode] Why the symbol LOGICAL NOT is missing from

William J. Kammerer said:

>I can't comment on the usability of any alphabet other than Latin, but
>is it "fair" that Chinese ideograms chew up tens of thousands of code
>points in Unicode? All the while Latin only needs a few dozen even when
>you throw in the accents and umlauts?

Is it fair that we got first crack (the base) at the Unicode 
database? Is it fair that the rest of the world has to deal with all 
the ASCII characters that we English-speaking countries have imposed 
on the Internet? Is it fair that we got all the "best" (keyboard-wise) 
dot-com names? Is it fair that we were so shortsighted that we 
used a seven-bit code instead of an eight-bit one to represent characters 
-- and now, as a result, everyone must use nameprep and Punycode to 
translate all those tens of thousands of Chinese ideograms to ASCII 
for resolution?
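
To make that translation step concrete, here is a minimal sketch in Python. It assumes the standard library's built-in "idna" codec, which applies nameprep followed by Punycode (the IDNA 2003 rules); the label used is an arbitrary illustration and does not come from this thread.

```python
# Sketch: mapping a non-ASCII domain label to ASCII for DNS resolution.
# The label is a hypothetical example chosen only for illustration.
label = "中文"

# The built-in "idna" codec runs nameprep, then Punycode, and adds
# the "xn--" prefix that marks an encoded label.
ascii_label = label.encode("idna")

print(ascii_label)                 # ASCII-only bytes starting with b"xn--"
print(ascii_label.decode("idna"))  # decodes back to the original label
```

The resolver only ever sees the ASCII form; the burden of encoding and decoding falls on every application that wants to show the name in its own script.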

Keep in mind that there are 260 TLDs. There are 7260 languages, some 
of them having 2 or even 3 scripts. There are around 13000 dialects 
of some importance (a language needs 100,000 people speaking it to 
survive). E-colonization (dominance of an e-culture) should probably 
lead to the initial deprecation of some languages, but recent history 
shows a cultural resistance and resurgence after such a shock. So one 
can consider that the Internet will most probably help languages to 
survive and develop: a 50,000-speaker minimum might be a good rule of 
thumb (think of trade, community idioms).

So roughly one can consider that 50,000 languages with a possible 260 
variants (at the TLD level) are to be considered. Obviously most of them 
will try to use the same script as much as they can for the TLDs. But 
this cannot be considered as systematic all throughout a language. So 
one has to consider some 10 million possibilities, most of them synonyms 
or not implemented. I am talking only about the legacy: PADs may 
introduce 10 times this.
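
The rough count above can be checked with a few lines of Python. The figures of 50,000 and 260 and the factor of 10 are the ones given in this message; the variable names are only illustrative. The product comes to 13 million, the same order of magnitude as the "10 million" quoted.

```python
# Back-of-the-envelope count using the figures from the message.
languages = 50_000      # languages under the 50,000-speaker rule of thumb
tld_variants = 260      # one possible variant per TLD

legacy_possibilities = languages * tld_variants
print(legacy_possibilities)       # 13000000

# "PADs may introduce 10 times this."
with_pads = legacy_possibilities * 10
print(with_pads)                  # 130000000
```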

Now, considering all of this, when we (Latin users) first started the 
Internet, we thought that seven bits could do it all -- what fairness 
in the world was that?

No, I think the rest of the world got the fuzzy end of this lollipop.




Copyright 2001 XML.org. This site is hosted by OASIS