OASIS Mailing List Archives





   [offtopic] Re: [xml-dev] Microsoft FUD on binary XML...


Tim Bray wrote:

> On Nov 22, 2003, at 3:37 PM, Alaric B Snell wrote:
>> Good point, actually... I suppose that, in general, any language 
>> which uses more than 256 code points in general use is actually quite 
>> likely to be a language that uses one code point per word.
> No, actually.  I don't know much about Chinese, but the average number 
> of characters/word in Japanese is two point something; you have to 
> learn 1700 or so characters to get out of Japanese high school, and 
> literate people pick up quite a few more.  Korean Hangul are syllabics 
> and thus there are naturally several per word. 

Chinese words are often deemed to be made of two characters, for
example "Beijing". The very common four-character parallel epigram
(such as "crouching tiger hidden dragon") uses this.

On the other hand, one remarkable thing about Chinese is that lay people
often do not have a strong idea of "word" at all. Not one of my various
Chinese friends could even name, off hand, the Chinese word for "word".
DeFrancis' "The Chinese Language" says they go from characters to ideas
to sentences rather than letters to words to ideas to sentences, where a
character is halfway between our letters and a word. I guess it is like
in English: is "white space" or "whitespace" one word or two words?

Rick Jelliffe

