OASIS Mailing List Archives



Re: Closing Blueberry

At 9:10 AM +0900 7/20/01, Murata Makoto wrote:

>As for the Japanese language, I believe that I have demonstrated
>reasons: changes of unification and made-in-Japan Kanji require
>non-BMP name characters.  If Unicode becomes popular and we 
>continue to use XML 1.0, disallowed CJK ideographics will become 

I'm starting to realize there may be a deeper issue here. Languages evolve. It's the nature of things. There are dozens of new words every year: some describe new technologies like fax machines and e-mail, some are adapted from other languages (glasnost in English, le weekend in French), and others just arise. ("Doh" just made it into the Oxford English Dictionary.) I'd be surprised if Japanese and Chinese were any different in this respect.

In alphabetic languages like English and Russian, new words are no big deal. They fit right into Unicode and XML with no hassle. But what happens in ideographic languages? I know Japanese uses Katakana for some of these words. Is it all of them? How many new ideographs come into use each year? And in Chinese? I suspect it's even worse, but I would appreciate hearing from the Chinese speakers on the list.

For the sake of argument, say we had perfect knowledge and could fix XML and Unicode so that they covered every ideograph used today in Chinese and Japanese. What do we do next year? And the year after that? And the one after that? And every year for the next ten thousand years?

Unicode's answer is that someone fills out the right forms, proves that the characters are being used, and then they're added. There's plenty of space in Unicode to handle several hundred new characters a year for the next ten thousand years. As I think Simon originally suggested, we could just tie XML to Unicode and leave it at that.
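
As a rough check on the capacity estimate, here is a minimal sketch. It assumes the 17-plane code space (U+0000 through U+10FFFF, or 1,114,112 code points) and, by default, ignores code points that are already assigned or reserved. The function name `years_of_headroom` and the sample rates are illustrative assumptions, not figures from the Unicode Consortium.

```python
# Size of the Unicode code space: 17 planes of 65,536 code points each.
CODE_SPACE = 17 * 0x10000  # 1,114,112 code points


def years_of_headroom(chars_per_year: int, already_used: int = 0) -> int:
    """Whole years until the code space fills at a constant rate.

    `already_used` is an optional count of code points that are not
    available (assigned, reserved, surrogates, and so on). It defaults
    to 0, which makes the result an upper bound.
    """
    if chars_per_year <= 0:
        raise ValueError("chars_per_year must be positive")
    return (CODE_SPACE - already_used) // chars_per_year


if __name__ == "__main__":
    # Hypothetical growth rates, in new characters per year.
    for rate in (100, 300, 500):
        print(f"{rate} chars/year: about {years_of_headroom(rate)} years")
```

Under these assumptions, about a hundred new characters a year fits within ten thousand years (roughly 11,000 years of headroom), while three hundred a year would fill the 17 planes in roughly 3,700 years. How far "several hundred" stretches therefore depends on the actual rate and on the size of the code space assumed.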

However, any fixed solution along the lines of XML 1.0 is guaranteed to fail, especially if the criterion for success is that every character anyone wants to use must be available for use in XML names. We and our descendants will be revisiting these arguments every five years for the next few millennia.

| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      | 
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |