OASIS Mailing List Archives


Re: XML Blueberry (non-ASCII name characters in Japan)



At 3:17 PM +0900 7/9/01, Joel Rees wrote:
>I'm not a guru, but I agree with Mr. Murata's post. UNICODE still needs to
>evolve, and XML must also evolve.
>

I think this is an incorrect presumption and is corrupting the discussion. The presumption must be that XML should not change. It is incumbent on those who wish it to change to produce good and solid reasons why it needs to change. Absent compelling reasons, we should reject changes. Simply assuming that XML must evolve and that therefore we might as well make these minor fixes that will break compatibility is not acceptable. XML was specifically designed to be stable on the order of thousands of years. There are good reasons to want it to be. It is disturbing to see that goal so casually discarded before the debate even begins.

So far, despite the hundreds of emails on the subject, no one has produced a simple list of the things that we gain by making these changes. What words can be used that are not now used that people would actually need to use in markup? (Remember, all the words in question can be used in text content today with no changes to XML.) How important and common are these words? In which user communities? For instance, I'm not willing to break compatibility for Deseret or Tengwar. Of the scripts and languages in question, the only one that gives me pause is Ethiopic because that's the only one that has a large user community that is not yet adequately (though perhaps imperfectly) addressed. 

>Concerning the additions of UNICODE 3.1, maybe I can repeat an example?
>
>Suppose that attributes such as "mellifluous" or "mellyfluous", or tags such
>as <fluere> were arbitrarily rejected by your parser. (Yeah, I admit, I had
>to dig around in my thesaurus and on www.m-w.com for about five minutes to
>find these.) Suppose that, with UNICODE 3.0, "mellifluous" were accepted,
>but not "mellyfluous" or "fluere". And then, with UNICODE 3.1, "mellyfluous"
>and "fluere" are potentially acceptable (but not "mellifluus", and
>definitely not "mellyfluus"). This is similar to the situation with the CJKV
>characters.
>

I'd be willing to give up those words in markup to maintain compatibility. In fact, I don't even recognize two of them. (Is mellyfluous a variant spelling of mellifluous?) Of course, I don't have to, because English uses an alphabet. And Japanese speakers don't have to give up their equally obscure words, because they can use Katakana or Hiragana to write them. (Question for the Japanese experts: are there any words that cannot be written in Katakana or Hiragana? In normal communication, would an obscure word be more likely to be written in Han, Hiragana, or Katakana?) Chinese speakers can use Bopomofo, which is far less natural than Katakana or Hiragana is for Japanese, but is at least a plausible workaround for the occasional obscure word not already encoded.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+ 