

Re: XML Blueberry (non-ASCII name characters in Japan)

At 5:06 PM -0400 7/9/01, Simon St.Laurent wrote:
>On 09 Jul 2001 16:07:24 -0400, Elliotte Rusty Harold wrote:
>> At 2:52 PM -0400 7/9/01, John Cowan wrote:
>> >Please tell me what kind of argument you would find convincing.
>> >
>> Prove to me the existence of 10,000 or more users who want to write XML
>> *markup* in any combination of the scripts added in Unicode 3.0 and
>> 3.1, who cannot reasonably use an alternative script for their 
>> language of choice, and who do not read and write some better 
>> supported language. 
>Are those the criteria for Unicode?  

Of course not, and those shouldn't be the criteria for inclusion in Unicode, because Unicode has totally different needs than XML markup. Unicode is about *TEXT*. We're talking about *MARKUP*. These points keep getting confused. To lose any of these scripts from text would be a huge disadvantage. It would clearly disenfranchise far more than 10,000 users apiece. It would be a radical impoverishment of human culture. But nobody's arguing for that.

One more time: every single one of the characters in question can be used in XML documents today. Want to publish a newspaper in Amharic using XML? No problem. Want to write poetry in Burmese? Not an issue. Want to take down oral history in Khmer? Go for it. 
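A minimal sketch of the distinction being drawn here (the Amharic word and element names below are illustrative assumptions, not examples from this thread): XML 1.0 allows these characters freely in character data, so content in any of these scripts parses today; it is only their use in markup *names* that Blueberry would change.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: Ethiopic (added in Unicode 3.0) in #PCDATA vs. in a Name.
# The Amharic word below ("selam", a greeting) is an assumed example.
import xml.etree.ElementTree as ET

# Character DATA: fine under XML 1.0 -- the Char production covers
# all of #x20-#xD7FF, assigned or not.
doc = '<poem xml:lang="am">ሰላም</poem>'
elem = ET.fromstring(doc)
print(elem.text)  # the Amharic text round-trips intact

# Element NAME: the original XML 1.0 NameChar classes (drawn from
# Unicode 2.0) do not include Ethiopic, so <ሰላም/> was illegal as
# markup -- that, not character data, is what Blueberry is about.
# (Later parsers, built on XML 1.0 5th edition or XML 1.1, accept it.)
```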

If XML 1.0 prohibited these characters from #PCDATA, then there'd be a much stronger argument for breaking compatibility. Indeed I probably would have suggested it myself quite a while ago, but it wasn't necessary then and it isn't necessary now. I remember when I first realized what the XML 1.0 BNF grammar did with the unassigned characters, and why it did that. Frankly I was shocked. I never would have thought of doing that. Fortunately the team that put together XML 1.0 did a much better job than I could have done, and a much better job than you're giving them credit for now. XML 1.0 is fully adequate for any form of text in any of these languages, as well as a number of languages yet to come. That's truly an amazing achievement. 

(FYI I have been advocating breaking backwards compatibility in Java for several months now over this issue because unlike XML, Java cannot use these characters in plain text.) 

>I don't think so, and I'm not
>really sure what cloud you picked 10,000 out of.  On the basis of 10,000
>people using markup, we can probably disenfranchise significant
>communities around the globe who use characters already recognized by
>XML 1.0.

The criterion isn't really 10,000. The criterion is enough users to justify the cost of transition. So far I've yet to see the existence of one such user demonstrated, much less 10,000. 

Again, if there were no additional cost to adding these characters, we wouldn't be having this conversation. But there is a cost, a real one, and one that's going to affect many people. For some people it will be a minor inconvenience. A few book authors like you and me may actually make money off this change. But a lot of non-experts are going to get hammered with unexpected incompatibilities they can't easily diagnose. 

>I'm deeply unconvinced that it's the job of the XML community to decide
>issues which seem far more likely to be understood by the Unicode

It's not the W3C's job to decide what characters should be allowed in text. It's not the Unicode Consortium's job to decide what characters should be allowed in XML names. 


| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      | 
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |