   Re: [xml-dev] Processing XML 1.1 documents with XML Schema 1.0 processor


Resending. Previous attempt bounced.

On May 14, 2005, at 16:13, Henri Sivonen wrote:

> their non-terminals as in SGML


On May 15, 2005, at 02:16, Rick Jelliffe wrote:

>> On May 14, 2005, at 14:15, Rick Jelliffe wrote:
>> To assess whether this rationale for XML 1.1 makes practical sense, it
>> would seem natural to observe whether people are actually using the
>> non-ASCII possibilities of XML 1.0. If research shows that the
>> non-ASCII possibilities provided by even XML 1.0 are not actually used
>> to a significant extent, why bother with breaking interoperability by
>> extending the non-ASCII features?
> Because XML set the agenda: it has led, not followed.

So is everyone expected to upgrade because of an agenda of principle?

> XML went Unicode, so Perl followed, for example.

Yes, but in my world view, the major feature that made XML the Trojan 
horse of Unicode (in a good way) was the Unicode content and the 
requirement to support UTF-8 - not as much the tags.

> XML 1.1 does not indeed solve enough issues
> that are relevant to most people to get adopted by grassroots demand: it
> will gradually creep through the infrastructure and be available for the
> people who need it (or, at least, other standards will be amended to cope
> with the 1.1 changes, and therefore be ready for XML 2.0 when it comes out
> in 2010.)
> I believe Murata-san found a very high percentage of Japanese XML
> documents on the web used non-ASCII in markup. (Was it 60%?)

Point about Japanese taken. (I'm still curious what markup languages 
are used and for what client software. 60% *on the Web* seems strangely 
high given that Web browsers and RSS readers use ASCII vocabularies.)

I still doubt that a Cambodian software developer given these choices
  1. XML 1.0, non-Khmer tags, Khmer content, supported by all the XML
     tools out there
  2. XML 1.1, Khmer tags, Khmer content, trouble with tools that don't
     support XML 1.1
would choose #2. And if (s)he would choose #1, what's the point of 
going through the trouble of enabling #2 now that XML 1.0 is already 
out there and can't be fixed retroactively?
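
As a rough illustration of choice #1, here is a minimal sketch using
Python's standard-library parser (which is expat-based). The element
names and sample text are made up for the example. It only checks that
ASCII tags with Khmer content, and Latin-1 letters such as umlauts in
element names, are accepted as XML 1.0; it makes no claim about how
any particular tool treats Khmer element names or XML 1.1 documents.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the stdlib (expat-based) parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Choice #1: ASCII element name, Khmer character data.
# "greeting" is a hypothetical vocabulary, not taken from any real schema.
ascii_tags = "<greeting>\u179f\u17bd\u179f\u17d2\u178f\u17b8</greeting>"

# Umlauts are already legal name characters in XML 1.0.
# The element name is the Finnish word for "day"; the vocabulary is hypothetical.
umlaut_tags = "<p\u00e4iv\u00e4>maanantai</p\u00e4iv\u00e4>"

print(parses(ascii_tags))
print(parses(umlaut_tags))
```

The same helper can be pointed at a document with Khmer element names
to see what a given toolchain does with it.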

Let me give a comparable example that doesn't have the i18n stigma:
From time to time someone posts to www-style saying how cool it would 
be if CSS had built-in support for defining named constants. Given the 
usual behavior of Web developers, I assume that if that feature was 
added now, developers would not use it, because the feature would not 
enable any new visual effects but would break the style sheets in 
existing browsers.

I could be wrong, of course, if Cambodian developers have a 
significantly different attitude than Finnish developers. Over here, 
the experience that umlauts are trouble runs so deep that when someone 
says that you can use umlauts in XML element names or Java variable 
names, people just shrug thinking it will break anyway and go on doing 
what is safe for sure and what makes your source shareable with 
foreigners. (And sure enough, if you look closely, it is possible to 
come up with a scenario where umlauts in XML element names cause 

>> Writing Finnish and programming punctuation (;{}[]<>/\=) at the same
>> time is inconvenient given the usual input methods. I'd imagine the
>> inconvenience with non-Latin writing to be even greater.
> One of the most popular keyboard input methods in China and Japan is to
> type a two-letter syllable, then select from the MRU list that pops up.
> This only uses alphabetics for data entry. It sounds like Finnish people
> doing markup are not well served by current keyboards or keyboard
> mappings.

Chinese and Japanese are special in the sense that their input methods 
often involve the U.S. qwerty layout plus lookups.

For alphabetic languages the situation is different. A very quick 
survey of OS X kb layouts reveals that Latin, Cyrillic, Greek, Hebrew, 
Arabic and Indic layouts (but not Thai it seems) make these characters 
available but in a less ergonomic way than the U.S. qwerty (not to 
mention Dvorak). So Finnish is not so different. (Sure, a lot of people 
in Finland use the Finnish layout for markup and programming, but the 
ergonomics are still bad.)

Henri Sivonen

