   Re: [xml-dev] Processing XML 1.1 documents with XML Schema 1.0 processor


i'll keep reading this thread, but here's a few basic points.

i think the basic problem is ascii. it's a blot on the programming 
landscape. anyone who has worked in some sort of markup for a long time 
(not xml, but markup nevertheless in my case) can see the damage that 
ascii does, and as a corollary, the damage done by the standard 'c' 
library that assumes 8 bit chars. it's simply time to scrap the 
standard c library's dependence on ascii, relegate it to the dustbin 
of history where it belongs and settle on unicode. if dec (or ibm) had 
used 16 bit characters instead of 8 bit we wouldn't even have ascii. 
after all, what does the 'a' stand for? hint: it doesn't stand for 
australian, austrian, angolan, or argentinian (etc).

and the 8 bit/16 bit storage thing doesn't work for me. i want data 
interchange and internationalisation.

on a very practical note. observing chinese people working for one of my 
clients (in china) i note the following - they can read and write faster 
in english, but are so uncomfortable they prefer to work in chinese. 
anecdotally it seems that english and ascii are in general efficient 
(i've experienced this more than a few times) but it's also 'foreign' 
and uncomfortable.

personally i think it's time to dump ascii, and get real and base 
everything - especially the standard c library - around unicode.



Henri Sivonen wrote:

> Resending. Previous attempt bounced.
> On May 14, 2005, at 16:13, Henri Sivonen wrote:
>> their non-terminals as in SGML
> s/non-//
> On May 15, 2005, at 02:16, Rick Jelliffe wrote:
>>> On May 14, 2005, at 14:15, Rick Jelliffe wrote:
>>> To assess whether this rationale for XML 1.1 makes practical sense, it
>>> would seem natural to observe whether people are actually using the
>>> non-ASCII possibilities of XML 1.0. If research shows that the
>>> non-ASCII possibilities provided by even XML 1.0 are not actually used
>>> to a significant extent, why bother with breaking interoperability by
>>> extending the non-ASCII features?
>> Because XML set the agenda: it has led, not followed.
> So is everyone expected to upgrade because of an agenda of principle?
>> XML went Unicode, so Perl followed, for example.
> Yes, but in my world view, the major feature that made XML the Trojan 
> horse of Unicode (in a good way) was the Unicode content and the 
> requirement to support UTF-8 - not as much the tags.
>> XML 1.1 does not indeed solve enough issues
>> that are relevant to most people to get adopted by grassroots demand: it
>> will gradually creep through the infrastructure and be available for the
>> people who need it (or, at least, other standards will be amended to
>> cope with the 1.1 changes, and therefore be ready for XML 2.0 when it
>> comes out in 2010.)
>> I believe Murata-san found a very high percentage of Japanese XML
>> documents on the web used non-ASCII in markup. (Was it 60%?)
> Point about Japanese taken. (I'm still curious what markup languages 
> are used and for what client software. 60% *on the Web* seems 
> strangely high given that Web browsers and RSS readers use ASCII 
> vocabularies.)
> I still doubt that a Cambodian software developer given these choices
>  1. XML 1.0, non-Khmer tags, Khmer content, supported by all the XML
>     tools out there
>  2. XML 1.1, Khmer tags, Khmer content, trouble with tools that don't
>     support XML 1.1
> would choose #2. And if (s)he would choose #1, what's the point of 
> going through the trouble of enabling #2 now that XML 1.0 is already 
> out there and can't be fixed retroactively?
> Let me give a comparable example that doesn't have the i18n stigma:
> From time to time someone posts to www-style saying how cool it would 
> be if CSS had built-in support for defining named constants. Given the 
> usual behavior of Web developers, I assume that if that feature was 
> added now, developers would not use it, because the feature would not 
> enable any new visual effects but would break the style sheets in 
> existing browsers.
> I could be wrong, of course, if Cambodian developers have a 
> significantly different attitude than Finnish developers. Over here, 
> the experience that umlauts are trouble runs so deep that when someone 
> says that you can use umlauts in XML element names or Java variable 
> names, people just shrug thinking it will break anyway and go on doing 
> what is safe for sure and what makes your source shareable with 
> foreigners. (And sure enough, if you look closely, it is possible to 
> come up with a scenario where umlauts in XML element names cause 
> breakage.)
>>> Writing Finnish and programming punctuation (;{}[]<>/\=) at the same
>>> time is inconvenient given the usual input methods. I'd imagine the
>>> inconvenience with non-Latin writing to be even greater.
>> One of the most popular keyboard input methods in China and Japan is to
>> type a two-letter syllable, then select from the MRU list that pops up.
>> This only uses alphabetics for data entry. It sounds like Finnish people
>> doing markup are not well served by current keyboards or keyboard
>> mappings.
> Chinese and Japanese are special in the sense that their input methods 
> often involve the U.S. qwerty layout plus lookups.
> For alphabetic languages the situation is different. A very quick 
> survey of OS X kb layouts reveals that Latin, Cyrillic, Greek, Hebrew, 
> Arabic and Indic layouts (but not Thai it seems) make these characters 
> available but in a less ergonomic way than the U.S. qwerty (not to 
> mention Dvorak). So Finnish is not so different. (Sure, a lot of 
> people in Finland use the Finnish layout for markup and programming, 
> but the ergonomics are still bad.)
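
The quoted message says it is possible to come up with a scenario where 
umlauts in XML element names cause breakage, but does not say which one. 
One plausible candidate, offered here only as an assumption, is Unicode 
normalization: XML compares names code point by code point, so a 
precomposed "ä" (U+00E4) and "a" followed by a combining diaeresis 
(U+0308) are different names even though they look identical. A minimal 
Python sketch, with the element name "päivä" chosen purely for 
illustration:

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical element name, spelled two ways that render identically.
composed = "p\u00e4iv\u00e4"                          # NFC: 5 code points
decomposed = unicodedata.normalize("NFD", composed)   # NFD: 7 code points

# A producer that happens to emit decomposed text creates the element...
root = ET.Element("root")
child = ET.SubElement(root, decomposed)
child.text = "2005-05-15"

# ...and a consumer that typed the name in composed form cannot find it,
# because element names are matched without any normalization.
missing = root.find(composed)

# Looking up the exact same code point sequence works.
found = root.find(decomposed)

print(composed == decomposed)      # the two spellings are not equal
print(missing is None)             # lookup by the composed name fails
print(found is not None)           # lookup by the decomposed name succeeds
```

An all-ASCII element name has only one spelling, so this particular 
mismatch cannot arise, which fits the "do what is safe for sure" 
attitude described in the quoted message.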

Rick Marshall

