   Re: [xml-dev] Processing XML 1.1 documents with XML Schema 1.0 processor


Eric van der Vlist wrote:
> On ven, 2005-05-13 at 11:52 +0100, Michael Kay wrote:
>>With all these things, I think one has to ask what is the approach that
>>causes the least amount of pain to the average user. Asking everyone to
>>change a namespace URI so that a few users can identify clearly whether or
>>not their patterns are intended to match Ethiopian letters isn't a net win
> Only those whose patterns are intended to match Ethiopian letters would
> have to change the namespace URIs, and that should reduce the number of
> such users by several orders of magnitude!

I beg to differ, Eric. When I use a string or a sequence of name 
characters I want it to be just a damn string, and the last thing I want 
to think about is whether it will be usable in Ethiopian, Myanmar, 
Khmer, or Mongolian. I don't want the users of my 
specification/schema/tool to have to figure out for themselves (or to 
ask me) whether they can use the Katakana middle dot in Japanese element 
names or not. A string, a name character, a white space character within 
an electronic document MUST be recognized as such according to the 
current state of the art. It MUST be able to be whatever the latest 
version of Unicode says it is.
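To make the Katakana-middle-dot question concrete, here is a rough sketch of why the answer depends on which version of the name rules a processor implements. The ranges below are a hand-picked, simplified excerpt of XML 1.1's NameChar production, chosen only for illustration -- they are not the complete grammar from the spec:

```python
# Simplified, illustrative excerpt of XML 1.1's NameChar ranges -- NOT
# the complete production, just enough code points to make the point.
XML11_NAMECHAR_EXCERPT = [
    (0x0041, 0x005A),  # A-Z
    (0x0061, 0x007A),  # a-z
    (0x3001, 0xD7FF),  # broad CJK range; includes U+30FB KATAKANA MIDDLE DOT
]

def allowed(ch, ranges):
    """True if ch falls inside one of the given code point ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

# Under XML 1.1's generous ranges the middle dot is a name character;
# a rule set frozen to an older character repertoire might reject it,
# which is exactly the guessing game users should not have to play.
print(allowed("\u30FB", XML11_NAMECHAR_EXCERPT))  # True
print(allowed(" ", XML11_NAMECHAR_EXCERPT))       # False
```

The point of the sketch is that "is this a name character?" has no fixed answer: it is a function of which table the processor ships with, and that table must be allowed to track Unicode.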

Of all people *we* should know that the encoding of text on a global 
scale is not a static science; it evolves and needs to evolve as Unicode 
improves. Yes, this implies a phase during which XML processors may lose 
some interoperability, but whoever puts XML interoperability above human 
language operability needs to have their priorities seriously revised. 
Yes, this may break software that makes stupid assumptions about the 
content of certain tokens, but such software was written based on a 
misunderstanding of text and deserves to break (and then to be shot in 
the kneecaps, tied to a horse and dragged all around town, dipped in 
boiling lead, dismembered piece by piece with a rusty spoon, and finally 
dumped in a ditch to die slowly).

XML is about text, dammit, and text is meant to encode something very 
much alive called language. It will change and it will move, under the 
combined effect of language evolution and of the progress made by the 
Unicode Consortium in encoding more and more of it -- a task of 
gargantuan proportions, comparable to the attempts at mathesis that 
everyone had given up on.

Anyone expecting it to be different is still living in a legacy US-ASCII 
world that just happens to have a larger set of characters.

How can XML be the universal data format without the ability to handle 
universal text? Heck, it's SGML for the *WORLD WIDE* Web we're talking 
about, not a falsely ubiquitous data interchange format for big American 

Robin Berjon
   Research Scientist
   Expway, http://expway.com/

