Eric van der Vlist wrote:
> On ven, 2005-05-13 at 11:52 +0100, Michael Kay wrote:
>>With all these things, I think one has to ask what is the approach that
>>causes the least amount of pain to the average user. Asking everyone to
>>change a namespace URI so that a few users can identify clearly whether or
>>not their patterns are intended to match Ethiopian letters isn't a net win
> Only those whose patterns are intended to match Ethiopian letters would
> have to change the namespace URIs, and that should reduce the number of
> such users by several orders of magnitude!
I beg to differ, Eric. When I use a string or a sequence of name
characters, I want it to be just a damn string, and the last thing I want
to think about is whether it will be usable in Ethiopian, Myanmar,
Khmer, or Mongolian. I don't want the users of my
specification/schema/tool to have to figure out for themselves (or to
ask me) whether they can use the Katakana middle dot in Japanese element
names or not. A string, a name character, or a whitespace character within
an electronic document MUST be recognized as such according to the
current state of the art. It MUST be able to be whatever the latest
version of Unicode says it is.
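Since the Katakana middle dot (U+30FB) keeps coming up: it falls outside the name-character tables that XML 1.0 froze against Unicode 2.0, which is exactly the kind of surprise being complained about here. Below is a minimal sketch of the alternative, a permissive range-based name-start check in the spirit of what later became the XML 1.0 Fifth Edition NameStartChar production; the function and constant names are mine, not any spec's or library's.

```python
# Sketch of a range-based NameStartChar test. Large, stable blocks of
# code points are accepted wholesale, so newly encoded scripts become
# usable in names without revising the spec.
NAME_START_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

def is_name_start(ch: str) -> bool:
    """True if ch may start an XML name under the range-based rule."""
    cp = ord(ch)
    return ch == ":" or any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)

# U+30FB sits inside the 0x3001-0xD7FF block, so the range-based rule
# accepts it; the tables frozen at Unicode 2.0 did not list it at all.
print(is_name_start("\u30FB"))  # Katakana middle dot -> True
print(is_name_start("<"))       # markup delimiter    -> False
```

The design point is that the ranges deliberately over-accept unassigned code points, trading a little strictness for stability as Unicode grows.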
Of all people, *we* should know that the encoding of text on a global
scale is not a static science: it evolves, and needs to evolve, as Unicode
improves. Yes, this implies a phase during which XML processors may lose
some interoperability, but whoever puts XML interoperability above human
language operability needs to have their priorities seriously revised.
Yes, this may break software that makes stupid assumptions about the
content of certain tokens, but such software was written on a
misunderstanding of text and deserves to break (and then to be shot in
the kneecaps, tied to a horse and dragged all around town, dipped in
boiling lead, dismembered piece by piece with a rusty spoon, and finally
dumped in a ditch to die in agony).
XML is about text, dammit, and text is meant to encode something very
much alive called languages. It will change and it will move, under the
combined effect of language evolution and of the progress made by the
Unicode Consortium in encoding more and more of it -- a task of
gargantuan proportions, comparable to the attempts at a universal
mathesis that everyone else had given up on.
Anyone expecting it to be different is still living in a legacy US-ASCII
world that just happens to have a larger set of characters.
How can XML be the universal data format without the ability to handle
universal text? Heck, it's SGML for the *WORLD WIDE* Web we're talking
about, not a falsely ubiquitous data interchange format for big American