OASIS Mailing List Archives



Re: XML Blueberry

At 5:16 PM -0400 6/21/01, John Cowan wrote:

>But in full generality that means each document has to bear the
>version number of Unicode that it assumes (for its names only, of
>course, not for its character content).  That means a series of
>updates to parsers are required, and control of the schedule is
>lost from W3C to Unicode.  Is this a Good Thing or a Bad Thing?

That implies that it's one or the other. Clearly it's both, and what's 
needed is a rational comparison of the real advantages and 
disadvantages of both approaches. How big are the benefits 
gained by this? How big are the disadvantages? Clearly both exist.

I don't think the potential benefits outweigh the disadvantages. 
Splitting XML into multiple incompatible implementations strikes me 
as a very bad thing. And make no mistake, this is just the first 
step, not the last. Unicode's got at least two more major iterations 
left in it that will force changes in XML parsers if we tie XML that 
closely to Unicode. It's not just blueberry but raspberry and 
blackberry too, and maybe other flavors!

What do we get in exchange for imposing major transition costs on 
developers around the world? Tags can now be written in a few extra 
scripts. And note that I say scripts, not languages.  Many of the 
languages listed in the reqs have well-established traditions in 
other scripts as well.  For instance, Mongolian can be written in 
Cyrillic. (Blame the Soviet Union for that, but it is a plausible 
workaround for tag names in Mongolian.)

The only scripts that seem to really call out for native markup that 
don't have it already are Khmer (a.k.a. Cambodian) and Ethiopic 
(used for Amharic, among other languages). I can at least imagine someone wanting to mark 
up some text in one of those scripts while not being able to do so in 
a Latin, Cyrillic, or Chinese-based script. However, I can't promise 
you that any such person actually exists. Before I agree to the 
forced obsolescence of all existing XML systems and software, I want 
to know that there is a real need for this now, not just that it 
would be a cool thing to have if somebody wants it someday.

As a demonstration, I'd want to see at an absolute minimum that it 
was possible to use a computer in such a language (e.g. Amharic, 
Tigre, Khmer) without also having some competence in a more prevalent 
script like Latin or Cyrillic. I'd also want it demonstrated that 
this was done via a different character encoding, and not merely by a 
font mapping to some ASCII superset. (This is how the limited 
Ethiopic software I've actually seen has all worked.)

If you can show that, then we can reasonably compare the costs and 
benefits of this proposal. But if you can't show that, then you're 
asking to impose very real costs on XML developers around the world 
for what may well prove to be an illusory benefit.

| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |