OASIS Mailing List Archives

 



Blueberry/Unicode/XML



Boy, this one's tough.  I buy neither Elliotte's assertion that
changing XML is unthinkable, nor John Cowan's assertion that the
depth of the cultural affront to users of pre-Unicode-3.1 
languages is so high as to outweigh consideration of cost.

I just went and reviewed the Blueberry requirements at
http://www.w3.org/TR/xml-blueberry-req and I'm not very comfy
with them.  There is repeated and specific reference to the
problem being that posed by Unicode 3.1.  The problem isn't
3.1, it's that Unicode is an unfinished standard that
continues to grow actively, whereas it would be nice if
we could declare XML syntax finished and go back to our
plows.

XML 1.0 took a design decision in favor of enumeration of 
name characters, simply because the alternative - outsourcing 
the problem to the Unicode/ISO 10646 process - had two 
problems:

(a) We didn't know them well enough to trust them, and
(b) writing a satisfying set of rules for XML name chars
    based solely on Unicode metadata is pretty hard.

The force of argument (b) is unabated.  (a) seems less of
a worry now simply because the Unicode and XML gangs have 
gotten pretty comfy with each other.  But I do have a worry
at the back of my mind whether the W3C *institutionally* 
ought to trust the Unicode Consortium *institutionally* with 
something of this magnitude.  And what happens if ISO and
Unicode stop getting along one of these centuries?  Whose
side is XML on?

A few weeks ago, I was in favor of leaving it the way it
is, but only by about 55-45.  I found the most convincing
argument on the other side was the person who postulated
a Khmer user typing away in emacs and having a disconnect
because there are lots of characters they can use for 
people's names but not as attribute names.  On the other
hand, this problem is not unique to Khmer - just ask 
Mr. O'Hara.
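
The O'Hara point can be checked with any XML parser. Below is a minimal
sketch using Python's standard xml.etree.ElementTree; the helper name
is_well_formed is an assumption made for illustration, not something
from the XML spec. The apostrophe is accepted as data inside an
attribute value but rejected inside a markup name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# The apostrophe is fine as character data in an attribute value...
print(is_well_formed('<person name="O\'Hara"/>'))

# ...but it is not a name character, so it cannot appear in an element name.
print(is_well_formed("<O'Hara/>"))
```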

And the notion of having a single monolithic XML whose
interoperability, while not perfect, is pretty $#!%* good,
partially based on those unwieldy character-class 
productions, is something that it will hurt to lose.  And
it is a reasonable position to say "The markup name character 
class snapshot was based on Unicode 2.0, sorry 'bout that."

Realistically, there are 3 options:

1. Leave it the way it is.
2. Do Blueberry and then repeat the process for Unicode 3.2
   and 4.0 and so on every couple of years forever.
3. Bite the bullet, write the rules in terms of Unicode
   metadata and go to a pure use-by-reference architecture,
   probably adding a syntactic signal to reference the
   Unicode version number.
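
For a feel of what option 3 involves, here is a rough sketch in Python
that defines name characters by Unicode general category instead of by
enumerated ranges. The category sets and the function names are
assumptions chosen for illustration; they are not the XML 1.0
productions and not a proposal from the Blueberry requirements. The
result depends on whichever Unicode database the runtime ships, which
is why the sketch also reports that version.

```python
import unicodedata

# Hypothetical rule set: letters and letter-like numbers may start a name.
NAME_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Hypothetical rule set: marks, digits and connector punctuation may follow.
NAME_EXTRA_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}
# ASCII punctuation that XML names have always allowed.
NAME_START_EXTRA_CHARS = {"_", ":"}
NAME_EXTRA_CHARS = {"-", "."}

# The version of the Unicode database these rules are evaluated against.
UNICODE_VERSION = unicodedata.unidata_version


def is_name_start_char(ch: str) -> bool:
    return (ch in NAME_START_EXTRA_CHARS
            or unicodedata.category(ch) in NAME_START_CATEGORIES)


def is_name_char(ch: str) -> bool:
    return (is_name_start_char(ch)
            or ch in NAME_EXTRA_CHARS
            or unicodedata.category(ch) in NAME_EXTRA_CATEGORIES)


def is_name(s: str) -> bool:
    """Check a whole candidate name against the category-based rules."""
    return (bool(s)
            and is_name_start_char(s[0])
            and all(is_name_char(c) for c in s[1:]))


print(UNICODE_VERSION)
print(is_name("attr"), is_name("\u1780"), is_name("O'Hara"))
```

Under these assumed rules a Khmer letter such as U+1780 is accepted
because the database classifies it as a letter, while the apostrophe in
O'Hara is still rejected. The answer for any given character is only as
stable as the metadata behind it.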

I think (3.) will prove to be really hard to do well - and 
then the Unicode metadata fields might get changed and screw
it all up.  I think (2.) is not unreasonable, but has the 
institutional disadvantage that the XML standardization effort 
has to become an ongoing process ad infinitum.  

I still go for (1.).  My opposition to NEL has hardened,
because of a strong fear that this one will cause real 
wreckage on a widespread basis, not just in linguistic
corner cases.

But I really can't see how anyone can get behind any of 
these positions and feel entirely comfortable with where
they find themselves standing.  I sure don't. -Tim