OASIS Mailing List Archives

RE: XML Blueberry

At 11:21 AM -0400 6/21/01, Mike.Champion@SoftwareAG-USA.com wrote:

> This is an area that I know even less about, but I have been burned
> repeatedly by taking the same position as Elliotte Rusty Harold does.
> As I understand it, the principal reason that people care about the
> characters that are not in Unicode 2.0 is that they are widely used
> in proper names, and people (and companies) *care* if they are
> constrained against using their names in electronic communication.  I
> don't disagree that as a practical matter the people affected might
> well be content with the fact that this only affects their ability to
> define names (of elements and attributes), not actual text values.

When's the last time you saw a proper name used as a tag name or 
attribute name?  It's just not an important enough use-case to 
justify obsoleting all existing XML software and systems.

> Nevertheless, as with line endings, why not bite the bullet now and
> make XML Unicode 3.0-friendly and get on with life? It's one of those
> issues that will require more energy to argue about than to fix, I

Because it doesn't stop here. We can do Unicode 3.1 today. How about
Unicode 3.2 next year? And Unicode 4.0 a few years down the line? And
whatever comes after that? Are we going to keep revving XML to
support ever more obscure, less-used characters?

As to energy, remember that if this proposal goes through, every XML
parser vendor has to rewrite their software to support it. A lot of
other software has to be rewritten as well. I know JDOM would. Probably
some schema validators as well. Certainly the test suites will need
to be rewritten. And what about all the other specs that depend on
XML? How many of them need to be rewritten too?

> There's a larger issue here.  XML 1.0 is a W3C "Recommendation", not
> an international standard.  It was well-grounded in concrete SGML
> experience, and has proven remarkably useful in practice.  We can
> greatly respect the effort and knowledge that went into it without
> believing that its authors were omniscient.  I personally don't care
> one way or the other about any of the new "Blueberry" requirements,
> but let's be utterly pragmatic about whether "fixing" XML, or forcing
> various constituencies that were not considered by the authors of XML
> 1.0 to adapt to it, is better for everyone in the long run.

Internationalization was a major focus of XML 1.0. Unlike many
concurrent projects (Java, Oracle), the people who designed XML
actually understood Unicode. They gave XML the flexibility to
accommodate new characters as they were defined. But for very good
reasons, these potential new characters could be allowed only in text
content, not in element names. We are now at a point where it would be
technically possible to add some (not all) of these new characters to
XML names, but only at the cost of breaking compatibility. That's too
high a price to pay.

| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |