RE: XML Blueberry
- From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- To: Mike.Champion@SoftwareAG-USA.com, xml-dev@lists.xml.org
- Date: Thu, 21 Jun 2001 12:47:18 -0400
At 11:21 AM -0400 6/21/01, Mike.Champion@SoftwareAG-USA.com wrote:
> This is an area that I know even less about, but I have been burned
> repeatedly by taking the same position as Elliotte Rusty Harold does.
> As I understand it, the principal reason that people care about the
> characters that are not in Unicode 2.0 is that they are widely used
> in proper names, and people (and companies) *care* if they are
> constrained against using their names in electronic communication. I
> don't disagree that as a practical matter the people affected might
> well be content with the fact that this only affects their ability to
> define names (of elements and attributes) not actual text values.
When's the last time you saw a proper name used as a tag name or
attribute name? It's just not an important enough use-case to
justify obsoleting all existing XML software and systems.
> Nevertheless, as with line endings, why not bite the bullet now and
> make XML Unicode 3.0-friendly and get on with life? It's one of those
> issues that will require more energy to argue about than to fix, I
> suspect.
Because it doesn't stop here. We can do Unicode 3.1 today. How about
Unicode 3.2 next year? And Unicode 4.0 a few years down the line? And
whatever comes after that? Are we going to keep revving XML to
support ever-more obscure, less-used characters?
As to energy, remember that if this proposal goes through every XML
parser vendor has to rewrite their software to support it. A lot of
other software has to be rewritten as well. I know JDOM would. Probably
some schema validators as well. Certainly the test suites will need
to be rewritten. And what about all the other specs that depend on
XML here? How many of them need to be rewritten too?
> There's a larger issue here. XML 1.0 is a W3C "Recommendation", not
> an international standard. It was well-grounded in concrete SGML
> experience, and has proven remarkably useful in practice. We can
> greatly respect the effort and knowledge that went into it without
> believing that its authors were omniscient. I personally don't care
> one way or the other about any of the new "Blueberry" requirements,
> but let's be utterly pragmatic about whether "fixing" XML, or forcing
> various constituencies that were not considered by the authors of XML
> 1.0 to adapt to it, is better for everyone in the long run.
Internationalization was a major focus of XML 1.0. Unlike many
concurrent projects (Java, Oracle) the people who designed XML
actually understood Unicode. They were able to allow XML the
flexibility to accommodate new characters as they were defined. But
for very good reasons, these potential new characters could not be
allowed in element names as opposed to text content. We are now at a
point where it would be technically possible to add some (not all) of
these new characters to XML names, but only at the cost of breaking
compatibility. That's too high a price to pay.
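To make the content-versus-names distinction concrete, here is a minimal Python sketch of the XML 1.0 Char production, which governs what may appear in text content. It checks only code point ranges, so a character such as U+20000 (CJK Unified Ideographs Extension B, added in Unicode 3.1) already qualifies with no change to the spec. Names are different: they are governed by the character classes in Appendix B of XML 1.0, which enumerate Unicode 2.0 characters, and that fixed table is not reproduced here. The helper names below are illustrative, not taken from any parser's API.

```python
# Sketch of the XML 1.0 "Char" production (production [2]):
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Membership depends only on code point ranges, not on which Unicode
# version assigned the character. is_xml10_char and is_legal_text are
# hypothetical helper names chosen for this illustration.

XML10_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return any(lo <= cp <= hi for lo, hi in XML10_CHAR_RANGES)


def is_legal_text(s: str) -> bool:
    """Return True if every character of s may appear in XML 1.0 text content."""
    return all(is_xml10_char(ord(c)) for c in s)


if __name__ == "__main__":
    # U+20000 was added in Unicode 3.1, yet it is already a legal Char.
    print(is_legal_text("\U00020000"))  # True
    # U+0001 is a C0 control character outside the Char ranges.
    print(is_legal_text("\x01"))        # False
```

An equivalent check for element and attribute names would need the Appendix B tables, which is exactly why newly assigned characters cannot simply flow into names without revising the spec.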
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible (IDG Books, 1999) |
| http://metalab.unc.edu/xml/books/bible/ |
| http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://metalab.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/ |
+----------------------------------+---------------------------------+