OASIS Mailing List Archives

   Re: [xml-dev] Processing XML 1.1 documents with XML Schema 1.0 processor


(I smell a troll.)

Henri Sivonen wrote:

> BTW, is there any actual research about the demand for non-ASCII 
> element names? XML 1.0 allows a large chunk of non-ASCII on element 
> names. Is any real-world XML vocabulary actually exercising the 
> freedom to go beyond ASCII in element and attribute names (except 
> perhaps some vocabulary that is only used in Japan)?

What the **** does that question mean? That element names only used in 
one country should not be supported in a standard designed to suit the 
whole world? It is a simple fact that ASCII transliterations of many 
languages, in particular those with tonal pronunciation, homophones and 
ideographic scripts, can frequently be incomprehensible. (Add to this 
that there are regional concepts (e.g. in addresses) for which there may 
be no English analog.) The most direct way of putting the question is 
"Why should W3C put out a standard that arbitrarily makes things easier 
for white people than for yellow people?" A space can easily be replaced 
by a "_": what should the ideograph for a mountain be replaced by: the 
sound, the meaning, a translation? How does a reader reconstruct the 

XML's name rules are important precisely because they don't adopt the 
bogus minimalist approach. I am not saying that anyone who wants 
ASCII-only markup is a greedy, lazy, selfish, unjust, uncaring, 
clock-back-turning, unpragmatic racist or Western supremacist; on the 
contrary, there are lots of reasons why an organization or individual 
*should* use ASCII for Western and international document types. But ISO 
standards like SGML must support International requirements, and W3C 
profiles like XML must support world-wide adoption.

A less inflammatory response is that the importance of names in markup 
is not that they are easy to write, but that they are meaningful to 
read. The better analogy to make isn't the inconvenience of making you 
write ASCII, but the inconvenience if you had to write using, say, Greek 
characters. You probably could do it, but it would add a layer of 
inconvenience that would probably make you avoid using the technology 
where you had a choice.

<boring_old_geezer_mode>I designed the original naming rules that XML 
1.0 adopted pretty much intact. In 1994-5 or so, I had been given a 
project by Allette Systems in Australia to figure out why adoption of 
SGML was slow in East Asian countries. During this time I visited 
several Asian countries (I had learned SGML while living in Japan 
working in publishing) and made contacts with many people in publishing. 

I made up something called the ERCS (Extended Reference Concrete 
Syntax), with input from many people: Gavin Nicol, Tony Graham and James 
Clark are three Western names who have posted to XML-DEV, for example. 
Included on this list were some features that XML later adopted (for 
example, that Unicode characters should be available regardless of the 
document format using hex references), in particular to support "native 
language markup". (Note: not "natural language markup".) ERCS was 
adopted by the SPREAD (Standardization Project Regarding East Asian 
Documents) of the CJK DOCP (China/Japan/Korea Document Processing 
Experts Group), which was a liaison group between industry, academia and 
standards bodies. That gave ERCS credibility, so it was already a 
pretty workable package by the time XML came along. (The SPREAD 
entities occasionally crop up but are obsoleted by XML: the W3C Charmod 
spec mentions them though, which is nice.)

More info, if anyone cares, is at

Here is what the ERCS document, now 10 years old, says about Native 
Language Markup:

"Much of the value of using SGML markup, especially for structure-based 
searches in hypertext, is that the tag names and other markup can have 
meaning to the user rather than being cryptic mnemonics. This is most 
true for SGML documents that contain fielded data. So the provision of 
native-language tagging is a key facility that SGML will need to supply 
to be successful.

"So the best concrete syntax for a given character set is one that does 
not artificially or gratuitously restrict what characters are available 
for use as markup. In the absence of other factors, if a character 
appears in words in the native language, it should be available for use 
in NAMEs. And similarly, if a symbol character is readily available from 
the keyboard, it should be available for use in short references."

In particular, it is important to recognize that NAMEs in XML/SGML are 
not just used for element and attribute names, but also for IDs. An ID 
is often taken from the value of content, which usually will be in some 
native language. If you have looked at C programs written by Chinese 
or Japanese programmers, for example, you will see that people like using 
their native language (and script) for writing the names of things.
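
To make the point concrete, here is a minimal sketch in Python using the 
standard library's xml.etree.ElementTree. The vocabulary is made up for 
illustration (the element names, the id value and the document itself 
are assumptions, not taken from any real schema). It shows native-script 
NAMEs used for elements and for an ID-like attribute value, alongside a 
hex character reference in content. Note that character references are 
only allowed in content and attribute values, not in the names 
themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names and the id value are written in
# Japanese script; the content of the inner element is given as a hex
# character reference (U+5C71, the ideograph for "mountain").
doc = '<住所 id="山田"><山>&#x5C71;</山></住所>'

root = ET.fromstring(doc)
child = root[0]

# The parser hands the names back unchanged, and the hex reference
# resolves to the same ideograph used in the element name.
print(root.tag, root.get("id"), child.tag, child.text)
```

An ASCII-only rule would force the author of such a document to pick a 
transliteration or a translation for each name, which is exactly the 
ambiguity described above.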

So the XML 1.0 naming rules are the end of a chain that began by looking 
at how to solve an actual problem with the acceptability of ASCII-only 
markup. A solution was worked out in consultation with people from East 
Asia and the ASCII West. It was adopted by standards bodies, and influenced 
SGML, HTML and most directly XML. XML's enormous popularity has often 
been attributed (notably by Tim Bray) in large part to its good 
bottom-line for internationalization.

(XML 1.1 removes the checks on particular characters, but in the 
direction of more openness, not more restrictiveness, which is good from 
this perspective at least.  But it is interesting to see someone saying 
the world is not ready for names above the first seven bits of Unicode, 
while XML 1.1 had discussion about whether the world was ready for names 
above the first 16 bits of Unicode :-)

Murata Makoto is speaking at XTech 2005 in a fortnight's time on the 
Japanese Government's adoption of XML. It will be interesting to see to 
what extent they use ASCII.

Rick Jelliffe



Copyright 2001 XML.org. This site is hosted by OASIS