On May 13, 2005, at 22:40, Robin Berjon wrote:
> Yes this may break software that is making stupid assumptions about
> the content of certain tokens, but such software was written based on
> a misunderstanding of text and deserves to break (and then to be shot
> in the kneecaps, tied to a horse and dragged all around town, dipped
> in boiling lead, dismembered piece by piece with a rusty spoon, and
> finally dumped in a ditch to agonize).
> How can XML be the universal data format without the ability to handle
> universal text?
I can't use spaces in element names. My mother tongue uses spaces. I am
being oppressed!
Being able to carry content in any language and being able to use
anything in element names are two totally different things. The first
one is crucial. The latter is not. In fact, the world keeps turning
with XHTML, DocBook, SVG, OOo XML, Atom etc. using English-based ASCII
element names. The point is that the content can be in any language. I
think i18n political correctness goes overboard when interoperability
is sacrificed in order to change the characters allowed in
programmer-visible identifiers.
My mother tongue is not ASCII-safe. It also isn't invariant under
canonical decomposition. When I design an XML vocabulary, I use
English-based ASCII element and attribute names. I don't want to ever
spend a single minute debugging an app, because someone was being
politically correct and used umlauts in element names and then the app
expected the decomposed form but the document had them in the
precomposed form (or vice versa).
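
To make the precomposed/decomposed mismatch concrete, here is a minimal
Python sketch. The element name "määrä" and the tiny document are made up
for illustration; the sketch assumes an app that looks elements up by a
plain string comparison, as most XML APIs do.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical element name; it looks identical in both forms on screen.
precomposed = "m\u00e4\u00e4r\u00e4"                    # each ä is U+00E4
decomposed = unicodedata.normalize("NFD", precomposed)  # each ä is a + U+0308

# The document was written with the decomposed form of the name.
doc = ET.fromstring(f"<root><{decomposed}>42</{decomposed}></root>")

# The app expects the precomposed form, so a plain lookup finds nothing.
missed = doc.find(precomposed)
found = doc.find(decomposed)

def nfc(name: str) -> str:
    """Return the NFC (precomposed) form of a name."""
    return unicodedata.normalize("NFC", name)

# Normalizing both sides before comparing is one way to make them match.
match = [el for el in doc if nfc(el.tag) == nfc(precomposed)]
```

With ASCII-only names the two forms are the same code points, so the
normalization step is never needed.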
BTW, is there any actual research about the demand for non-ASCII
element names? XML 1.0 allows a large chunk of non-ASCII in element
names. Is any real-world XML vocabulary actually exercising the freedom
to go beyond ASCII in element and attribute names (except perhaps some
vocabulary that is only used in Japan)?
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/