Re: XML Blueberry (non-ASCII name characters in Japan)
- From: Rick Jelliffe <firstname.lastname@example.org>
- To: email@example.com
- Date: Tue, 10 Jul 2001 19:16:00 +0800
From: "Thomas B. Passin" <firstname.lastname@example.org>
> So, you CJK-obscure-coding unicode experts out there, what's the betting
> how the characters will get into people's text-producing programs? Will
> people be typing these new characters into documents with abandon?
Same as now. If someone writes a Spanish n with a tilde in their DTD, and
your editor is an ASCII editor, it cannot edit it. If you are lucky it will
preserve it. If you are unlucky it will corrupt it.
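
A minimal sketch of those two outcomes, in Python. The name "año", the
assumption that the file is stored as UTF-8, and the helper names are all
illustrative; they do not describe any particular editor.

```python
# Hypothetical DTD name containing a Spanish n with tilde.
# Assumption: the file is stored as UTF-8.
NAME = "año"
RAW = NAME.encode("utf-8")


def ascii_editor_can_open(raw: bytes) -> bool:
    """A strict ASCII editor can only open bytes that are pure ASCII."""
    try:
        raw.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


def lucky_tool(raw: bytes) -> bytes:
    """Treats every byte as an opaque character, so the bytes round-trip."""
    return raw.decode("latin-1").encode("latin-1")


def unlucky_tool(raw: bytes) -> bytes:
    """Forces ASCII on load and save, replacing whatever does not fit."""
    text = raw.decode("ascii", errors="replace")
    return text.encode("ascii", errors="replace")


if __name__ == "__main__":
    print(ascii_editor_can_open(RAW))   # the editor cannot edit the name
    print(lucky_tool(RAW) == RAW)       # preserved
    print(unlucky_tool(RAW))            # corrupted
```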
There are no numeric character references in names. So a name is always
readable in a text editor which accepts the encoding; there are never any
references which need to be dereferenced.
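
As a rough illustration, Python's standard xml.etree.ElementTree parser
accepts the literal character in a name but rejects a numeric character
reference there, while the same reference is fine in character data. The
element names below are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Literal non-ASCII letter in an element name: allowed.
LITERAL_NAME = "<año/>"

# Numeric character reference inside a name: not allowed.
REFERENCE_IN_NAME = "<a&#241;o/>"

# The same reference in character data: allowed, and dereferenced.
REFERENCE_IN_TEXT = "<e>a&#241;o</e>"

if __name__ == "__main__":
    print(is_well_formed(LITERAL_NAME))
    print(is_well_formed(REFERENCE_IN_NAME))
    print(ET.fromstring(REFERENCE_IN_TEXT).text)
```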
Of course, if I wanted to make an obscure DTD, I could use Greek (if you
cannot read Greek) or some cartoonish mix of characters. But then it is
Restricting names to letters and other symbols that are typically used for
pronounceable, readable words in each language is not only good for catching
transcoding errors (important in some places) and to allow easier use of the
names as object names in scripts (where you don't want them to start with a
digit), but very importantly it acts against people making random (i.e.
private/proprietary) names in their DTDs as a way to capture users. They
can still do it, of course, but they cannot pretend "oh, we didn't know a
name should be readable so we just used UUIDs for all our names", batting