Re: Blueberry/Unicode/XML

Jonathan Borden scripsit:

> Aside for perhaps arbitrary (perhaps not :-) decisions about what characters
> ought or ought not be used to name things, what are these "good reasons"?
> I specifically include in "good reasons":
> 1) useful pieces of code that would break
> 2) hindrances to the development of useful pieces of code

The main point is that it wouldn't be plain text any more.  If XML is just a
binary format, something that no human being ever looks at, then
ASCII markup is plenty: you can tag everything x1, x2, x3, ....

But there are many Unicode characters that are very similar to others,
such as the halfwidth-fullwidth case that's been talked about already,
or the 127 (:-)) kinds of stars, or the various kinds of whitespace
that aren't, and so on.
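
To make the fullwidth and whitespace cases concrete, here is a minimal
Python sketch. It assumes only the standard unicodedata module; the
helper names (describe, is_xml_whitespace) and the choice of sample
characters are mine, for illustration only. XML_WHITESPACE reflects the
four characters of the S production in XML 1.0.

```python
import unicodedata

# The only characters XML 1.0 treats as whitespace (the S production).
XML_WHITESPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def is_xml_whitespace(ch):
    """True only for space, tab, CR and LF."""
    return ch in XML_WHITESPACE


ascii_a = "A"
fullwidth_a = "\uFF21"  # FULLWIDTH LATIN CAPITAL LETTER A

print(describe(ascii_a), "|", describe(fullwidth_a))
# The two are different code points, so a plain comparison fails...
print("equal as code points:", ascii_a == fullwidth_a)
# ...although compatibility normalization (NFKC) folds one into the other.
print("equal after NFKC:", unicodedata.normalize("NFKC", fullwidth_a) == ascii_a)

# Characters that look like whitespace but are not whitespace to XML.
for ch in ("\u00A0", "\u2003", "\u3000"):
    print(describe(ch), "| XML whitespace:", is_xml_whitespace(ch))
```

Two names that differ only by such a substitution are distinct to a
parser and, in many fonts, nearly identical to a reader.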

> I am not limiting the list to these two, but I would like to develop a
> practical way of deciding these very important issues. Clearly any way this
> is decided, tradeoffs are to be made, and I want to give strong weight to
> practical consequences -- just to be clear, I place a high value on the
> ability of humans to read XML, including its markup.

Limiting names to linguistic representations, and univocal ones,
makes it much less likely that they'll be mixed up with one another,
leading to confusion or even fraud.

> But honestly I am hardly a unicode expert, it's just that my perhaps naive
> impression is that given whatever nasty confusing problems that might occur
> using weird unicode characters in names, could as easily be replicated using
> nasty confusing -- yet well-formed -- names in XML as it stands today.
> Please educate me otherwise (i.e. this is just my impression).

The existing situation *can* be problematic: a capital alpha can be
substituted for a capital A, indistinguishably to a human being,
for instance.  That is annoying.  Allowing in the non-alphanumeric
characters can only make it worse.
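
The alpha/A case can be checked with a short Python sketch, again
assuming nothing beyond the standard unicodedata module (the helper
same_after is a hypothetical name of my own). It suggests that, unlike
the fullwidth forms, this pair is not unified by any of the four
standard normalization forms, so normalizing names would not remove
the confusion.

```python
import unicodedata

latin_a = "\u0041"      # LATIN CAPITAL LETTER A
greek_alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA


def same_after(form, a, b):
    """Compare two strings after applying the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


for ch in (latin_a, greek_alpha):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Visually alike in most fonts, but never equal as far as software is concerned.
print("equal as code points:", latin_a == greek_alpha)
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, "makes them equal:", same_after(form, latin_a, greek_alpha))
```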

John Cowan                                   cowan@ccil.org
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter