I've seen a number of "only UTF" comments, and I think they're rather Western-centric, so I'm inclined to say "no" there (if someone whose native language *isn't* West European proposes it, I might rethink).
Rick Jelliffe brings one of the most complete and coherent Eastern/Western perspectives I've ever encountered, and his proposal says:
"A Nuke document is UTF-8 in its external form. Inside a program, after parsing, it would typically use UTF16."
Yes, we all know about the politics and inertia that have affected the uptake of Unicode in some geographies, but the "UTF-8 or UTF-16" restriction is there for a very strong pragmatic reason: dealing with a fairly open-ended world of character sets, as in XML 1.0, is one of the biggest factors that complicate and slow down parsers, even if you get someone else (e.g. ICU) to do the relatively hard bits.
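To make the contrast concrete, here is a minimal sketch (my own illustration, not from any proposal) of what encoding detection looks like when only UTF-8 and UTF-16 are permitted: a few leading bytes settle everything. An XML 1.0 parser, by comparison, must sniff the byte order *and* then parse an encoding declaration that may name any registered charset, and hand the result to a conversion library.

```python
def detect_utf(data: bytes) -> str:
    """Pick the decoder for a UTF-only document.

    With only UTF-8 and UTF-16 allowed, inspecting the first bytes
    for a BOM is sufficient; no encoding declaration is needed.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"   # UTF-8 with BOM; codec strips the BOM
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"      # codec uses the BOM to pick endianness
    return "utf-8"           # no BOM: default to UTF-8

# Usage: a UTF-16LE document with its BOM decodes cleanly.
doc = b"\xff\xfe" + "<a/>".encode("utf-16-le")
text = doc.decode(detect_utf(doc))
```

The whole decision fits in three branches precisely because the set of legal encodings is closed; that is the performance and simplicity argument in a nutshell.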
If we want to have a strong diversity of well-performing and conforming tools, which I suspect is an important component of success for most of us considering XML-NG, I think "UTF-*-only" is the simple reality. For me, UTF-8 or UTF-16 is certainly an improvement over JSON's UTF-8 only.
I'm curious how that JSON limitation is affecting text-processing conventions in non-Western countries as "Web 2.0" becomes pervasive.