xml-dev - Re: [xml-dev] UTF-8+names

To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] UTF-8+names
From: Mike Champion <mc@xegesis.org>
Date: Sat, 18 Oct 2003 20:46:35 -0700 (PDT)
In-reply-to: <r02000200-1028-80E2176401D311D8B3120003937A08C2@[192.168.124.11]>

--- "Simon St.Laurent" <simonstl@simonstl.com> wrote:

> 
> I don't know about this.  I have to say that it feels
> fundamentally
> corrupt to me, effectively cheating on the
> separation between character
> encodings and the representation of characters in
> those encodings. 

No more than any other non-trivial encoding, which
gets into deep philosophical territory about whether
"ö" is the same as "oe" or whether a Chinese
character that looks like and has the same root
meaning as a Kanji character is the "same" character
or not.  Or whether the Unicode NEL character is a
"standard" newline character because it is mainly used
in those mainframes we like to pretend don't exist :-)
Tim's approach is taking a real, widespread problem
and offering a clean, layered solution -- essentially
a character encoding preprocessor -- rather than
changing XML itself.
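
A minimal sketch of what such a "character encoding preprocessor"
could look like, offered as an illustration and not as Tim's actual
specification. It assumes names are written as &name; and are drawn
from the HTML 4 entity table (Python's html.entities.name2codepoint),
and that the five XML built-in names are left alone for the XML
parser. The function name decode_utf8_plus_names is hypothetical.

```python
import re
from html.entities import name2codepoint

# Assumption: a name reference looks like "&name;" with an alphanumeric name.
_NAME_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Assumption: the XML built-ins stay untouched so the parser above
# this layer still sees them as ordinary entity references.
_XML_BUILTINS = {"amp", "lt", "gt", "quot", "apos"}


def decode_utf8_plus_names(data: bytes) -> str:
    """Decode UTF-8 bytes, then expand known character names.

    Hypothetical decoder: the name table and the pass-through rules
    are illustrative choices, not taken from any published spec.
    """
    text = data.decode("utf-8")

    def expand(match: re.Match) -> str:
        name = match.group(1)
        if name in _XML_BUILTINS or name not in name2codepoint:
            return match.group(0)  # leave unknown or built-in names as-is
        return chr(name2codepoint[name])

    return _NAME_REF.sub(expand, text)


if __name__ == "__main__":
    print(decode_utf8_plus_names(b"G&ouml;del and &uuml;ber"))
```

Because the expansion happens at the decoding step, everything above
it (an XML parser, a text editor, a mail client) only ever sees
ordinary Unicode characters.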

Actually, a similar idea came up at the Binary Infoset
workshop, to leverage/exploit the fact that the XML
spec allows an open-ended set of encodings. This
allows experimentation WITHOUT "corrupting" the core
spec with support for local languages, stuff of
interest mainly to mainframes, or more efficiently
transmittable and/or lexable serializations.  If these
experiments succeed, they can propagate in software
without changes to the XML spec; if they fail, they
can do so without burdening standards-compliant
processors with their lost baggage.  There is some
problem with interoperability, but no more than there
is with other encodings that aren't widely supported
outside some language community, and much less than
with the common problem with the "smart quotes"!  

> 
> That is therefore an enormous processing model
> change. 

I don't see it that way.  It's leveraging the encoding
mechanism to do what XML munges together with all the
other things that DTDs do.  I'm looking forward to using it
outside of XML, e.g. to enter accented/umlaut
characters without having to remember the
platform-specific keyboard hacks or turn on
keyboard modes that have all sorts of annoying side
effects.  How many times have I typed "uber" or
"Godel" because it's too hard to remember how to enter
"über" or "Gödel" [this time I copied/pasted]!

> I can't say I expected XML to drill into the
> Unicode layer and
> modify the very notion of a character encoding.  

IMHO, it extends the Unicode encoding layer upwards to
remove a wart in XML, not vice-versa.

Anyway, I think this is a great idea, and I
congratulate Tim for working it out and moving it
forward.
