xml-dev - Re: [xml-dev] UTF-8+names

To: Alessandro Triglia <sandro@mclink.it>
Subject: Re: [xml-dev] UTF-8+names
From: Tim Bray <tbray@textuality.com>
Date: Sun, 19 Oct 2003 12:08:15 -0700
Cc: 'John Cowan' <cowan@mercury.ccil.org>,'Mike Champion' <mc@xegesis.org>, xml-dev@lists.xml.org
In-reply-to: <000c01c3966d$7fbb9f70$42a7c044@aldebaran>
References: <000c01c3966d$7fbb9f70$42a7c044@aldebaran>
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.5) Gecko/20031007

Alessandro Triglia wrote:

> Indeed, if one uses UTF-8+names just as an encoding of Unicode (with no
> re-interpretation trick), no human user will ever see those  &nbsp;  things.
> All that humans will see is some displayable form of the  NON-BREAK SPACE
> character, which happened to be encoded as  0x26 0x6E 0x62 0x73 0x70 0x3B
> rather than as  0xNN1 0xNN2 (the two bit patterns being equivalent).  
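
For concreteness, the two byte patterns being compared above can be printed with a few lines of Python. This is only a sketch: the name form is simply the ASCII bytes of the entity reference, and the two-byte form is the standard UTF-8 encoding of U+00A0; the variable names are made up for illustration.

```python
# Bytes of the literal name form "&nbsp;" (plain ASCII).
name_form = "&nbsp;".encode("ascii")

# Bytes of U+00A0 NO-BREAK SPACE in ordinary UTF-8.
utf8_form = "\u00a0".encode("utf-8")

print(name_form.hex(" ").upper())  # 26 6E 62 73 70 3B
print(utf8_form.hex(" ").upper())  # C2 A0
```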

I had to read this a couple of times, but now I get it.  With most
encodings of Unicode I know of, if you're editing a text file, any
characters that can be displayed are displayed as themselves, not as the
underlying UTF-8 bit patterns or whatever.  Characters that *can't* be
displayed show up as diamonds or squiggles or boxes.  +names is
different in that sometimes a human might want to work with the encoding
rather than the actual Unicode characters, purely because &Conint; might
look better in your file than the surface integral (U+222F) that your
screen can't display.  On the other hand, since basically every screen
in the world can now display ü, you'd rather see that than &uuml;.

Bottom line: in some applications this would be convenient.  In others, not.
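
To make the trade-off concrete, here is a rough sketch of what a +names codec might look like. Nothing in this thread pins down the actual rules, so everything here is an assumption: the function names `decode_utf8_names` and `encode_utf8_names` are hypothetical, the table holds only three of the ISO entity names, and unknown names are passed through untouched.

```python
import re

# Tiny illustrative subset of the ISO entity names (a real table is far larger).
NAMES = {
    "nbsp": "\u00a0",    # NO-BREAK SPACE
    "uuml": "\u00fc",    # LATIN SMALL LETTER U WITH DIAERESIS
    "Conint": "\u222f",  # SURFACE INTEGRAL
}
_CHAR_TO_NAME = {ch: name for name, ch in NAMES.items()}
_NAME_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_utf8_names(data):
    """Hypothetical decoder: plain UTF-8 first, then known &name; to characters."""
    text = data.decode("utf-8")
    # Assumption: names missing from the table are left exactly as written.
    return _NAME_REF.sub(lambda m: NAMES.get(m.group(1), m.group(0)), text)


def encode_utf8_names(text, as_names=frozenset()):
    """Hypothetical encoder: write only the characters in `as_names` as &name;."""
    out = []
    for ch in text:
        if ch in as_names and ch in _CHAR_TO_NAME:
            out.append("&" + _CHAR_TO_NAME[ch] + ";")
        else:
            out.append(ch)
    # Note: a literal "&" in the input is not escaped in this sketch.
    return "".join(out).encode("utf-8")


# Keep the u-umlaut as itself, but write the surface integral by name.
raw = encode_utf8_names("f\u00fcr \u222f", as_names={"\u222f"})
print(raw)  # b'f\xc3\xbcr &Conint;'
print(decode_utf8_names(raw) == "f\u00fcr \u222f")  # True
```

The `as_names` parameter is where the per-application choice would live: an editor for math-heavy text might name the operators its fonts lack, while leaving everything else as ordinary UTF-8.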

> I am not actually proposing to add this macro functionality to Unicode, but
> I am saying that there are two places where the initial problem can be
> addressed:  either at the XML level or at the Unicode level (which involves
> the displayable form).  Not at the encoding level.

Bear in mind that the initial problem was the ongoing clamor from 
communities of people who really want to use the ISO entity sets but 
don't want to use DTDs.  So far, the standards community has failed to 
come up with an option that is attractive to them.  +names is just a 
trial balloon.  My intuition disagrees with yours: the encoding level 
feels like an appropriate place to address this problem. -Tim
