xml-dev - Re: [xml-dev] Postel's law, exceptions

To: Rick Jelliffe <ricko@allette.com.au>
Subject: Re: [xml-dev] Postel's law, exceptions
From: jcowan@reutershealth.com
Date: Wed, 14 Jan 2004 10:34:22 -0500
Cc: xml-dev@lists.xml.org
In-reply-to: <40054EFD.7070507@allette.com.au>
References: <830178CE7378FC40BC6F1DDADCFDD1D10190E1B6@RED-MSG-31.redmond.corp.microsoft.com> <20040113213315.425bddb7.amyzing@talsever.com> <20040114132545.GJ17723@mercury.ccil.org> <40054EFD.7070507@allette.com.au>
User-agent: Mutt/1.4.1i

Rick Jelliffe scripsit:

> They have *almost* been abstracted away: a Java "character" is UTF-16.
> Some Unicode characters require more than one Java "character" to
> represent them.  All *implementations* of characters have one (or more)
> underlying encoding.  A nominal getEncoding() method on a Java 1.n
> character stream (even a TeeWriter) should always produce "UTF-16".

Well, if you like.  But *diversity* of encodings is lost.

> This should upset no-one, because some real characters may require
> more than one Unicode "character" to represent them, anyway.
> Take Vietnamese, please: if I have a u with a horn accent above plus
> a dot underneath [1], that is one real character (according to what
> people think of as characters) but three Unicode characters, 3 UTF-16
> characters, 6 bytes of storage.

Actually, you can also represent any Vietnamese letter with a single
Unicode (and UTF-16) character, U+1EF1 in this case.
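
The two representations under discussion can be compared directly. What
follows is a minimal Python sketch, assuming only the standard unicodedata
module; the helper names utf16_units and describe are illustrative and not
part of any library. It decomposes U+1EF1 into the three-character sequence
Rick describes and then recomposes it.

```python
import unicodedata

# Precomposed form: LATIN SMALL LETTER U WITH HORN AND DOT BELOW.
precomposed = "\u1ef1"

# Canonical decomposition (NFD) splits it into a base letter plus
# combining marks; NFC puts it back together.
decomposed = unicodedata.normalize("NFD", precomposed)
recomposed = unicodedata.normalize("NFC", decomposed)


def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16 (illustrative helper)."""
    return len(s.encode("utf-16-be")) // 2


def describe(s: str) -> list:
    """Code point and Unicode name for each character in s (illustrative helper)."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


for label, text in (("precomposed", precomposed), ("decomposed", decomposed)):
    print(label, describe(text), utf16_units(text), "UTF-16 code units")
```

Under these assumptions the decomposed form is three code points (six bytes
of UTF-16) and the precomposed form is one (two bytes), and both normalize
to the same string, which is why they should be treated as equivalent.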

The story with Vietnamese, for those who are curious, is that it has 12
vowel letters (a e i o u y a-breve a-circ e-circ o-circ o-horn u-horn),
each of which may bear one of five tone marks (acute, grave, hook above,
tilde, dot below).
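
The inventory above can be checked mechanically. The following is a rough
Python sketch, again assuming only the standard unicodedata module; the
names VOWELS, TONES and toned_letters are hypothetical, chosen for this
illustration. It attaches each of the five tone marks to each of the twelve
lowercase vowel letters as a combining character and normalizes to NFC.

```python
import unicodedata

# The twelve vowel letters (lowercase), written with escapes for clarity:
# a e i o u y a-breve a-circ e-circ o-circ o-horn u-horn
VOWELS = ["a", "e", "i", "o", "u", "y",
          "\u0103", "\u00e2", "\u00ea", "\u00f4", "\u01a1", "\u01b0"]

# The five tone marks as combining characters.
TONES = {
    "acute": "\u0301",
    "grave": "\u0300",
    "hook above": "\u0309",
    "tilde": "\u0303",
    "dot below": "\u0323",
}


def toned_letters() -> dict:
    """Map (vowel, tone name) to the NFC form of vowel + combining tone mark."""
    return {
        (vowel, tone): unicodedata.normalize("NFC", vowel + mark)
        for vowel in VOWELS
        for tone, mark in TONES.items()
    }


letters = toned_letters()
single = [form for form in letters.values() if len(form) == 1]
print(len(letters), "combinations,", len(single), "with a single code point")
```

If the claim holds, every one of the 60 combinations composes to a single
code point, for example u-horn plus dot below giving U+1EF1.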

-- 
It was impossible to inveigle           John Cowan <jcowan@reutershealth.com>
Georg Wilhelm Friedrich Hegel           http://www.ccil.org/~cowan
Into offering the slightest apology     http://www.reutershealth.com
For his Phenomenology.                      --W. H. Auden, from "People" (1953)
