xml-dev - Re: [xml-dev] Microsoft FUD on binary XML...

To: Alaric B Snell <alaric@alaric-snell.com>
Subject: Re: [xml-dev] Microsoft FUD on binary XML...
From: John Cowan <cowan@mercury.ccil.org>
Date: Sat, 22 Nov 2003 20:32:51 -0500
Cc: Elliotte Rusty Harold <elharo@metalab.unc.edu>,Tony Graham <Tony.Graham@Sun.COM>, xml-dev@lists.xml.org
In-reply-to: <3FBFF345.9040407@alaric-snell.com>
References: <004201c3af99$e9944010$650aa8c0@BOBDEV> <3FBDAFDA.3010905@allette.com.au> <3FBDF386.9090307@alaric-snell.com> <20031121.121152.50253888.Tony.Graham@Sun.COM> <3FBE14D8.7040405@alaric-snell.com> <p06010201bbe4835029d4@[192.168.254.4]> <3FBFF345.9040407@alaric-snell.com>
User-agent: Mutt/1.3.28i

Alaric B Snell scripsit:

> So languages like 
> Arabic, which are alphabet-based but not very compact in UTF-8 due to 
> being composed of high-numbered characters (although I'm not sure how 
> high so don't know if they would mainly be 2 or 3 bytes or whatever), 

The scripts that take 2 bytes per character in UTF-8 are Latin
(including IPA but excluding ASCII), Greek, Cyrillic, Armenian, Hebrew,
Arabic, Syriac, and Thaana.  N'Ko is not yet encoded but will probably
also fall into this range.  All of these scripts have a small number of
characters.

All other modern-use scripts are 3-byte, as are the archaic scripts
Ogham, Runic, and Tagalog (the Tagalog language is now written in the
Latin script).  A few other archaic scripts will probably be encoded in
this range.

All 4-byte scripts are archaic, except that some modern Chinese characters
appear in this range.  The modern-use scripts Blissymbols and Sutton
SignWriting are not yet encoded but will fall into this range,
because each requires a large number of characters.
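A small Python sketch can make these byte counts concrete. It computes the UTF-8 length of a code point from the standard UTF-8 thresholds (U+0080, U+0800, U+10000, per RFC 3629) and cross-checks each result against Python's built-in codec. The function name `utf8_len` and the sample letter chosen for each script are illustrative assumptions, not anything from the message.

```python
# Sketch: how many bytes UTF-8 spends on one code point.
# Thresholds follow the UTF-8 definition (RFC 3629).

def utf8_len(cp: int) -> int:
    """Return the number of bytes UTF-8 uses for code point cp."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x80:
        return 1      # ASCII
    if cp < 0x800:
        return 2      # U+0080..U+07FF: Latin-1 supplement through Thaana
    if cp < 0x10000:
        return 3      # rest of the Basic Multilingual Plane
    return 4          # supplementary planes

# One sample letter per script (illustrative choices).
SAMPLES = {
    "Latin": "\u00e9",
    "Greek": "\u03b1",
    "Cyrillic": "\u0416",
    "Armenian": "\u0531",
    "Hebrew": "\u05d0",
    "Arabic": "\u0628",
    "Syriac": "\u0710",
    "Thaana": "\u0780",
    "Devanagari": "\u0915",
    "Ogham": "\u1681",
    "Runic": "\u16a0",
    "Tagalog": "\u1700",
    "CJK (BMP)": "\u4e2d",
    "CJK Extension B": "\U00020000",
    "Gothic (archaic)": "\U00010330",
}

if __name__ == "__main__":
    for script, ch in SAMPLES.items():
        n = utf8_len(ord(ch))
        assert n == len(ch.encode("utf-8"))  # cross-check with Python's codec
        print(f"{script:18} U+{ord(ch):05X}  {n} bytes")
```

Everything from Latin-1 through Thaana sits below U+0800 and so costs 2 bytes, while Ogham, Runic, and Tagalog land in the 3-byte band with the other BMP scripts.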

> would be better served by an encoding that mainly uses a shiftable 
> window with single-byte characters, I guess.

That's what SCSU (the Standard Compression Scheme for Unicode) is all about.
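The shiftable-window idea can be sketched in a few lines of Python. This is a toy subset, not a complete SCSU codec. It assumes the single-byte-mode values from Unicode Technical Standard #6 as I recall them: tag SD0 = 0x18 defines dynamic window 0 and makes it active, and a window byte x in 0x01..0x67 places the window at offset x * 0x80. The function names are hypothetical. The sketch only handles text whose non-ASCII characters all fall in one 128-character window below U+3400, which is the case for a typical run of Arabic letters.

```python
# Toy subset of SCSU single-byte mode (assumed tag values from UTS #6).
# Not a complete encoder: no quoting, no predefined windows, no Unicode mode.

SD0 = 0x18                                   # "define window 0 and switch to it"
LITERAL_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}  # NUL, TAB, LF, CR pass through

def _is_literal(cp: int) -> bool:
    """Bytes that single-byte mode passes through as Basic Latin."""
    return 0x20 <= cp <= 0x7F or cp in LITERAL_CONTROLS

def scsu_encode_one_window(text: str) -> bytes:
    non_ascii = {ord(c) for c in text if ord(c) >= 0x80}
    out = bytearray()
    offset = None
    if non_ascii:
        windows = {cp // 0x80 for cp in non_ascii}
        if len(windows) != 1:
            raise ValueError("sketch handles only one 128-character window")
        x = windows.pop()
        if x > 0x67:
            raise ValueError("window outside the simple x * 0x80 range")
        offset = x * 0x80
        out += bytes([SD0, x])               # define window 0 at offset, make it active
    for c in text:
        cp = ord(c)
        if cp < 0x80:
            if not _is_literal(cp):
                raise ValueError("control character would need quoting")
            out.append(cp)                   # printable ASCII and NUL/TAB/LF/CR unchanged
        else:
            out.append(0x80 + cp - offset)   # one byte per character in the window
    return bytes(out)

def scsu_decode_one_window(data: bytes) -> str:
    """Decode only what scsu_encode_one_window produces."""
    offset = 0x80                            # SCSU's default position for window 0
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == SD0:
            offset = data[i + 1] * 0x80      # redefine window 0
            i += 2
            continue
        if b >= 0x80:
            chars.append(chr(offset + b - 0x80))
        elif _is_literal(b):
            chars.append(chr(b))
        else:
            raise ValueError(f"tag 0x{b:02X} not supported by this sketch")
        i += 1
    return "".join(chars)
```

For a ten-character Arabic greeting ("\u0633\u0644\u0627\u0645 \u0639\u0644\u064a\u0643\u0645"), the sketch emits 12 bytes: two to set the window at U+0600, then one byte per character. UTF-8 needs 19 bytes for the same text. A real SCSU encoder would also use the predefined windows, quoting tags, and a 16-bit Unicode mode for large scripts such as Han, which this toy omits.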

-- 
John Cowan                              jcowan@reutershealth.com
http://www.reutershealth.com            http://www.ccil.org/~cowan
Humpty Dump Dublin squeaks through his norse
                Humpty Dump Dublin hath a horrible vorse
But for all his kinks English / And his irismanx brogues
                Humpty Dump Dublin's grandada of all rogues.  --Cousin James

References:
- RE: [xml-dev] Microsoft FUD on binary XML...
  - From: "Bob Wyman" <bob@wyman.us>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Rick Jelliffe <ricko@allette.com.au>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Alaric B Snell <alaric@alaric-snell.com>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Tony Graham <Tony.Graham@Sun.COM>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Alaric B Snell <alaric@alaric-snell.com>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Alaric B Snell <alaric@alaric-snell.com>