xml-dev - Re: [xml-dev] Microsoft FUD on binary XML...

To: Alaric B Snell <alaric@alaric-snell.com>
Subject: Re: [xml-dev] Microsoft FUD on binary XML...
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Fri, 21 Nov 2003 22:13:21 -0500
Cc: Tony Graham <Tony.Graham@Sun.COM>, xml-dev@lists.xml.org
In-reply-to: <3FBE14D8.7040405@alaric-snell.com>
References: <004201c3af99$e9944010$650aa8c0@BOBDEV><3FBDAFDA.3010905@allette.com.au> <3FBDF386.9090307@alaric-snell.com><20031121.121152.50253888.Tony.Graham@Sun.COM><3FBE14D8.7040405@alaric-snell.com>

At 1:36 PM +0000 11/21/03, Alaric B Snell wrote:

>People who use pound signs and accented characters, like us 
>Europeans, would see each such symbol taking 3 bytes, but they 
>currently take 2 bytes in UTF-8 and occur only occasionally 
>interspersed with US-ASCII characters anyway, so the hit would be 
>nowhere near as bad as the hit UTF-8 incurs for the Chinese and 
>their neighbours.
>

One should keep in mind that Chinese and similar languages are quite 
compressed to start with, far more so than English text is. For 
example, in UTF-8 the English word "tree" takes four bytes. The 
Japanese word for tree (木) takes three bytes. The English word "grove" 
takes five bytes. The Japanese word for grove (林) takes three bytes. The 
English word "forest" takes six bytes. The Japanese word for forest 
(森) still takes only three bytes. I don't know the Japanese word for 
antidisestablishmentarianism, but whatever it is, it's probably a lot 
smaller than the English one. Comparing alphabetic languages to 
ideographic ones is really apples to oranges. Word for word, Chinese 
documents tend to be smaller, even in UTF-8.
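The byte counts above are easy to check. The following is a minimal sketch in Python (standard library only) that encodes each English word and its single-kanji Japanese counterpart as UTF-8 and prints the lengths. The kanji glosses (木 tree, 林 grove, 森 forest) and the helper name `utf8_len` are illustrative assumptions added here, not something stated in the original message.

```python
# Compare UTF-8 byte lengths of English words with single-kanji Japanese
# equivalents. The word pairs are an illustrative assumption: 木 (tree),
# 林 (grove), 森 (forest), each one character in the range U+0800..U+FFFF,
# which UTF-8 encodes in 3 bytes. ASCII letters take 1 byte each.
pairs = [
    ("tree", "木"),
    ("grove", "林"),
    ("forest", "森"),
]


def utf8_len(s: str) -> int:
    """Return the number of bytes needed to encode s as UTF-8."""
    return len(s.encode("utf-8"))


for english, japanese in pairs:
    print(f"{english!r}: {utf8_len(english)} bytes, "
          f"{japanese!r}: {utf8_len(japanese)} bytes")
# 'tree': 4 bytes, '木': 3 bytes
# 'grove': 5 bytes, '林': 3 bytes
# 'forest': 6 bytes, '森': 3 bytes
```

The English word grows by one byte per letter, while each kanji stays at three bytes regardless of how long its English gloss is.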
-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA

Follow-Ups:
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Rick Jelliffe <ricko@allette.com.au>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Alaric B Snell <alaric@alaric-snell.com>

References:
- RE: [xml-dev] Microsoft FUD on binary XML...
  - From: "Bob Wyman" <bob@wyman.us>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Rick Jelliffe <ricko@allette.com.au>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Alaric B Snell <alaric@alaric-snell.com>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Tony Graham <Tony.Graham@Sun.COM>
- Re: [xml-dev] Microsoft FUD on binary XML...
  - From: Alaric B Snell <alaric@alaric-snell.com>
