- From: Tony Graham <tgraham@mulberrytech.com>
- To: xml-dev@ic.ac.uk
- Date: Fri, 12 Nov 1999 11:08:13 -0400 (EST)
At 11 Nov 1999 17:32 -0500, Clark C. Evans wrote:
> > o UTF-8 encoding only
>
> I'm kinda ignorant... would it still be
> possible to handle oriental character sets
> with UTF-8 ?
You can still represent the characters from your oriental character
sets using UTF-8, but it takes three bytes per character to do so
(instead of two bytes with UTF-16 and most legacy encodings).
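UTF-8 is a win for English text, since the ASCII characters are
represented with one byte. For most scripts, however, UTF-8 takes up
more bytes per character than UTF-16: every character from U+0800 to
U+FFFF takes three bytes in UTF-8 but only two in UTF-16, and only
the characters below U+0800 (ASCII plus scripts such as Greek,
Cyrillic, Hebrew, and Arabic) take two bytes or fewer. It is well
known that the three-bytes-per-character range includes the CJK
ideographs, but it also includes Hangul, the South and Southeast Asian
scripts, and others too numerous to mention here.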
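The byte counts above follow directly from the code point ranges each
encoding uses. Here is a minimal Python sketch of those rules; the
helper names utf8_len and utf16_len and the sample characters are my
own illustrative choices, and each rule is cross-checked against
Python's built-in codecs.

```python
# Byte cost of a single character in UTF-8 and UTF-16, derived from the
# code point ranges each encoding uses (hypothetical helper functions).

def utf8_len(cp: int) -> int:
    """Bytes needed to encode code point `cp` in UTF-8."""
    if cp < 0x80:
        return 1          # ASCII
    if cp < 0x800:
        return 2          # e.g. Greek, Cyrillic, Hebrew, Arabic
    if cp < 0x10000:
        return 3          # rest of the BMP: CJK, Hangul, Indic, Thai, ...
    return 4              # supplementary planes

def utf16_len(cp: int) -> int:
    """Bytes needed to encode code point `cp` in UTF-16 (no byte order mark)."""
    return 2 if cp < 0x10000 else 4   # one code unit vs. a surrogate pair

samples = {
    "Latin A": "A",
    "Greek alpha": "\u03b1",
    "Devanagari KA": "\u0915",
    "Thai KO KAI": "\u0e01",
    "CJK ideograph": "\u4e2d",
    "Hangul GA": "\uac00",
}

for name, ch in samples.items():
    cp = ord(ch)
    # Cross-check the range rules against Python's own encoders.
    assert utf8_len(cp) == len(ch.encode("utf-8"))
    assert utf16_len(cp) == len(ch.encode("utf-16-le"))
    print(f"U+{cp:04X} {name:15} UTF-8: {utf8_len(cp)}  UTF-16: {utf16_len(cp)}")
```

Only the Latin and Greek samples come out at two bytes or fewer in
UTF-8; the Devanagari, Thai, CJK, and Hangul samples all take three
bytes against UTF-16's two.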
Whether UTF-8 or UTF-16 is better depends both on what scripts you
mainly use and on what your tools support (since neither UTF-8 support
nor UTF-16 support is universal among general-purpose programming
languages or editors or...).
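The size half of that decision is easy to measure on a sample of your
own text. A rough Python sketch follows; encoded_sizes and
smaller_encoding are hypothetical helpers, the comparison ignores the
byte order mark a UTF-16 file would normally start with, and it says
nothing about the tool-support half of the question.

```python
# Compare the encoded size of a sample text under UTF-8 and UTF-16.
# Size only: whether your tools handle the encoding is a separate question.

def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
    }

def smaller_encoding(text: str) -> str:
    """Name the encoding that stores `text` in fewer bytes (UTF-8 on a tie)."""
    sizes = encoded_sizes(text)
    return "UTF-8" if sizes["UTF-8"] <= sizes["UTF-16"] else "UTF-16"

english = "Character sets"
japanese = "\u6587\u5b57\u30b3\u30fc\u30c9"   # five BMP characters above U+0800

for label, text in (("English", english), ("Japanese", japanese)):
    print(label, encoded_sizes(text), "->", smaller_encoding(text))
```

For these two samples the English string is smaller in UTF-8 (14
bytes against 28) and the Japanese one is smaller in UTF-16 (10 bytes
against 15), which is the script dependence described above.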
Regards,
Tony Graham
======================================================================
Tony Graham mailto:tgraham@mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9632
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message:
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message:
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)