
   UTF-8 or ? for SML (was: Re: Feeler for SML (Simple Markup Language))

  • From: Tony Graham <tgraham@mulberrytech.com>
  • To: <xml-dev@ic.ac.uk>
  • Date: Sat, 13 Nov 1999 14:11:25 -0400 (EST)

At 13 Nov 1999 15:46 -0000, Richard Anderson wrote:
 > But UTF-8 can support "foreign" characters so I don't see the argument for
 > having UTF-16 too.  Also, generally speaking UTF-8 encoding results in
 > smaller output for most cases.

Different people have different ideas of what constitutes "foreign".

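For the majority of the characters in the Unicode Standard, UTF-8 uses
three bytes per character.  However, for the US-ASCII characters, it
uses only one byte per character, and for the characters from U+0080
to U+07FF (accented Latin letters, Greek, Cyrillic, Hebrew, and Arabic,
among others) it uses two.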

For characters in the Basic Multilingual Plane, UTF-16 uses two bytes
per character.  Any character outside that plane is represented by a
surrogate pair, which takes four bytes.

Whether a given file is fewer bytes as UTF-8 or UTF-16 is largely a
function of the proportion of US-ASCII characters (unaccented Latin
letters, digits, and markup punctuation) in the file.
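
As a rough illustration of that trade-off, here is a minimal Python
sketch that counts the bytes each encoding needs for a few short
strings.  The sample words and the helper name encoded_sizes are
assumptions made for this example; they are not from the thread.  The
"utf-16-be" codec is used so that no byte order mark is counted.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, excluding any byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


# Hypothetical sample words, written with escapes so the file itself is ASCII.
SAMPLES = {
    "US-ASCII": "markup",                              # one byte each in UTF-8
    "Greek": "\u03b3\u03bb\u03ce\u03c3\u03c3\u03b1",   # two bytes each in UTF-8
    "Japanese": "\u65e5\u672c\u8a9e",                  # three bytes each in UTF-8
}

for label, text in SAMPLES.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{label}: {len(text)} chars, UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

For these samples the US-ASCII word is half the size in UTF-8, the
Greek word is the same size in both, and the Japanese word is smaller
in UTF-16.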

Moreover, most legacy encodings for a single script use one byte per
character, although Chinese, Japanese, and Korean encodings use two or
more bytes per character.  UTF-8, therefore, isn't as efficient as the
legacy encodings of most scripts.  (Its advantage is that it can
represent more scripts than any legacy encoding.)
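
The same kind of comparison can be sketched for legacy encodings.  The
three codecs below (ISO 8859-7 for Greek, KOI8-R for Russian, Shift_JIS
for Japanese) are picked here purely as examples of single-script
encodings that Python ships with; the sample words and the helper name
legacy_vs_utf8 are likewise assumptions for illustration.

```python
# Hypothetical samples: (script, text, name of a legacy codec bundled with Python).
SAMPLES = [
    ("Greek", "\u03b3\u03bb\u03ce\u03c3\u03c3\u03b1", "iso8859_7"),
    ("Russian", "\u044f\u0437\u044b\u043a", "koi8_r"),
    ("Japanese", "\u65e5\u672c\u8a9e", "shift_jis"),
]


def legacy_vs_utf8(text, legacy_codec):
    """Return (legacy_bytes, utf8_bytes) for text."""
    return len(text.encode(legacy_codec)), len(text.encode("utf-8"))


for script, text, codec in SAMPLES:
    legacy, utf8 = legacy_vs_utf8(text, codec)
    print(f"{script}: {codec} {legacy} bytes, UTF-8 {utf8} bytes")
```

Each legacy codec here is smaller than UTF-8 for its own script, but
none of them can encode all three samples, which UTF-8 can.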

Regards,


Tony Graham
======================================================================
Tony Graham                            mailto:tgraham@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1