OASIS Mailing List Archives

   Re: [xml-dev] Use of UTF-8 and UTF-16


Paul Spencer said:
> I see many XML-based interoperability projects that specify whether to use
> UTF-8 or UTF-16 for Unicode character encoding. One will usually result in
> smaller documents/messages than the other (broadly, UTF-8 is better if the
> character set is mainly ASCII, and UTF-16 is better otherwise).

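For Western XML documents, UTF-8 files will almost always be smaller than
UTF-16, even for non-Latin scripts such as Greek or Cyrillic: those letters
take two bytes in either encoding, while the ASCII markup takes one byte in
UTF-8 and two in UTF-16.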

For CJK (Chinese, Japanese, Korean) XML documents, where UTF-8 may use
three (or four) bytes per character instead of UTF-16's two (or four),
UTF-16 files will usually be smaller.
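
To make the size comparison concrete, here is a rough Python sketch that
encodes a few small XML fragments both ways and counts the bytes. The
sample strings and the helper name encoded_sizes are made up for
illustration, and "utf-16-le" is used so that no byte order mark is
counted; real documents will of course have a different mix of markup and
text.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

# Hypothetical sample fragments, not taken from any real project.
samples = {
    "english": "<title>Hello, world</title>",
    "greek": "<title>Καλημέρα κόσμε</title>",
    "japanese_short": "<title>こんにちは世界</title>",
    "japanese_text_heavy": "<p>" + "こんにちは世界" * 20 + "</p>",
}

results = {name: encoded_sizes(doc) for name, doc in samples.items()}
for name, sizes in results.items():
    print(f"{name:20} utf-8={sizes['utf-8']:4} utf-16={sizes['utf-16']:4}")
```

Note that the short Japanese fragment still comes out smaller in UTF-8,
because its ASCII tags outweigh the text; UTF-16 only wins once the CJK
content dominates the markup.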

But file size is not the only factor. There is of course a small cost in
converting between the internal encoding used by software and the
transmission encoding. And compression adds processing cost but largely
equalizes file size.
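
As a rough illustration of the compression point, the following Python
sketch deflates the same document in both encodings using the standard
zlib module. The sample document and the helper name raw_and_compressed
are assumptions for the example, and the input is deliberately repetitive,
so it compresses far better than typical data would; treat the numbers it
prints as indicative only.

```python
import zlib

def raw_and_compressed(text: str, encoding: str) -> tuple:
    """Return (raw byte count, zlib-compressed byte count) for text."""
    raw = text.encode(encoding)
    return len(raw), len(zlib.compress(raw, level=9))

# Hypothetical, highly repetitive ASCII-only document.
doc = "<item>value</item>\n" * 200

utf8_raw, utf8_zip = raw_and_compressed(doc, "utf-8")
utf16_raw, utf16_zip = raw_and_compressed(doc, "utf-16-le")

print(f"utf-8 : raw={utf8_raw} compressed={utf8_zip}")
print(f"utf-16: raw={utf16_raw} compressed={utf16_zip}")
```

Uncompressed, the UTF-16 form is twice the size of the UTF-8 form here;
after compression the gap between the two should be much narrower.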

Rick Jelliffe





 


Copyright 2001 XML.org. This site is hosted by OASIS