OASIS Mailing List Archives

 


   Re: [xml-dev] Use of UTF-8 and UTF-16

  • To: Paul Spencer <xml-dev@boynings.co.uk>, Xml-Dev <xml-dev@lists.xml.org>
  • Subject: Re: [xml-dev] Use of UTF-8 and UTF-16
  • From: Tech Rams <techmailing@yahoo.com>
  • Date: Fri, 28 Oct 2005 09:44:54 -0700 (PDT)
  • In-reply-to: <NBBBIBMKFOFCNEBAKDPLCEKIKDAA.xml-dev@boynings.co.uk>

I believe that it is nearly impossible for parsers to
support every character encoding, particularly given
that you can define your own encoding scheme (if both
parties understand it).

As you mentioned, UTF-8 can represent every Unicode
character and is universally understood. Setting aside
the extra bytes and processing cost for documents whose
content is mostly characters that take 16 to 32 bits,
you get universal acceptance of your documents.
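
To put rough numbers on the size trade-off, here is a minimal
sketch in Python. The helper name encoded_sizes and the two sample
strings are made up for illustration; they are not from any real
project or from this thread.

```python
# Compare the encoded size of the same text in UTF-8 and UTF-16.
# encoded_sizes and both sample strings are hypothetical examples.

def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8 and in UTF-16 (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so the 2-byte byte order mark is not counted
        "utf-16": len(text.encode("utf-16-le")),
    }

# Mostly-ASCII content, typical of markup-heavy messages
ascii_markup = "<order id='42'>widget</order>"
# Content made up entirely of non-ASCII (here Japanese) characters
cjk_text = "日本語のテキスト"

ascii_sizes = encoded_sizes(ascii_markup)
cjk_sizes = encoded_sizes(cjk_text)

print("ASCII markup:", ascii_sizes)
print("CJK text:    ", cjk_sizes)
```

For the ASCII sample the UTF-16 form is twice the size of the
UTF-8 form; for the Japanese sample UTF-8 is the larger of the two.
Real documents mix markup and data, so the actual ratio depends on
the content.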

XML parsers are only one part of the picture. Apart
from the markup, the actual data also needs to be
processed by different applications. If you are
combining several libraries/applications, your best bet
is to use the character encoding that is the lowest
common denominator supported by all of them. And again,
universal support for UTF-8 comes in handy there.
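
As a rough illustration of the point about parsers versus the rest
of the pipeline, the sketch below uses Python's standard
xml.etree.ElementTree module. The sample document and the variable
names are assumptions for the example only. The parser accepts the
same document in either encoding; what gets handed to downstream
code is a separate choice, and UTF-8 is the one used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; {enc} is filled in for each encoding.
doc = '<?xml version="1.0" encoding="{enc}"?><name>Zoë</name>'

utf8_bytes = doc.format(enc="UTF-8").encode("utf-8")
# Python's "utf-16" codec prepends a byte order mark
utf16_bytes = doc.format(enc="UTF-16").encode("utf-16")

# The parser yields the same character data from both byte streams
text_from_utf8 = ET.fromstring(utf8_bytes).text
text_from_utf16 = ET.fromstring(utf16_bytes).text
print(text_from_utf8 == text_from_utf16)

# Hand the data to downstream libraries in one agreed encoding (UTF-8 here)
payload = text_from_utf16.encode("utf-8")
print(len(utf8_bytes), len(utf16_bytes), len(payload))
```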

-rams

--- Paul Spencer <xml-dev@boynings.co.uk> wrote:

> I see many XML-based interoperability projects that specify whether to use
> UTF-8 or UTF-16 for Unicode character encoding. One will usually result in
> smaller documents/messages than the other (broadly, UTF-8 is better if the
> character set is mainly ASCII, and UTF-16 is better otherwise). However, I
> see no reason to specify this in terms of interoperability since XML
> processors must support both. Obviously, if you are using encodings other
> than these, they will need to be specified. Am I being stupid here (after
> all, it is Friday afternoon), or is there ever a good reason to specify
> which to use other than for document size reasons?
> 
> Paul Spencer




 

Copyright 2001 XML.org. This site is hosted by OASIS