   RE: [xml-dev] Use of UTF-8 and UTF-16

Actually, some environments have better UTF-16 support than UTF-8
support. But you make a good point about checking what the other
components you need to use support, and you should make sure that you
stay consistent. So if you use a C# string, which stores 2-byte UTF-16
code units, and pass the data to a database that uses a 2-byte Unicode
encoding, then UTF-16 is probably better. If you transport it through
byte streams or 1-byte-oriented interfaces that can carry UTF-8, then
UTF-8 is probably better.
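
To make the size and consistency points concrete, here is a minimal
Python sketch. The sample strings and the helper names (encoded_sizes,
mismatched_decode_fails) are made up for illustration and are not from
any library.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8 and in UTF-16 (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so the 2-byte byte order mark is not counted.
        "utf-16": len(text.encode("utf-16-le")),
    }


def mismatched_decode_fails(text: str) -> bool:
    """Encode as UTF-16 (with BOM), then try to read the bytes back as UTF-8."""
    data = text.encode("utf-16")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # The BOM bytes 0xFF/0xFE can never appear in valid UTF-8.
        return True
    return False


# Hypothetical sample data: one mostly-ASCII string, one Japanese string.
ascii_heavy = "<order id='42'>widget</order>"
cjk_heavy = "日本語のテキスト"

print("ASCII-heavy:", encoded_sizes(ascii_heavy))
print("CJK-heavy:  ", encoded_sizes(cjk_heavy))
print("UTF-16 bytes rejected by a UTF-8 reader:", mismatched_decode_fails(ascii_heavy))
```

The first string comes out smaller in UTF-8 and the second smaller in
UTF-16. The last line shows why every component in the chain has to
agree on the encoding.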

Best regards
Michael

> -----Original Message-----
> From: Tech Rams [mailto:techmailing@yahoo.com]
> Sent: Friday, October 28, 2005 9:45 AM
> To: Paul Spencer; Xml-Dev
> Subject: Re: [xml-dev] Use of UTF-8 and UTF-16
> 
> I believe that it is near impossible for parsers to
> support every character encoding, particularly given
> that you can have your own encoding scheme (if both
> parties understand it).
> 
> As you mentioned, UTF-8 has no limitations in
> expressibility and is universally understood. Setting
> aside the verbosity and the processing cost of parsing
> documents that contain mostly 16- to 32-bit data, you
> get universal acceptance of your documents.
> 
> XML parsers are only one part of the picture. Apart
> from the markup, the actual data also needs to be
> processed by different applications. If you are using
> composite libraries/applications, your best bet is to
> use the character encoding that is the lowest
> common denominator supported by those
> libraries. And again, universal support for UTF-8
> comes in handy there.
> 
> -rams
> 
> --- Paul Spencer <xml-dev@boynings.co.uk> wrote:
> 
> > I see many XML-based interoperability projects that
> > specify whether to use
> > UTF-8 or UTF-16 for Unicode character encoding. One
> > will usually result in
> > smaller documents/messages than the other (broadly,
> > UTF-8 is better if the
> > character set is mainly ASCII, and UTF-16 is better
> > otherwise). However, I
> > see no reason to specify this in terms of
> > interoperability since XML
> > processors must support both. Obviously, if you are
> > using encodings other
> > than these, they will need to be specified. Am I
> > being stupid here (after
> > all, it is Friday afternoon), or is there ever a
> > good reason to specify
> > which to use other than for document size reasons?
> >
> > Paul Spencer
> >
> >
> >
> -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org
> > <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> >
> > The list archives are at
> > http://lists.xml.org/archives/xml-dev/
> >
> > To subscribe or unsubscribe from this list use the
> > subscription
> > manager:
> > <http://www.oasis-open.org/mlmanage/index.php>
> >
> >
> 
> 
> 

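Paul's original point, quoted above, is that XML processors must support
both UTF-8 and UTF-16. A small Python sketch of that follows, using the
standard library's xml.etree.ElementTree. The sample document and the
parse_as helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; {enc} is filled in with the encoding name.
DOC = '<?xml version="1.0" encoding="{enc}"?><greeting>Grüße</greeting>'


def parse_as(encoding: str) -> str:
    """Serialize the sample document in the given encoding and parse it back."""
    # Python's "UTF-16" codec writes a byte order mark, which the parser
    # uses to detect the encoding.
    raw = DOC.format(enc=encoding).encode(encoding)
    return ET.fromstring(raw).text


for enc in ("UTF-8", "UTF-16"):
    print(enc, "->", parse_as(enc))
```

Both runs should give the same element text, so as far as the parser is
concerned the choice between the two is about size and about what the
rest of the toolchain handles.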