RE: Localisation: Character Encodings & RDBMS, Unicode->UTF-8 with Round Tripping

  • From: Dylan Walsh <Dylan.Walsh@Kadius.com>
  • To: xml-dev@xml.org
  • Date: Mon, 19 Jun 2000 09:40:33 +0100

Forwarding, as it is relevant to this thread.

> -----Original Message-----
> From:	Ronald Bourret [SMTP:rpbourret@hotmail.com]
> Sent:	Saturday, June 17, 2000 12:35 PM
> To:	mrys@microsoft.com; Dylan.Walsh@Kadius.com
> Subject:	RE: Localisation: Character Encodings & RDBMS,
> Unicode->UTF-8 with Round Tripping
> 
> Michael Rys wrote:
> 
> >Most databases provide Unicode support (e.g., nchar). Since UTF-8 is an
> >encoding in which Unicode's two-byte characters are mapped into a
> >single-byte character space, such that some characters take two or three
> >single-byte characters, you can of course also store UTF-8 in a
> >single-byte string datatype. However, strlen functions are normally
> >oblivious to the fact that you actually have UTF-8 stored in the latter
> >case, and count bytes rather than characters. From a pure storage point
> >of view, though, you should be able to round-trip either UTF-8 or Unicode.
> 
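A minimal sketch of the storage-versus-length point above, using plain Python strings and bytes as a stand-in for a database's single-byte column; `utf8_lengths` and `round_trips` are hypothetical helper names. The encoded byte count is what a byte-oriented strlen would report. In UTF-8, ASCII characters take one byte, other characters in the Basic Multilingual Plane take two or three, and characters outside it take four.

```python
def utf8_lengths(text: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


def round_trips(text: str) -> bool:
    """Check that storing text as UTF-8 bytes and reading it back loses nothing."""
    return text.encode("utf-8").decode("utf-8") == text


for sample in ["Schafer", "Schäfer", "€uro", "日本"]:
    chars, nbytes = utf8_lengths(sample)
    # A byte-oriented strlen() would report nbytes, not chars.
    print(f"{sample}: {chars} characters, {nbytes} bytes, round trip ok: {round_trips(sample)}")
```

For "Schäfer" this reports 7 characters but 8 bytes, while the round-trip check still passes, which is exactly the split between length handling and storage described above.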
> Note also that, unless the database knows it is storing UTF-8, any
> characters that require two bytes to be stored will be unqueriable. For
> example, suppose the character 'ä' requires two bytes to be stored (I don't
> actually know if it does or not) and the database thinks it is storing
> ASCII. If so, the query
> 
>   SELECT * FROM Employees WHERE Name='Schäfer'
> 
> will fail because the bytes actually stored in the database are:
> 
>   "Sch--fer"
> 
> where -- represents the two bytes needed to store 'ä'. Read as two separate
> single-byte characters, they don't match the 'ä' in "Schäfer".
> 
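A minimal sketch of the failed lookup, assuming Python's built-in `sqlite3` module as a stand-in for the database. Declaring `Name` as a BLOB makes SQLite compare raw bytes with no notion of encoding, roughly like a database that believes the column holds single-byte text; `find_employee` is a hypothetical helper. In UTF-8, 'ä' (U+00E4) does occupy two bytes, 0xC3 0xA4, whereas a client working in Latin-1 sends it as the single byte 0xE4.

```python
import sqlite3


def find_employee(conn: sqlite3.Connection, name_bytes: bytes) -> list:
    """Return the rows whose Name column holds exactly these bytes."""
    cur = conn.execute("SELECT Name FROM Employees WHERE Name = ?", (name_bytes,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
# A BLOB column is compared byte by byte with no notion of character encoding,
# much like a column the database believes holds single-byte (ASCII) text.
conn.execute("CREATE TABLE Employees (Name BLOB)")
conn.execute("INSERT INTO Employees (Name) VALUES (?)", ("Schäfer".encode("utf-8"),))

stored = conn.execute("SELECT Name FROM Employees").fetchone()[0]
print(stored)                    # b'Sch\xc3\xa4fer': 'ä' occupies two bytes
print(stored.decode("latin-1"))  # SchÃ¤fer: the same bytes read one byte per character

# A Latin-1 client sends 'ä' as the single byte 0xE4, so nothing matches.
latin1_hits = find_employee(conn, "Schäfer".encode("latin-1"))
# Only a query encoded the same way as the stored data finds the row.
utf8_hits = find_employee(conn, "Schäfer".encode("utf-8"))
print(len(latin1_hits), len(utf8_hits))  # 0 1
```

The mismatch comes from the query and the stored data using different byte sequences for the same character; a query that happens to be sent as UTF-8 bytes would match.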
> This is obviously not a problem if the data is only accessed through XML.
> 
> > > Can you convert the various encoding schemes to UTF-8 for storage, and
> > > convert them back on retrieval?
> 
> Yes.
> 
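A minimal sketch of the conversion in both directions, using Python's standard codecs; `to_utf8` and `from_utf8` are hypothetical names. Text is decoded from its original encoding into Unicode, encoded as UTF-8 for storage, and the reverse is done on retrieval. Note that `from_utf8` cannot work without being told the original encoding. Some legacy encodings have several byte sequences that map to the same Unicode character, and for those the bytes that come back may differ from the ones that went in.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert bytes in source_encoding to UTF-8 for storage."""
    return raw.decode(source_encoding).encode("utf-8")


def from_utf8(stored: bytes, target_encoding: str) -> bytes:
    """Convert stored UTF-8 back to the encoding the data arrived in."""
    return stored.decode("utf-8").encode(target_encoding)


samples = [
    ("Schäfer".encode("latin-1"), "latin-1"),
    ("€100".encode("cp1252"), "cp1252"),
    ("日本語".encode("shift_jis"), "shift_jis"),
]

for raw, encoding in samples:
    stored = to_utf8(raw, encoding)
    restored = from_utf8(stored, encoding)
    # The round trip is byte-exact for these samples.
    assert restored == raw
    print(f"{encoding}: {raw!r} -> {stored!r} -> {restored!r}")
```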
> > > Would such round-tripping require you to
> > > store the name of the original encoding alongside the UTF version?
> 
> It would need to be stored somewhere: in the database, in the application,
> in a file that shows how XML is mapped to the database, etc.
> 
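A minimal sketch of the first option (keeping the encoding name in the database), again assuming Python's `sqlite3` module, which stores Python strings as UTF-8 TEXT by default. The `Documents` table and its columns, as well as `store` and `load`, are hypothetical; the application or a mapping file could hold the encoding name instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the text is kept as UTF-8, and the name of the
# encoding it arrived in is kept next to it so it can be restored later.
conn.execute("""
    CREATE TABLE Documents (
        id INTEGER PRIMARY KEY,
        body TEXT NOT NULL,
        original_encoding TEXT NOT NULL
    )
""")


def store(raw: bytes, encoding: str) -> int:
    """Decode raw bytes, store the text as UTF-8, and record the source encoding."""
    text = raw.decode(encoding)
    cur = conn.execute(
        "INSERT INTO Documents (body, original_encoding) VALUES (?, ?)",
        (text, encoding),
    )
    return cur.lastrowid


def load(doc_id: int) -> bytes:
    """Fetch the text and re-encode it in the encoding it was stored from."""
    body, encoding = conn.execute(
        "SELECT body, original_encoding FROM Documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return body.encode(encoding)


# Usage: a Latin-1 document comes back as the same Latin-1 bytes.
original = "Schäfer".encode("iso-8859-1")
doc_id = store(original, "iso-8859-1")
assert load(doc_id) == original
```

Keeping the encoding in the same row means any query against `body` sees consistent UTF-8 text, while `load` can still hand each consumer the bytes it originally supplied.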
> -- Ron Bourret
> 
> P.S. Feel free to forward this to xml-dev if you want. I'm not currently a
> member and can't post.
> 

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org?BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************


Copyright 2001 XML.org. This site is hosted by OASIS