   RE: Localisation: Character Encodings & RDBMS, Unicode->UTF-8 with Round Tripping

  • From: Dylan Walsh <Dylan.Walsh@Kadius.com>
  • To: xml-dev@xml.org
  • Date: Mon, 19 Jun 2000 14:22:43 +0100

The irony here is just too much. :-)

I have appended the content of Matt's message below.

> -----Original Message-----
> From:	Matt Sergeant [SMTP:matt@sergeant.org]
> Sent:	Monday, June 19, 2000 11:28 AM
> To:	Dylan Walsh
> Cc:	xml-dev@xml.org
> Subject:	RE: Localisation: Character Encodings & RDBMS,
> Unicode->UTF-8 with Round Tripping
> 
> This message uses a character set that is not supported by the Internet
> Service.  To view the original message content,  open the attached
> message. If the text doesn't display correctly, save the attachment to
> disk, and then open it using a viewer that can display the original
> character set. 
> 
> << File: message.txt >> 
> 
On Mon, 19 Jun 2000, Dylan Walsh wrote:

> Forwarding, as it is relevant to this thread.
> 
> > -----Original Message-----
> > From:	Ronald Bourret [SMTP:rpbourret@hotmail.com]
> > Sent:	Saturday, June 17, 2000 12:35 PM
> > To:	mrys@microsoft.com; Dylan.Walsh@Kadius.com
> > Subject:	RE: Localisation: Character Encodings & RDBMS,
> > Unicode->UTF-8 with Round Tripping
> > 
> > Michael Rys wrote:
> > 
> > >Most databases provide Unicode support (e.g., nchar). Since UTF-8 is an
> > >encoding where the Unicode two-byte characters are mapped into a
> > >single-byte character space such that for some characters two or three
> > >single-byte characters are used, you of course can easily store UTF-8 as
> > >well in a single-character string datatype. However, strlen functions
> > >are normally oblivious to the fact that you actually have UTF-8 stored
> > >in the latter case, but just from a storage point of view, you should be
> > >able to roundtrip either UTF-8 or Unicode.
> > 
> > Note also that, unless the database knows it is storing UTF-8, any
> > characters that require two bytes to be stored will be unqueryable. For
> > example, suppose the character 'ä' requires two bytes to be stored (I
> > don't actually know if it does or not) and the database thinks it is
> > storing ASCII. If so, the query
> > 
> >   SELECT * FROM Employees WHERE Name="Schäfer"
> > 
> > will fail because the bytes actually stored in the database are:
> > 
> >   "Sch--fer"
> > 
> > where -- represents the two bytes needed to store 'ä', which don't match
> > "Schäfer".

They do if the query is also in UTF-8, and therefore you're requesting:

SELECT * FROM Employees WHERE Name="Sch--fer"

(using your syntax).
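
Both points can be checked at the byte level. What follows is a minimal Python sketch, not anything from the original messages: it assumes a database that compares stored bytes to query bytes with no knowledge of their encoding, and it assumes a hypothetical client that sends the query in Latin-1 (the thread does not say which encoding the client uses). The helper name bytewise_match is made up for illustration.

```python
# Illustration only: the Latin-1 client and the helper name are assumptions.
name = "Sch\u00e4fer"  # "Schäfer"; U+00E4 is 'ä'

stored_utf8 = name.encode("utf-8")     # bytes written by a UTF-8 client
query_latin1 = name.encode("latin-1")  # bytes sent by a Latin-1 client
query_utf8 = name.encode("utf-8")      # bytes sent by a UTF-8 client


def bytewise_match(stored: bytes, query: bytes) -> bool:
    """Compare as an encoding-unaware database would: byte for byte."""
    return stored == query


print(stored_utf8)                                # b'Sch\xc3\xa4fer'
print(bytewise_match(stored_utf8, query_latin1))  # False
print(bytewise_match(stored_utf8, query_utf8))    # True

# A byte-oriented strlen counts 8 for a 7-character name.
print(len(name), len(stored_utf8))                # 7 8
```

In UTF-8, 'ä' is the two bytes 0xC3 0xA4, so the lookup fails only when the query arrives in a different encoding than the stored data. When both sides use UTF-8 the bytes are identical and the row is found.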

-- 
<Matt/>

Fastnet Software Ltd. High Performance Web Specialists
Providing mod_perl, XML, Sybase and Oracle solutions
Email for training and consultancy availability.
http://sergeant.org | AxKit: http://axkit.org
