- From: Dylan Walsh <Dylan.Walsh@Kadius.com>
- To: xml-dev@xml.org
- Date: Mon, 19 Jun 2000 14:22:43 +0100
The irony here is just too much. :-)
I have appended the content of Matt's message below.
> -----Original Message-----
> From: Matt Sergeant [SMTP:matt@sergeant.org]
> Sent: Monday, June 19, 2000 11:28 AM
> To: Dylan Walsh
> Cc: xml-dev@xml.org
> Subject: RE: Localisation: Character Encodings & RDBMS,
> Unicode->UTF-8 with Round Tripping
>
> This message uses a character set that is not supported by the Internet
> Service. To view the original message content, open the attached
> message. If the text doesn't display correctly, save the attachment to
> disk, and then open it using a viewer that can display the original
> character set.
>
> << File: message.txt >>
>
On Mon, 19 Jun 2000, Dylan Walsh wrote:
> Forwarding, as it is relevant to this thread.
>
> > -----Original Message-----
> > From: Ronald Bourret [SMTP:rpbourret@hotmail.com]
> > Sent: Saturday, June 17, 2000 12:35 PM
> > To: mrys@microsoft.com; Dylan.Walsh@Kadius.com
> > Subject: RE: Localisation: Character Encodings & RDBMS,
> > Unicode->UTF-8 with Round Tripping
> >
> > Michael Rys wrote:
> >
> > >Most databases provide Unicode support (e.g., nchar). Since UTF-8 is an
> > >encoding where the Unicode two-byte characters are mapped into a
> > >single-byte character space such that for some characters two or three
> > >single-byte characters are used, you of course can easily store UTF-8
> > >as well in a single-character string datatype. However, strlen
> > >functions are normally oblivious to the fact that you actually have
> > >UTF-8 stored in the latter case, but just from a storage point of view,
> > >you should be able to roundtrip either UTF-8 or Unicode.
> >
> > Note also that, unless the database knows it is storing UTF-8, any
> > characters that require two bytes to be stored will be unqueriable. For
> > example, suppose the character 'ä' requires two bytes to be stored (I
> > don't actually know if it does or not) and the database thinks it is
> > storing ASCII. If so, the query
> >
> > SELECT * FROM Employees WHERE Name="Schäfer"
> >
> > will fail because the bytes actually stored in the database are:
> >
> > "Sch--fer"
> >
> > where -- represents the two bytes needed to store 'ä', which don't match
> > "Schäfer".
They do if the query is also in UTF-8, and therefore you're requesting:
SELECT * FROM Employees WHERE Name="Sch--fer"
(using your syntax).
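
A rough sketch of the point, in Python rather than SQL, may make it concrete. It only models an encoding-unaware store as a byte-for-byte comparison; the names (stored, matches, query_utf8, query_latin1) are made up for illustration and do not correspond to any particular database's behaviour.

```python
# Illustrative only: an encoding-unaware column is modelled as raw bytes,
# and a WHERE Name = ... test as plain byte equality.
name = "Sch\u00e4fer"                   # "Schaefer" spelled with a-umlaut (U+00E4)

stored = name.encode("utf-8")           # bytes a UTF-8 client would write
query_utf8 = name.encode("utf-8")       # query literal also sent as UTF-8
query_latin1 = name.encode("latin-1")   # query literal sent as ISO-8859-1

def matches(stored_bytes: bytes, literal_bytes: bytes) -> bool:
    """Byte-wise equality, as a store unaware of encodings would compare."""
    return stored_bytes == literal_bytes

print(stored)                           # b'Sch\xc3\xa4fer' (two bytes for U+00E4)
print(matches(stored, query_utf8))      # True: same encoding on both sides
print(matches(stored, query_latin1))    # False: one byte vs. two for U+00E4
print(len(name), len(stored))           # 7 characters, 8 bytes
```

So equality lookups work as long as the client encodes the literal the same way as the stored data. The last line shows the strlen caveat from earlier in the thread: a byte-counting length function reports 8, not 7.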
--
<Matt/>
Fastnet Software Ltd. High Performance Web Specialists
Providing mod_perl, XML, Sybase and Oracle solutions
Email for training and consultancy availability.
http://sergeant.org | AxKit: http://axkit.org