I think Tim is being unintentionally misleading. If you stick to ISO 8859-1,
the Western European characters it covers will look fine in an editor, and you
can still include any other character using numeric character references.
You don't need to count on never seeing non-European characters in the data;
you just won't be able to see them as glyphs in an ISO 8859-1 encoding, and
they will take up a lot more space if they do appear. You should feel
comfortable that you won't see many non-European characters in your data
before choosing ISO 8859-1 as your encoding.
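
To make the trade-off concrete, here is a minimal Python sketch. The sample string and the helper name `to_latin1_xml` are made up for illustration. It encodes text as ISO 8859-1 and falls back to numeric character references for anything outside that repertoire, using Python's standard `xmlcharrefreplace` error handler. It is one way to produce such output, not the only one.

```python
def to_latin1_xml(text: str) -> bytes:
    """Encode text as ISO 8859-1, writing characters outside that
    repertoire as decimal numeric character references (&#NNNN;)."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# Hypothetical sample: one character that is in ISO 8859-1, two that are not.
sample = "ä 日本"
latin1 = to_latin1_xml(sample)
utf8 = sample.encode("utf-8")

print(latin1)                  # the two CJK characters appear as &#...; references
print(len(latin1), len(utf8))  # byte counts for the same text in each encoding
```

Each character outside ISO 8859-1 costs eight bytes as a reference here, against three bytes in UTF-8, which is the "a lot more space" point above. The sketch does not escape markup characters such as `&` and `<`; that would still have to be done separately before the text goes into an XML document.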
Sr. Computer Scientist
Fax: +1 613.594.8886
From: Tim Bray [mailto:firstname.lastname@example.org]
Sent: Thursday, August 28, 2003 1:02 PM
To: Gustaf Liljegren
Subject: Re: [xml-dev] Question about UTF-8
> Many users who see '√§' when they open a UTF-8 encoded XML document in a
> text editor, prefer to use ISO 8859-1 to avoid this effect.
That only works until you need to use a character that isn't in 8859-1,
such as those used by about two thirds of the world's population.
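
Both halves of this exchange can be reproduced in a few lines. The Python sketch below, with made-up sample characters, shows why a UTF-8 'ä' turns into two odd glyphs when an editor assumes a single-byte encoding, and what happens when a character outside ISO 8859-1 has to be written in that encoding with no fallback.

```python
utf8_bytes = "ä".encode("utf-8")  # two bytes: 0xC3 0xA4

# An editor that assumes a single-byte encoding shows one glyph per byte.
as_latin1 = utf8_bytes.decode("iso-8859-1")
as_mac_roman = utf8_bytes.decode("mac_roman")
print(as_latin1, as_mac_roman)

# A character outside ISO 8859-1 cannot be encoded in strict mode.
try:
    "日".encode("iso-8859-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

The exact glyphs depend on which single-byte encoding the editor assumes, so the same two bytes can look different from one machine to the next.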
> Maybe the answer is to stay in ISO 8859-1 (or whatever default encoding
> the editor has), but I was hoping it was possible to recommend using UTF-8
> all the time (for European scripts).
The notion that you can count on never seeing non-European characters is
a recipe for disaster in today's world. Good solutions are: (a) as you
suggest, use UTF-8 all the time, or (b) use XML for interchange.
Cheers, Tim Bray
(ongoing fragmented essay: http://www.tbray.org/ongoing/)