OASIS Mailing List Archives





   RE: [xml-dev] Question about UTF-8


I think Tim is being unintentionally misleading.  If you stick to ISO 8859-1
then all the European characters will look fine in an editor and you can
still include non-European characters using numeric character references.

You don't need to count on never seeing non-European characters in the data.
You just won't be able to see them as glyphs in an ISO 8859-1 encoding, and
each one will take up a lot more space, as a numeric character reference, if
it does appear.  You should feel comfortable that you won't see many
non-European characters in your data before choosing ISO 8859-1 as your
encoding.
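
A minimal sketch of that trade-off, assuming Python 3 and a made-up sample
string: characters outside ISO 8859-1 are written as numeric character
references, which an XML parser reads back as the original characters, at the
cost of extra bytes.

```python
# Hypothetical sample text: one Latin-1 character (ä) and two that are not
# in ISO 8859-1 (中, 文).
text = "Käse 中文"

# "xmlcharrefreplace" turns every unencodable character into &#NNNN;
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
utf8_bytes = text.encode("utf-8")

print(latin1_bytes)       # b'K\xe4se &#20013;&#25991;'
print(len(latin1_bytes))  # 21
print(len(utf8_bytes))    # 12
```

The ä stays a single byte in ISO 8859-1, but each non-European character
grows to eight bytes here, against three in UTF-8.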

Rob McDougall
Sr. Computer Scientist
Adobe Systems Incorporated
Phone: +1 613.940.3708
Fax: +1 613.594.8886

-----Original Message-----
From: Tim Bray [mailto:tbray@textuality.com] 
Sent: Thursday, August 28, 2003 1:02 PM
To: Gustaf Liljegren
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Question about UTF-8


> Many users who see 'Ã¤' when they open a UTF-8 encoded XML document in a
> text editor, prefer to use ISO 8859-1 to avoid this effect.

That only works until you need to use a character that isn't in 8859-1, 
such as those used by about two thirds of the world's population.
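
The effect described in the quoted question can be reproduced with a short
sketch, assuming Python 3 and an editor that reads UTF-8 bytes as ISO 8859-1.
The variable names are only illustrative.

```python
original = "ä"

# UTF-8 stores ä as two bytes: 0xC3 0xA4.
utf8_bytes = original.encode("utf-8")

# An editor that assumes ISO 8859-1 shows each byte as its own character.
misread = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes)  # b'\xc3\xa4'
print(misread)     # Ã¤
```

Nothing in the file is damaged in this case; only the editor's choice of
encoding is wrong.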

> Maybe the answer is to stay in ISO 8859-1 (or whatever default encoding
> the editor has), but I was hoping it was possible to recommend using UTF-8
> all the time (for European scripts).

The notion that you can count on never seeing non-European characters is 
a recipe for disaster in today's world.  Good solutions are: (a) as you 
suggest, use UTF-8 all the time, or (b) use XML for interchange.

Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)



Copyright 2001 XML.org. This site is hosted by OASIS