


   Re: [xml-dev] Question about UTF-8


Gustaf Liljegren wrote:

> But the question is about general
> ('non-XML-aware') text editors. A general editor has no idea of the
> encoding detection mechanism in XML, so I wonder how it knows that the
> octets C3 A4 should be written 'ä' and not 'Ã¤' (or something else).

It really has no way of knowing, in theory or in practice.  This is a 
big hairy problem.  If you're living in a heterogeneous environment 
where there are multiple encodings, this is a good reason to insist on XML.
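
To illustrate the ambiguity, here is a minimal Python sketch (not part of the original message; the variable names are chosen only for illustration). It decodes the same two octets, C3 A4, under both encodings. Both decodes succeed, so the bytes alone give an editor nothing to choose between them.

```python
# The two octets from the question: C3 A4.
data = bytes([0xC3, 0xA4])

# Read as UTF-8, the pair is one character: U+00E4 (a with diaeresis).
as_utf8 = data.decode("utf-8")

# Read as ISO 8859-1, each octet is its own character:
# U+00C3 (A with tilde) followed by U+00A4 (currency sign).
as_latin1 = data.decode("iso-8859-1")

# Neither decode raises an error, so the octets by themselves
# cannot tell a general-purpose editor which reading was intended.
print(len(as_utf8), [hex(ord(c)) for c in as_utf8])
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])
```

An XML parser avoids the guess because the document itself declares, or defaults to, its encoding.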

> Many users who see 'Ã¤' when they open a UTF-8 encoded XML document in a
> text editor, prefer to use ISO 8859-1 to avoid this effect.

That only works until you need to use a character that isn't in 8859-1, 
such as those used by about two thirds of the world's population.
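
The limit is easy to check. The following Python sketch is an illustration only; the helper name `fits_latin1` is hypothetical and was picked for this example. It tests whether a string can be encoded as ISO 8859-1 at all, and shows that UTF-8 accepts the same text.

```python
def fits_latin1(text):
    """Return True if every character in text exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


western = "\u00e4"   # a with diaeresis: present in ISO 8859-1
euro = "\u20ac"      # euro sign: absent from ISO 8859-1
kanji = "\u65e5"     # CJK ideograph: absent from ISO 8859-1

results = {
    "western": fits_latin1(western),
    "euro": fits_latin1(euro),
    "kanji": fits_latin1(kanji),
}

# UTF-8 can encode all three without error.
utf8_lengths = [len(s.encode("utf-8")) for s in (western, euro, kanji)]
```

Note that even the euro sign, a European character, falls outside ISO 8859-1.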

> Maybe the answer is to stay in ISO 8859-1 (or whatever default encoding the
> editor has), but I was hoping it was possible to recommend using UTF-8 all
> the time (for European scripts).

The notion that you can count on never seeing non-European characters is 
a recipe for disaster in today's world.  Good solutions are: (a) as you 
suggest, use UTF-8 all the time, or (b) use XML for interchange.

Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)



Copyright 2001 XML.org. This site is hosted by OASIS