Re: [xml-dev] Quiz: How do you put a Euro sign in your data if your XML uses windows-1252 encoding and you use a numeric character reference?

I've been advising people how to solve character encoding issues for about 100 years, but our own internal system for handling Saxon license requests still gets it wrong. It ain't easy.

> For what it's worth, 1: Joel Spolsky's article on "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" <http://www.joelonsoftware.com/articles/Unicode.html> is quite good, I think.

The thread seems to be pointing to two conclusions:

(a) there are people who don't understand the theory, and need to be 
educated (I don't know if Roger's insight about &#x80; really was a new 
discovery for him; if so, I am rather shocked).

(b) but even if you do understand the theory, it's still hard to get it 
right in practice, because our systems are complex and built from 
heterogeneous components, many of which are outside our control, cannot 
be easily changed, and are poorly documented; the more complex they 
become, the more opportunities there are for data to be corrupted across 
the component boundaries.
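
For anyone who wants to check the theory behind point (a) for themselves, here is a minimal sketch using Python's standard ElementTree parser. The element name `price` and the helper `text_of` are made up for illustration. It relies on the rule that a numeric character reference names a Unicode code point, whatever encoding the document declares.

```python
import xml.etree.ElementTree as ET

# The Euro sign is U+20AC in Unicode, but the single byte 0x80 in windows-1252.
EURO = "\u20ac"
assert EURO.encode("cp1252") == b"\x80"


def text_of(body: bytes) -> str:
    """Wrap body in a tiny windows-1252 document (hypothetical <price>
    element) and return the text the parser reports for it."""
    doc = (
        b'<?xml version="1.0" encoding="windows-1252"?><price>'
        + body
        + b"</price>"
    )
    return ET.fromstring(doc).text


# A literal byte is decoded using the declared encoding.
raw_byte = text_of(b"\x80")

# A numeric character reference is a Unicode code point, not a byte value.
right_ref = text_of(b"&#x20AC;")

# So this refers to U+0080, a C1 control character, not the Euro sign.
wrong_ref = text_of(b"&#x80;")
```

Here `raw_byte` and `right_ref` both come back as the Euro sign, while `wrong_ref` is the control character U+0080.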

The underlying problem is that components throw bytes at each other 
without first agreeing what they mean, and because it works most of the 
time (i.e. when you speak English) people live with the problem rather 
than fixing it; and because they don't fix it, it gets worse.
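
A rough illustration of that last paragraph, again in Python. The `send` and `receive` functions are hypothetical stand-ins for two components that never agreed on an encoding; the sample strings are invented.

```python
def send(text: str, encoding: str) -> bytes:
    """Hypothetical producer: emits bytes with no label saying what they mean."""
    return text.encode(encoding)


def receive(payload: bytes, assumed_encoding: str) -> str:
    """Hypothetical consumer: has to guess how the bytes were encoded."""
    return payload.decode(assumed_encoding)


english = "price: 10 EUR"
french = "prix : 10 \u20ac, d\u00e9j\u00e0 pay\u00e9"

# The producer writes UTF-8, the consumer assumes windows-1252.
# Pure ASCII survives the wrong guess, which is why the bug goes unnoticed.
english_ok = receive(send(english, "utf-8"), "cp1252")

# Anything outside ASCII is silently corrupted instead of raising an error.
french_garbled = receive(send(french, "utf-8"), "cp1252")

# If both sides agree on the encoding, the text round-trips intact.
french_ok = receive(send(french, "utf-8"), "utf-8")
```

The English string arrives unchanged, the French one arrives with each non-ASCII character turned into two or three wrong ones, and nothing in between reports a problem.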

Michael Kay

