Re: [xml-dev] RE: Encoding charset of HTTP Basic Authentication

* David Lee wrote:
>What shocks *me* is that the intent of base64 is stated to allow more
>characters than HTTP headers allow but then due to the lack of
>encoding/charset specification allows precious few.
>A lot of work for almost nothing.  A simple insertion of the text "UTF8
>encoded prior to base64" would have nailed it.

The distinction between bytes and characters is a fairly recent
development. When I bought my first computer in the 1990s it came with
Windows 3.11 and MS-DOS 6.22, and the German versions of those used
different "code pages", meaning plain text files with umlauts that I
created in DOS did not display correctly under Windows and vice versa.
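
A minimal sketch of that mismatch, in Python. The choice of cp850 for
DOS and cp1252 for Windows is an assumption made for illustration; the
exact code pages are not named above.

```python
# Assumption: cp850 stands in for the German MS-DOS code page and cp1252
# for the Windows one; the message does not name the exact code pages.
DOS_CODE_PAGE = "cp850"
WINDOWS_CODE_PAGE = "cp1252"

name = "Höhrmann"

# The same character is stored as a different byte under each code page.
dos_bytes = name.encode(DOS_CODE_PAGE)
windows_bytes = name.encode(WINDOWS_CODE_PAGE)

# Reading a file with the wrong code page silently yields other characters.
# errors="replace" guards against byte values the target code page
# leaves undefined.
dos_file_read_in_windows = dos_bytes.decode(WINDOWS_CODE_PAGE, errors="replace")
windows_file_read_in_dos = windows_bytes.decode(DOS_CODE_PAGE, errors="replace")

print(dos_bytes.hex(" "), "->", dos_file_read_in_windows)
print(windows_bytes.hex(" "), "->", windows_file_read_in_dos)
```

Nothing in the bytes themselves records which code page produced them,
so both wrong readings succeed without any error.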

Some years later, when I got an Internet connection at home, I chatted
with an Egyptian girl living in Egypt over ICQ. She wanted to send me a
letter and asked for my address. She asked about the characters in my
name and address, and I tried hard to explain umlauts and that I use
"oe" in my "nickname" as a transliteration... and eventually I got her
letter, containing mojibake where the umlauts should have gone.

It's not something that was wired into people's heads at the time, and
we are still suffering from that. If I put my proper name into the From
header I will without a doubt see my name mangled in replies or online
archives shortly after. Heck, back in 2004 I filed comments on W3C's
Character Model for the World Wide Web specification, developed by the
I18N Working Group there, through an online form they developed, and my
name came out as mojibake in the list archive where the comments were
copied to.

Note that this is in part due to missing infrastructure. In the
DOS/Windows case, apart from heuristics, software would have had no
means to decide whether something was a Windows plain text file or a DOS
plain text file. Months ago I made a Perl script that renames audio
files based on metadata, and the built-in renaming function did not work
right; I had to use a special Win32::Unicode::File module to get the
right names. We are still paying this technical debt off.
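
For illustration, a heuristic of the kind mentioned above might look
like the following Python sketch. The function name guess_code_page, the
candidate code pages (cp850 and cp1252) and the scoring rule are all
assumptions; this is only one of many possible guesses, and it cannot
decide anything for pure ASCII input.

```python
# Letters a German plain text file is likely to contain outside ASCII.
GERMAN_LETTERS = set("äöüÄÖÜß")


def guess_code_page(data: bytes, candidates=("cp850", "cp1252")) -> str:
    """Return the candidate code page whose decoding yields the most
    German letters. Hypothetical heuristic; ties go to the first candidate."""

    def score(code_page: str) -> int:
        # errors="replace" keeps undefined byte values from raising.
        text = data.decode(code_page, errors="replace")
        return sum(ch in GERMAN_LETTERS for ch in text)

    return max(candidates, key=score)


dos_file = "Dagebüll".encode("cp850")
windows_file = "Dagebüll".encode("cp1252")
print(guess_code_page(dos_file))
print(guess_code_page(windows_file))
```

A guess like this only works when the expected language is known in
advance, which is exactly the missing piece of infrastructure: the file
carries no label saying how it was encoded.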

(The authors of RFC 2617 should probably have known better due to RFC
2277; it might be interesting to find out why/if this issue was missed.)
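
To make the ambiguity concrete, here is a minimal sketch in Python. The
helper name basic_credentials and the sample user name and password are
hypothetical; RFC 2617 only says the "userid:password" string is
base64-encoded and does not name the charset used to turn it into bytes
first.

```python
import base64


def basic_credentials(user: str, password: str, charset: str) -> str:
    """Build an Authorization header value for HTTP Basic authentication.

    The charset parameter is the part RFC 2617 leaves open; it is made
    explicit here only to show its effect.
    """
    raw = f"{user}:{password}".encode(charset)
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Same credentials, two plausible client choices, two different headers.
as_latin1 = basic_credentials("björn", "geheim", "iso-8859-1")
as_utf8 = basic_credentials("björn", "geheim", "utf-8")
print(as_latin1)
print(as_utf8)

# ASCII-only credentials come out identical under both charsets.
print(basic_credentials("bjoern", "geheim", "iso-8859-1"))
print(basic_credentials("bjoern", "geheim", "utf-8"))
```

A server that compares the decoded bytes against a stored value has to
guess which of the two a client sent, so in practice only ASCII
credentials are safe.
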
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
