Re: [xml-dev] RE: Encoding charset of HTTP Basic Authentication


From: Peter Flynn <peter@silmaril.ie>
To: xml-dev@lists.xml.org
Date: Fri, 03 Feb 2012 22:20:30 +0000

On 03/02/12 12:53, Tei wrote:
> He!...  mediocrity is not always bad.
>
> Say... databases.  Almost all web applications are built using
> relational databases.  Not all applications need a database. But web
> applications are shoehorned into using relational databases anyway.

Tell me about it. I worked with some Oracle guys. They're really good, 
and seriously know their stuff, but ask them to add 2+2 and they'll 
define a table with two variables, write a screen to capture two values 
and stuff them into the table, then write a stored procedure to retrieve 
the values, add them, and format a report to print the result.

They are perfectly aware of other ways of doing this, and it's not as if 
they don't have access to tools other than hammers, but I have seen one 
of them populate a HTML select element of five options by pulling the 
data from a specially-written table, for data which won't change in my 
lifetime (Mr/Ms/Mrs/Miss/Dr).

So how do we extend this to the philosophical choices we face with XML, 
HTTP, or anything else? First off, it's scale. Hard-coding a web-form 
visitor's choice of title isn't a big deal, and if you need to add Rev, 
or if you get assimilated by the BBC and need to add Lord and Lady, 
that's not a big deal either; nor is it a big deal doing it in the 
database, modulo some small penalty in retrieval and dependency.

Convincing programmers to soft-code something large and mutable isn't 
hard either, like the set of ISO 3166 country codes, or the ISO 639 language 
codes (or whatever the current suite is, I haven't looked at them for 
years). They can see the need for it, so the second clue is perception. 
We got 2-letter country TLDs when we could have used the 3-letter set 
and stayed the same length as .gov, .mil, .edu, .org, and .com. Doesn't 
matter a whole lot unless we end up with more than 26×26 (676) nation states on the 
planet. But 2-letter language codes don't scale: there are way more than 
676 languages on this earth.
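The arithmetic behind the two- versus three-letter argument is easy to check. This Python sketch simply counts fixed-length codes over the 26 letters A to Z; it ignores which combinations ISO actually reserves or assigns, so treat the numbers as an upper bound on the code space, not a count of real codes.

```python
# Size of the code space for fixed-length codes built from the letters A-Z.
ALPHABET_SIZE = 26

def code_space(length: int) -> int:
    """Number of distinct codes of the given length using only A-Z."""
    return ALPHABET_SIZE ** length

two_letter = code_space(2)    # 26 * 26
three_letter = code_space(3)  # 26 * 26 * 26

print(f"2-letter codes: {two_letter}")
print(f"3-letter codes: {three_letter}")
```

For comparison, ISO 639-2 and ISO 639-3 use three-letter codes, which is what leaves room for thousands of languages.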

So eventually you get down to the third factor: judgement, which is 
based on knowledge and a shedload of other things. Programmer X speaks 
only Klingon, and doesn't really know much about other languages except 
for the existence of a few of them, high and far off; and certainly 
doesn't really care that anyone speaking Sindarin would ever have need 
to access the Internet, let alone in their own language. That's a harsh 
view, and mercifully rarer nowadays. Soften it a little and consider IBM 
(I believe: Len? Michael?) who were building a precursor to what would 
eventually become the foundations of Latin-1. Right down somewhere near 
the bottom right-hand corner came the ÿ (yuml) character, which is used 
in French, and mostly in the names of some towns, but so rarely that 
even some French people are unaware of it, as I discovered when I asked 
some French LaTeX typesetters. The ŵ character (wcirc), which is used 
daily by over half a million Welsh speakers, didn't get a look in until 
Latin-8 (ISO 8859-14).
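A quick check with Python's built-in codecs shows where these two characters actually live: ÿ occupies the very last slot of Latin-1 (byte 0xFF), while ŵ is absent from both Latin-1 and Latin-2 and only turns up in ISO 8859-14 (Latin-8, the Celtic set). The helper name `encodable` is just for illustration; the codec names are the ones Python's standard `codecs` module accepts.

```python
# Which legacy 8-bit charsets can represent ÿ (U+00FF) and ŵ (U+0175)?
YUML = "\u00ff"   # ÿ, LATIN SMALL LETTER Y WITH DIAERESIS
WCIRC = "\u0175"  # ŵ, LATIN SMALL LETTER W WITH CIRCUMFLEX

def encodable(ch: str, codec: str) -> bool:
    """Return True if the character can be encoded in the given charset."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Latin-1, Latin-2 and Latin-8, as named by Python's codecs module.
for codec in ("latin-1", "iso8859_2", "iso8859_14"):
    print(f"{codec:>10}: yuml={encodable(YUML, codec)}, wcirc={encodable(WCIRC, codec)}")

print(YUML.encode("latin-1"))  # b'\xff': the last slot in the Latin-1 table
```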

So what's close to and up front tends to have more effect on what we 
decide -- when nothing else intervenes -- than what is high and far off. 
That's why it's good to have people who are able to take the longer 
view, with deeper knowledge, who can warn when things are likely to 
break if they don't take certain factors into account. They won't always 
be heeded, as Michael says, and we'll always be able to look back with 
regret on things we should have done differently, but I think it *is* 
improving a little: I see projects now which *do* take things into 
account that 20 years ago would have been laughed out of the water.

So how much attention should we pay to charsets? It turns out, *lots*. 
But like my Oracle programmers, we have a bunch of knowledge on one 
side, and a bunch of tools on the other, and they all work, many even in 
the whole of Unicode, but we can't envisage the scale of demand for 
building that capability into every aspect of what we write, so we stick 
with what we know works. It's wrong, it will break, perhaps 
disastrously, and it will upset and affect lots of people, but right 
there and then it would have taken an order of magnitude longer to do it 
right.
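The subject of this thread is a concrete case. The Basic scheme (RFC 2617) sends `user:password` base64-encoded, but in practice it is unclear which charset turns that text into bytes before encoding, so client and server can disagree. This sketch, using hypothetical credentials and a hypothetical helper `basic_auth_value`, shows that the same username produces different headers under Latin-1 and UTF-8, and that some names cannot be sent as Latin-1 at all.

```python
import base64

def basic_auth_value(user: str, password: str, charset: str) -> str:
    """Build a Basic Authorization value, encoding user:password with
    the given charset before base64 (the ambiguous step)."""
    raw = f"{user}:{password}".encode(charset)
    return "Basic " + base64.b64encode(raw).decode("ascii")

# Hypothetical credentials containing one non-ASCII character (é, U+00E9).
user, password = "renée", "secret"

as_latin1 = basic_auth_value(user, password, "latin-1")
as_utf8 = basic_auth_value(user, password, "utf-8")

print(as_latin1)
print(as_utf8)
print("same header?", as_latin1 == as_utf8)  # False: é is 1 byte in Latin-1, 2 in UTF-8

# A password containing ŵ (U+0175) cannot be sent as Latin-1 at all.
try:
    basic_auth_value("Gwenllian", "\u0175y", "latin-1")
except UnicodeEncodeError as err:
    print("Latin-1 cannot encode:", err.object[err.start:err.end])
```

A server that guesses the wrong charset will decode the same bytes into a different string and reject a correct password, which is exactly the kind of breakage described above.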

Perhaps we can learn from history for once :-)

///Peter

Follow-Ups:
- Re: [xml-dev] RE: Encoding charset of HTTP Basic Authentication
  - From: John Cowan <cowan@mercury.ccil.org>
- Re: [xml-dev] RE: Encoding charset of HTTP Basic Authentication
  - From: Michael Kay <mike@saxonica.com>

References:
- Re: [xml-dev] RE: Encoding charset of HTTP Basic Authentication
  - From: Andrew Welch <andrew.j.welch@gmail.com>
- RE: [xml-dev] RE: Encoding charset of HTTP Basic Authentication
  - From: "Len Bullard" <cbullard@hiwaay.net>
- Re: [xml-dev] RE: Encoding charset of HTTP Basic Authentication
  - From: Tei <oscar.vives@gmail.com>
