Re: [xml-dev] A dandy little technique for constraining your strings to ASCII

From: Michael Kay <mike@saxonica.com>
To: John Cowan <johnwcowan@gmail.com>
Date: Wed, 21 Oct 2015 23:11:17 +0100

Restrictions in a schema are often there because we know that the IT system we are sending data to is restricted in what it can handle, and we want to prevent stuff reaching that IT system if we know it can’t handle it. Very often we don’t have the ability to change that IT system. We would love, for example, to allow non-ASCII characters in email addresses, but the internet can’t cope with them and we don’t have the ability to fix the internet.
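As a rough illustration of that kind of gatekeeping, here is a minimal Python sketch that only forwards a string when it is pure ASCII. The names `is_ascii_only`, `forward_if_safe`, and `send` are hypothetical, invented for this example; they stand in for whatever check and downstream interface a real system would have.

```python
def is_ascii_only(value: str) -> bool:
    """Return True if every character is in the ASCII range U+0000..U+007F."""
    return value.isascii()


def forward_if_safe(value: str, send) -> bool:
    """Pass `value` to the downstream system only if it is pure ASCII.

    `send` is a hypothetical callable standing in for the restricted
    IT system; it is an assumption of this sketch, not a real API.
    """
    if not is_ascii_only(value):
        return False  # reject here rather than let the downstream system choke
    send(value)
    return True


# Usage: collect what would have been sent downstream.
sent = []
forward_if_safe("mike@example.com", sent.append)
forward_if_safe("müller@example.com", sent.append)
```

Only the first address ends up in `sent`; the second is stopped at the boundary, which is the job a schema restriction does declaratively.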

I recently made yet another attempt to use non-ASCII characters in the design of an XQuery extension. The WG chose to define the syntax using only ASCII characters instead, for all kinds of reasons: difficulty entering the characters on a keyboard, difficulty making sure the characters aren’t corrupted in transmission, etc. The fact is, use of non-ASCII characters still creates hassle. The 20% figure Tim Bray gave for the cost of internationalization is almost certainly an underestimate. Building IT components that handle Unicode strings is dead easy; debugging system problems when messages between the different IT components get mangled can often be a nightmare, and a lot of the pain falls not on IT developers but on end-users who have to cope with inadequate data entry tools and mis-displayed output.
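One common way such mangling happens can be sketched in a few lines of Python, assuming a sender that encodes text as UTF-8 and a receiver that wrongly decodes the bytes as Latin-1. The function name `mangle` is hypothetical and exists only for this illustration.

```python
def mangle(text: str) -> str:
    """Simulate a mismatched pipeline: UTF-8 on the way out, Latin-1 on the way in."""
    return text.encode("utf-8").decode("latin-1")


ascii_text = "cafe"
non_ascii_text = "café"

print(mangle(ascii_text))      # ASCII survives the mismatch unchanged
print(mangle(non_ascii_text))  # the accented character comes back as two wrong characters
```

Pure ASCII passes through untouched because its byte values are the same in both encodings, which is why this class of bug tends to surface only once real-world data arrives.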

Michael Kay

Saxonica

On 21 Oct 2015, at 18:30, John Cowan <johnwcowan@gmail.com> wrote:

On Wed, Oct 21, 2015 at 1:07 PM, Costello, Roger L. <costello@mitre.org> wrote:

You want each string constrained to just ASCII characters.

You may want that, but it's a very bad idea. As Tim Bray said years ago, the cost of internationalization is maybe 20% extra if you build it in from the beginning, whereas it's about 100% if you try to retrofit it. Restricting your data to ASCII or any other set less than Unicode is a Bad Thing from day one. Instead of restricting your data to fit your obsolete processing model, upgrade your processing model to reflect the realities of textual data in the real world.

--
GMail doesn't have rotating .sigs, but you can see mine at http://www.ccil.org/~cowan/signatures

References:
- A dandy little technique for constraining your strings to ASCII
  - From: "Costello, Roger L." <costello@mitre.org>
- Re: [xml-dev] A dandy little technique for constraining your strings to ASCII
  - From: John Cowan <johnwcowan@gmail.com>
