Restrictions in a schema are often there because we know that the IT system we are sending data to is restricted in what it can handle, and we want to prevent stuff reaching that IT system if we know it can’t handle it. Very often we don’t have the ability to change that IT system. We would love, for example, to allow non-ASCII characters in email addresses, but the internet can’t cope with them and we don’t have the ability to fix the internet.

I made yet another attempt to use non-ASCII characters in the design of an XQuery extension recently. The WG chose to define the syntax using only ASCII characters instead. All kinds of reasons: difficulties entering the characters on a keyboard, difficulty making sure the characters aren’t corrupted in transmission, etc.

The fact is, use of non-ASCII characters still creates hassle. The 20% is almost certainly an underestimate. Building IT components that handle Unicode strings is dead easy; debugging system problems when messages between the different IT components get mangled can often be a nightmare, and a lot of the pain falls not on IT developers but on end-users who have to cope with inadequate data entry tools and mis-displayed output.

Michael Kay
Saxonica