Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?

So the American Standard Code for Information Interchange begins life as 
an excellent way to encode the characters used in the USA in 7 bits. It 
also allows for some control characters that have no equivalent in human 
communication (e.g. ACK/NAK) because it is a generalised information 
exchange encoding. It was restricted to 7 bits to allow for a parity bit 
so that unreliable modem communications could be easily checked (for a 
single-bit error). It was this accidental, but fortuitous, decision that, 
in these days of reliable comms (at least at the byte level), has allowed 
ASCII to be the basis of extended character sets: the MSB can never be 1 
in ASCII, so if it is 1 we can use that to change our interpretation of 
what follows.
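
To make the parity bit and the "MSB is never 1" point concrete, here is a 
minimal Python sketch. The function names are mine, and even parity is an 
assumption for illustration (real links used even, odd, mark or space 
parity depending on configuration).

```python
def with_even_parity(code: int) -> int:
    """Return a 7-bit ASCII code with an even-parity bit placed in the MSB."""
    if not 0 <= code < 0x80:
        raise ValueError("not a 7-bit ASCII code")
    ones = bin(code).count("1")
    # Set bit 7 only when the count of 1 bits is odd, so the total becomes even.
    return code | 0x80 if ones % 2 else code


def parity_ok(byte: int) -> bool:
    """True if the 8-bit value has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


def is_plain_ascii(byte: int) -> bool:
    """True if the MSB is clear, i.e. the byte is a 7-bit ASCII code."""
    return (byte & 0x80) == 0


sent = with_even_parity(ord("C"))   # 0x43 has three 1 bits, so the MSB is set
damaged = sent ^ 0x01               # flip one bit, as a noisy line might
```

With these definitions `parity_ok(sent)` holds and `parity_ok(damaged)` 
does not, which is the single-bit check. Once the eighth bit is no longer 
needed for parity, `is_plain_ascii` is the test an extended encoding can 
use to decide whether a byte is ASCII or something else.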

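Most character encodings used for more complex character sets have ASCII 
as their starting point. They are ASCII extended for ... by ... This 
includes the UTF encodings: UTF-8 keeps every ASCII byte unchanged, and 
UTF-16 and UTF-32 keep the ASCII code point values in wider code units.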
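
For UTF-8 specifically, the "extension" works through the high bits of 
the first byte of each character. The sketch below is illustrative only: 
the function name is mine, and it looks at bit patterns without doing the 
full validity checks a real decoder needs (overlong forms, surrogates, 
bytes such as 0xC0 or 0xF5 and above).

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence occupies, judged by its first byte."""
    if (lead & 0x80) == 0x00:   # 0xxxxxxx: plain ASCII, one byte
        return 1
    if (lead & 0xE0) == 0xC0:   # 110xxxxx: one continuation byte follows
        return 2
    if (lead & 0xF0) == 0xE0:   # 1110xxxx: two continuation bytes follow
        return 3
    if (lead & 0xF8) == 0xF0:   # 11110xxx: three continuation bytes follow
        return 4
    # 10xxxxxx is a continuation byte and cannot start a sequence.
    raise ValueError(f"0x{lead:02X} is not a valid lead byte pattern")


for ch in ("A", "\u00e9", "\u20ac"):   # A, e with acute accent, euro sign
    encoded = ch.encode("utf-8")
    assert utf8_sequence_length(encoded[0]) == len(encoded)
```

A byte with the MSB clear is read exactly as ASCII would read it; a byte 
with the MSB set tells the decoder how many more bytes belong to the same 
character.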

The ubiquity of American English in the computer world means this will 
not change in the foreseeable future.

So the encoding for A is the same in ASCII and UTF-8 (by definition, 
since UTF-8 is an extension of ASCII), but it is up to the application 
to recognise the encoding and then to display the character. Not 
forgetting that fonts can mean that A doesn't look like A (it could be 
rendered as an EAN128 barcode).
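
As a quick check of the point in the subject line, this Python sketch 
encodes A and e with acute accent and compares the bytes. The variable 
names are mine; the codec names are the standard Python ones.

```python
a = "A"
e_acute = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# The character set assigns each character a number (its code point).
a_code_point = ord(a)              # 0x41, inside the ASCII range
e_code_point = ord(e_acute)        # 0xE9, outside the ASCII range

# The encoding decides which bytes represent that number.
a_ascii = a.encode("ascii")        # b"A", one byte
a_utf8 = a.encode("utf-8")         # b"A", the same single byte
e_utf8 = e_acute.encode("utf-8")   # b"\xc3\xa9", two bytes

try:
    e_acute.encode("ascii")
    e_fits_in_ascii = True
except UnicodeEncodeError:
    e_fits_in_ascii = False        # ASCII has no code for this character
```

So A is one identical byte in both encodings, while e with acute accent 
has no ASCII encoding at all and takes two bytes in UTF-8.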

Interpretation and agreement on interpretation is everything.
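
A small Python illustration of what happens when sender and receiver do 
not agree: the same two bytes are decoded under three different 
assumptions. The variable names are mine.

```python
raw = b"\xc3\xa9"   # the two bytes UTF-8 uses for e with acute accent

as_utf8 = raw.decode("utf-8")       # one character: e with acute accent
as_latin1 = raw.decode("latin-1")   # two characters: A with tilde, copyright sign

try:
    raw.decode("ascii")
    ascii_accepts = True
except UnicodeDecodeError:
    ascii_accepts = False           # both bytes have the MSB set, so ASCII rejects them
```

The bytes never change; only the agreed interpretation does.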

There's a real sense in which UTF-* etc are the Rosetta stone of today.

Rick


Michael Kay wrote:
>> We were speaking specifically of "ASCII" and "UTF-8", no?
>>     
>
> No, in the message in question we were talking about ASCII characters and
> Unicode characters: that is, we were talking about character sets, not
> encodings.
>
> Michael Kay
> http://www.saxonica.com/
>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
>   


Copyright 1993-2007 XML.org. This site is hosted by OASIS