Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
- From: Rick Marshall <rjm@zenucom.com>
- To: xml-dev@lists.xml.org
- Date: Sun, 30 Sep 2007 09:10:02 +1000
So the American Standard Code for Information Interchange began life as
an excellent way to encode the characters used in the USA in 7 bits. It
also allows for some control characters that have no equivalent in human
communication (e.g. ACK/NAK) because it is a generalised information
exchange encoding. It was restricted to 7 bits to allow for a parity bit
so that unreliable modem communications could be easily checked (for a
single-bit error). It was this accidental, but fortuitous, decision
that, in these days of reliable comms (at least at the byte level), has
allowed ASCII to be the basis of extended character sets: the MSB can
never be 1 in ASCII, so if it is 1 we can use that to change our
interpretation of what follows.
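As a rough illustration of the parity scheme described above, here is a
minimal Python sketch. The helper names with_even_parity and msb_set are
hypothetical, and even parity is an assumption (links could equally be
configured for odd parity).

```python
def with_even_parity(ch: str) -> int:
    """Return the 8-bit value sent for a 7-bit ASCII character,
    using the top bit as an even-parity bit (assumed convention)."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    # Parity bit is 1 when the 7 data bits contain an odd number of 1s,
    # so that the full 8 bits always contain an even number of 1s.
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def msb_set(byte: int) -> bool:
    """True if the most significant bit of an 8-bit value is 1."""
    return bool(byte & 0x80)


if __name__ == "__main__":
    for ch in "AC":
        print(ch, format(with_even_parity(ch), "08b"))
    # Without parity, plain ASCII bytes never have the top bit set.
    print(any(msb_set(b) for b in "Hello".encode("ascii")))
```

Once parity is dropped, the top bit is always 0 for ASCII, which is what
leaves it free to signal "this byte is not plain ASCII".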
Most character encodings used for more complex character sets have ASCII
as their starting point. They are ASCII extended for ... by ... This
includes UTF-8, which keeps the ASCII byte values unchanged; UTF-16 and
UTF-32 keep the ASCII code points but not the single-byte form.
The ubiquity of American English in the computer world means this will
not change in the foreseeable future.
So the encoding for A is the same in ASCII and UTF-8 (by definition, as
UTF-8 is an extension), but it is up to the application to recognise the
encoding and then to display the character. Nor should we forget that
fonts can mean that A doesn't look like A (it could be rendered as an
EAN128 barcode). Interpretation, and agreement on interpretation, is
everything.
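To make the point concrete for the question in the subject line, here is
a small Python sketch using only the standard str.encode and
bytes.decode methods. The variable names are illustrative, and Latin-1
is chosen here only as an example of a "wrong" interpretation.

```python
# "A" has the same single-byte encoding in ASCII and in UTF-8.
a_ascii = "A".encode("ascii")
a_utf8 = "A".encode("utf-8")

# e with acute accent (U+00E9) takes two bytes in UTF-8,
# and both bytes have the most significant bit set.
e_acute = "\u00e9".encode("utf-8")

# The same two bytes read under a different agreement (Latin-1)
# become two unrelated characters.
misread = e_acute.decode("latin-1")

if __name__ == "__main__":
    print(a_ascii == a_utf8)                 # same bytes for "A"
    print([hex(b) for b in e_acute])         # the two UTF-8 bytes
    print(all(b & 0x80 for b in e_acute))    # MSB set on both
    print(misread)                           # mojibake
```

The bytes do not change between the last two lines; only the agreed
interpretation does.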
There's a real sense in which UTF-* etc are the Rosetta stone of today.
Rick
Michael Kay wrote:
>> We were speaking specifically of "ASCII" and "UTF-8", no?
>>
>
> No, in the message in question we were talking about ASCII characters and
> Unicode characters: that is, we were talking about character sets, not
> encodings.
>
> Michael Kay
> http://www.saxonica.com/
>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
>