Re: [xml-dev] UTF-8 Question: e with acute accent should require two bytes, right?
- From: Julian Reschke <julian.reschke@gmx.de>
- To: Jonathan Robie <jonathan.robie@redhat.com>
- Date: Fri, 28 Sep 2007 20:35:02 +0200
Jonathan Robie wrote:
> Hi Roger,
>
> UTF-8 uses an 8 bit encoding. E9 fits in 8 bits. It doesn't fit in 7,
> but there's no such thing as UTF-7, the problem you refer to is an ASCII
> 7-bit problem. Since 8 bits represents twice as many characters as 7
> bits, it's enough to represent most European languages using one byte
> per character.
>
> Jonathan
Ahem, this is either incorrect or at least expressed in a confusing way.
UTF-8 uses sequences of bytes (of 8 bits). As UTF-8 can encode all
Unicode code points, most of them -- all characters with code points >=
128 -- need two or more bytes.
So no, although E9 fits into 8 bits, its UTF-8 encoding requires more
than one byte (U+00E9 is encoded as the two bytes C3 A9).
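For anyone who wants to check this, here is a minimal sketch in Python 3. The helper name utf8_hex is just something made up for this illustration; it only wraps the standard str.encode call. The sketch also shows where the single byte E9 comes from: that is the ISO-8859-1 (Latin-1) encoding of the character, not the UTF-8 one.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, code point U+00E9

# Code points below 128 take a single byte in UTF-8.
print(utf8_hex("e"))

# Code points >= 128 take two or more bytes; U+00E9 takes two.
print(utf8_hex(e_acute), len(e_acute.encode("utf-8")))

# The single byte E9 is the ISO-8859-1 encoding of the same character.
print(e_acute.encode("latin-1").hex().upper())
```

The second line of output shows the two-byte sequence and its length; the last line shows the one-byte Latin-1 form that is easy to confuse with it.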
BR, Julian