xml-dev - RE: Unix/Java design issues (Was: Re: Is CDATA "structure"?)

From: Tim Bray <tbray@textuality.com>
To: "Hunter, David" <dhunter@Mobility.com>, xml-dev@ic.ac.uk
Date: Wed, 21 Jul 1999 12:29:20 -0700

At 02:48 PM 7/21/99 -0400, Hunter, David wrote:
>"all XML processors <em>must</em> accept the UTF-8 and UTF-16
>encodings of 10646" (emphasis added), since [I believe] UTF-8 and UTF-16 are
>the most common ways to store Unicode characters.  

Unfortunately, no.  I suspect that if you took a worldwide inventory, the
four most common formats would be:

1. A Microsoft codepage that is almost but not quite ISO-8859-1
2. ASCII
3. EBCDIC
4. Shift-JIS

(not necessarily in that order)

Pure ASCII is UTF-8 as it sits, but as the Net becomes less and 
less Anglocentric, there is amazingly little pure ASCII being created
any more.
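To make the "pure ASCII is UTF-8" point concrete, here is a minimal Python sketch. The sample strings and variable names are illustrative only and are not from the original message. It shows that ASCII-only text yields identical bytes under both encodings, while a single accented letter is enough to fall outside ASCII.

```python
# ASCII-only text: the ASCII and UTF-8 encodings are byte-for-byte identical.
ascii_text = 'Is CDATA "structure"?'
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# One accented letter takes the text outside the ASCII range.
non_ascii_text = "café"
try:
    non_ascii_text.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# In UTF-8 the "é" occupies two bytes, so four characters become five bytes.
utf8_length = len(non_ascii_text.encode("utf-8"))

print(same_bytes, fits_in_ascii, utf8_length)
```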

The XML spec chose UTF-8 and UTF-16 because unlike the other specimens
in the list above, they can encode data containing arbitrary mixtures
of different character sets.  -Tim

