   Re: Non-Unicode Character Sets

  • From: John Cowan <cowan@locke.ccil.org>
  • To: xml-dev@ic.ac.uk
  • Date: Sat, 29 Jan 2000 14:20:32 -0500 (EST)

Paul Prescod scripsit:

> I am told that conversion of some character sets through Unicode is
> lossy and cannot be round-tripped. But it occurs to me that as long as
> one has the private use area, "unknown" characters can always be
> preserved.

Mappings have to serve various purposes: not just round-trippability,
which could be achieved by any arbitrary 1-1 mapping, but also
usefulness.  Not all character set standards agree on what counts
as a character, as opposed to a mere variant that need not be
represented.  Most of Unicode's compatibility characters were added
in order to satisfy these rather disjoint needs.
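
To make the distinction concrete, here is a minimal Python sketch, assuming a hypothetical single-byte legacy character set whose 256 codes are each sent to their own Private Use Area code point starting at U+E000. The function names are illustrative only. Because the mapping is 1-1 it round-trips perfectly, yet the decoded text is useless: byte 0x41 does not become "A", and every character is an opaque private-use code point.

```python
# Sketch: any injective (1-1) mapping round-trips, even a useless one.
# ASSUMPTION: a hypothetical single-byte legacy charset whose codes 0x00-0xFF
# are all sent to the BMP Private Use Area starting at U+E000.
import unicodedata

PUA_BASE = 0xE000  # first code point of the BMP Private Use Area


def decode_pua(data: bytes) -> str:
    """Map each legacy byte to its own PUA code point (hypothetical scheme)."""
    return "".join(chr(PUA_BASE + b) for b in data)


def encode_pua(text: str) -> bytes:
    """Invert decode_pua; raise ValueError for code points it never produces."""
    out = bytearray()
    for ch in text:
        offset = ord(ch) - PUA_BASE
        if not 0 <= offset <= 0xFF:
            raise ValueError(f"U+{ord(ch):04X} was not produced by decode_pua")
        out.append(offset)
    return bytes(out)


legacy = bytes([0x41, 0x42, 0xB0, 0xA1])  # arbitrary sample bytes
text = decode_pua(legacy)

# Round-tripping works perfectly...
assert encode_pua(text) == legacy

# ...but nothing is usable: 0x41 0x42 did not become "AB", and every
# character has general category "Co" (private use).
print(text.startswith("AB"))                      # False
print({unicodedata.category(c) for c in text})    # {'Co'}
```

A usable mapping has to send 0x41 to U+0041, which is exactly where standards start to disagree about what counts as "the same character".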

For example, the Korean standard KSC 5601 provides distinct codepoints
for different "readings" of Chinese characters (hanja) used in Korean writing.
The great bulk of all Chinese characters have only a single reading in
Korean (unlike Japanese), but some few have two, three, or more.
Providing distinct codepoints eased mappings between Korean hanja
and native Korean writing (hangul), as each hanja could be given a unique
mapping.

Unicode, however, unified Chinese characters into a single repertoire.
In order to permit round-tripping between KSC 5601 and Unicode,
compatibility characters were added to Unicode for each of the
multi-mapped hanja.
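
The effect of those compatibility characters can be sketched in Python. The legacy byte values below are hypothetical and do not reproduce actual KS C 5601 codes; U+8C48 and U+F900 are real. U+F900 is one of the CJK compatibility ideographs (U+F900..U+FA0B) encoded for KS C 5601 duplicates, and Unicode declares it canonically equivalent to the unified ideograph U+8C48.

```python
# Sketch: a compatibility ideograph lets a second "reading" survive a trip
# through Unicode. The legacy codes 0x01/0x02 are HYPOTHETICAL.
import unicodedata

# Hypothetical legacy table: two codes for one hanja, one per Korean reading.
LEGACY_TO_UNICODE = {
    0x01: "\u8C48",  # reading 1 -> unified ideograph
    0x02: "\uF900",  # reading 2 -> compatibility ideograph
}
# Reverse table; safe because no two legacy codes share a target.
UNICODE_TO_LEGACY = {u: code for code, u in LEGACY_TO_UNICODE.items()}


def round_trip(code: int) -> int:
    """Decode a legacy code to Unicode and encode it back again."""
    return UNICODE_TO_LEGACY[LEGACY_TO_UNICODE[code]]


# Both readings come back intact because each has its own code point.
assert all(round_trip(c) == c for c in LEGACY_TO_UNICODE)

# Unicode still records that the two are the same Chinese character:
print(unicodedata.decomposition("\uF900"))                  # 8C48
print(unicodedata.normalize("NFC", "\uF900") == "\u8C48")   # True
```

The last line is the catch: the distinction lives only in the code point, so any process that normalizes the text folds the two readings together again.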

The character set CNS 11643 was not given this treatment, however,
and its (few) multiple mappings do not have Unicode equivalents.  Therefore,
round-tripping is not possible.
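
Whether a mapping table can round-trip comes down to whether it is injective. A minimal Python check, assuming a table is represented as a dict from legacy code to Unicode string, might look like this; the table contents are hypothetical and stand in for a charset whose multi-mapped characters lack separate Unicode code points, not for actual CNS 11643 data.

```python
# Sketch: detect legacy codes that collide on one Unicode target.
# A table round-trips only if no two legacy codes share a target.
# The sample table is HYPOTHETICAL.
from collections import defaultdict


def collisions(table: dict[int, str]) -> dict[str, list[int]]:
    """Return each Unicode target that more than one legacy code maps to."""
    by_target: dict[str, list[int]] = defaultdict(list)
    for code, target in table.items():
        by_target[target].append(code)
    return {t: sorted(codes) for t, codes in by_target.items() if len(codes) > 1}


# Two legacy codes (hypothetical values) fall onto the same unified ideograph.
table = {0x10: "\u4E00", 0x11: "\u8C48", 0x12: "\u8C48"}

print({f"U+{ord(t):04X}": codes for t, codes in collisions(table).items()})
# {'U+8C48': [17, 18]}

# Building the reverse table silently keeps only the last colliding code,
# so encoding what was decoded cannot recover 0x11.
reverse = {u: code for code, u in table.items()}
print(reverse[table[0x11]] == 0x11)  # False (0x12 overwrote it)
```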

> Is there any character set in the world that cannot be considered a
> "subset of Unicode"?

The CCCII standard and its superset EACC (aka ANSI Z39.64) have
many multiple mappings and will not round-trip through Unicode.

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Unsubscribe by posting to majordom@ic.ac.uk the message
unsubscribe xml-dev  (or)
unsubscribe xml-dev your-subscribed-email@your-subscribed-address

Please note: New list subscriptions now closed in preparation for transfer to OASIS.


Copyright 2001 XML.org. This site is hosted by OASIS