- From: "Simon St.Laurent" <simonstl@simonstl.com>
- To: XML-Dev Mailing list <xml-dev@xml.org>
- Date: Sat, 19 Feb 2000 18:30:32 -0500
Rick Jelliffe asked that I forward this to the list - it's yet more answers
on the encoding converter question.
>Date: Sun, 20 Feb 2000 06:32:25 +0800 (CST)
>From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
>Subject: Re request on XML-DEV
>
>GLUE and XML-TCS Transcoding Utility Software
>---------------------------------------------
>
>I have made an XML-aware version of TCS. The diff package is available at
>the Chinese XML Now site. It implements "lossless" transcoding, which is
>what I talked about at the XML Conference we met at last year. It
>basically means that you should convert characters the target encoding
>cannot represent to numeric character references (NCRs).
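
As a rough sketch of what "lossless" means here (this is not xml-tcs code; the
function name transcode_lossless is made up for illustration, and the target is
assumed to be a stateless, ASCII-compatible encoding), the idea in Python is:

```python
def transcode_lossless(text: str, target: str) -> bytes:
    """Encode text into `target`; characters the target cannot
    represent are written as XML hexadecimal NCRs instead of being lost.

    Assumption: `target` is stateless and ASCII-compatible (for example
    an ISO 8859 part), so encoding one character at a time is safe and
    the NCR itself can be written as ASCII bytes.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            out += f"&#x{ord(ch):04X};".encode("ascii")
    return bytes(out)


sample = "caf\u00e9 \u4e2d"  # U+4E2D has no ISO 8859-1 equivalent

# Lossy: the unmappable character silently disappears.
lossy = sample.encode("iso8859_1", "ignore")

# Lossless: the unmappable character survives as an NCR.
lossless = transcode_lossless(sample, "iso8859_1")
```

Python's built-in "xmlcharrefreplace" error handler does much the same thing,
but emits decimal references rather than hexadecimal ones.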
>
>I can only provide diffs because Bell has not AFAIK made tcs
>available for redistribution, even though at least one version of Linux
>does include it. I don't think they care particularly, but without
>confirmation I cannot make up binaries or a unified source
>distribution, unfortunately. The people involved cannot be contacted; the
>project leader is Dennis Ritchie (i.e., UNIX and C) who undoubtedly has
>more pressing matters to attend to.
>
>*HOWEVER* at my site you will also see "The GLUE Project Transcoders"
>
>GLUE (= "GLUE Loses User's Encodings") is a transcoder library I wrote.
>It is specified using XML and converted to C. At the moment, only the
>x->UTF-8 is available, but that seems to be all you want.
>
>I made it because the existing transcoders had problems: the GNU iconv
>ones required their new glibc; and so on. Since then, IBM has released
>their excellent C++ library, ICU, but it too does not do lossless
>transcoding. Also, Java now generates an exception if a character is
>missing instead of just silently swallowing the character; these are steps
>in the right direction.
>
>The mapping tables at Unicode.org have the problem that many encodings are
>better mapped by algorithm rather than by a table. So I made an XML format
>that could express declaratively certain relationships in a way
>that can be simply translated into code. Also, many encodings have
>variants, which can be represented well in XML.
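
To make the "algorithm rather than table" point concrete, here is a small
sketch. It is not GLUE's XML format or its generated C; the function names are
invented for illustration. ISO 8859-11 maps to Unicode with one constant
offset, and ISO 8859-15 can be described as ISO 8859-1 plus eight overrides,
which is roughly what "variant" means above:

```python
from typing import Optional

# ISO 8859-11: byte 0xA1 is U+0E01, and the rest of the Thai block
# follows at the same constant distance.
THAI_OFFSET = 0x0E01 - 0xA1  # 0x0D60


def iso8859_11_to_codepoint(b: int) -> Optional[int]:
    """Map one ISO 8859-11 byte to a Unicode code point, or None if unassigned."""
    if b <= 0xA0:
        return b  # ASCII, C1 controls and NBSP map to themselves
    if 0xA1 <= b <= 0xDA or 0xDF <= b <= 0xFB:
        return b + THAI_OFFSET
    return None  # 0xDB-0xDE and 0xFC-0xFF are unassigned


# ISO 8859-15 expressed as a variant: identity mapping (as in ISO 8859-1)
# except for these eight positions.
LATIN9_OVERRIDES = {
    0xA4: 0x20AC,  # euro sign
    0xA6: 0x0160,  # S with caron
    0xA8: 0x0161,  # s with caron
    0xB4: 0x017D,  # Z with caron
    0xB8: 0x017E,  # z with caron
    0xBC: 0x0152,  # OE ligature
    0xBD: 0x0153,  # oe ligature
    0xBE: 0x0178,  # Y with diaeresis
}


def iso8859_15_to_codepoint(b: int) -> int:
    """Map one ISO 8859-15 byte to a Unicode code point."""
    return LATIN9_OVERRIDES.get(b, b)
```

Either rule is a few lines, where a flat table would need up to 256 entries
per encoding.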
>
>GLUE home page is at:
> http://www.ascc.net/xml/en/utf-8/glue.html
>GLUE handles the following encodings:
>
> ASCII
> ISO 646de
> ISO 646en
> ISO 646es
> ISO 646fr
> ISO 646it
> ISO 646sv
> ISO 8859-1 (Latin 1)
> CP1252 variant (Windows "ANSI")
> ISO 8859-2 (Latin 2)
> CP 1250 variant
> ISO 8859-3 (Latin 3)
> ISO 8859-4 (Latin 4)
> ISO 8859-5 (Cyrillic)
> ISO 8859-6 (Arabic)
> ISO 8859-7 (Greek)
> ISO 8859-8 (Hebrew)
> ISO 8859-9 (Latin 5)
> ISO 8859-10 (Latin 6)
> ISO 8859-11 (Thai)
> ISO 8859-13 (Latin 7)
> ISO 8859-14 (Latin 8)
> ISO 8859-15 (Latin 9)
> MacRoman
> MacRoman with Euro
> UTF-8
> UTF-16 (little endian)
> UTF-16 (big endian)
> Big5 (Chinese, including user-defined area)
> VISCII (Vietnamese)
>(Note: the variants have not been tested thoroughly. Check them to
>confirm. The current implementation does not support ISO 2022-based
>encodings or non-Unicode encodings (e.g. the massive CCCII) well.)
>
>
>The xml-tcs home page is at
> http://www.ascc.net/xml/en/utf-8/transcode-index.html
>
>xml-tcs can generate the following NCRs, with single or double delimiting:
>
> STRIP: no delimiter,
> UNKNOWN: put in unknown character indicator "?" or FFFD
> UNICODE: Unicode-style U+HHHH
> JAVA: Java-style \uHHHH
> JAVA_DD: Java-style \\uHHHH
> XML: XML-style &#xHHHH;
> XML_DD: XML-style &amp;#xHHHH;
> SPREAD1: Old SPREAD &U-HHHH;
> SPREAD1_DD: Old SPREAD &amp;U-HHHH;
> SPREAD2: New SPREAD &UHHHH;
> SPREAD2_DD: New SPREAD &amp;UHHHH;
> CSS1: CSS1 \HHHH
> CSS1_DD: CSS1 \\HHHH
> CSS2: CSS2 \00HHHH (space following is delimiter)
> CSS2_DD: CSS2 \\00HHHH (space following is delimiter)
> SGML: SGML-, HTML (< 4) and Netscape style
> decimal &#DDDDDD;
> SGML_DD: SGML-style &amp;#DDDDDD;
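
The single-delimited styles in this list can be sketched as a table of
formatters. This is only an illustration, not xml-tcs itself: the names
NCR_STYLES and format_ncr are invented, and the four-digit uppercase hex
padding and unpadded decimal are assumptions. CSS2 and the _DD variants are
left out.

```python
# Each style maps a Unicode code point to its textual reference.
# Assumption: code points are in the Basic Multilingual Plane, so
# four hex digits are enough (Java's \u form would otherwise need
# a surrogate pair).
NCR_STYLES = {
    "STRIP": lambda cp: "",                 # drop the character
    "UNKNOWN": lambda cp: "?",              # unknown-character indicator
    "UNICODE": lambda cp: f"U+{cp:04X}",
    "JAVA": lambda cp: f"\\u{cp:04X}",
    "XML": lambda cp: f"&#x{cp:04X};",
    "SPREAD1": lambda cp: f"&U-{cp:04X};",
    "SPREAD2": lambda cp: f"&U{cp:04X};",
    "CSS1": lambda cp: f"\\{cp:04X}",
    "SGML": lambda cp: f"&#{cp};",          # decimal
}


def format_ncr(cp: int, style: str) -> str:
    """Format code point `cp` in the named style."""
    return NCR_STYLES[style](cp)


for name in NCR_STYLES:
    print(f"{name:8} {format_ncr(0x4E2D, name)}")
```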
>
>
>
>
>
>Rick Jelliffe
>
Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com
***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/threads.html
***************************************************************************