   Re: expat and encodings

  • From: Clark Cooper <coopercl@sch.ge.com>
  • To: stevek@fineline-software.co.uk
  • Date: Wed, 16 Dec 1998 09:00:59 -0500 (EST)

Expat always passes XML_Char strings to your handlers. These are either
UTF-8, UTF-16 stored in wchar_ts, or UTF-16 stored in unsigned shorts,
depending on whether XML_UNICODE and XML_UNICODE_WCHAR_T are defined when
you compile the library and your program. Expat provides no option to change
the encoding of the strings you receive. This is a wise design choice, since
only the application knows what to do with characters that have no mapping
from Unicode into the encoding you want. (Even if the document is in
ISO-8859-1, it can contain character references such as &#x203e;, or references
to external entities that use a different encoding.)

You can force the input encoding by passing a non-null encoding name
to XML_ParserCreate; this overrides the document's encoding declaration.
Normally, however, you should pass NULL so that expat recognizes and uses
the XML encoding declaration.

If you were using Perl and the XML::Parser module built on top of expat,
I could recommend one of the Unicode modules on CPAN (the Comprehensive Perl
Archive Network) to help you map from UTF-8 to another encoding. Even if you
aren't using Perl, you can download one of these modules to see how to build
your own C function for encoding mapping.

Clark Cooper    Logic Technology Inc.		cccooper@ltionline.com
(518) 385-8380  650 Franklin St., Suite 304	coopercl@sch.ge.com
		Schenectady,  NY 12305

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)



Copyright 2001 XML.org. This site is hosted by OASIS