xml-dev - expat and encodings

expat and encodings


From: "Steve Kearon" <stevek@fineline-software.co.uk>
To: <xml-dev@ic.ac.uk>
Date: Tue, 15 Dec 1998 10:07:56 -0000

Can someone clarify the issue of character encodings for me? I think this
is an expat issue, but it may be a more general one.

I'm trying to save and load text that might contain accented characters
(code values above 127). I'm running on Windows 95. I realise that when
writing XML, I either have to convert such characters to "&#xxx;" form, or
declare the file's encoding as "iso-8859-1"; otherwise the XML parser
(expat) objects when subsequently reading the file.
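A minimal sketch of the first option, assuming the in-memory text is ISO-8859-1. In that encoding each byte value equals its Unicode code point, so byte 0xE9 ('é') can be written out as the decimal reference &#233;. The helper name escape_latin1 is hypothetical (not part of expat); it also escapes '&' and '<', which must be escaped in character data whatever the encoding. The second option is to leave the bytes alone and begin the file with <?xml version="1.0" encoding="iso-8859-1"?>.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy NUL-terminated ISO-8859-1 text into `out` as
 * pure-ASCII XML character data. Bytes above 127 become decimal character
 * references (in ISO-8859-1 the byte value IS the Unicode code point);
 * '&' and '<' become &amp; and &lt;.
 * Returns chars written (excluding NUL), or -1 if `out` is too small.
 * Example: escape_latin1("caf\xE9", buf, sizeof buf) writes "caf&#233;". */
static int escape_latin1(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    if (outsize == 0)
        return -1;
    for (const unsigned char *p = (const unsigned char *)in; *p; ++p) {
        char tmp[8];                 /* big enough for "&#255;" plus NUL */
        const char *piece = tmp;
        if (*p == '&')
            piece = "&amp;";
        else if (*p == '<')
            piece = "&lt;";
        else if (*p > 127)
            snprintf(tmp, sizeof tmp, "&#%u;", (unsigned)*p);
        else {
            tmp[0] = (char)*p;       /* plain ASCII passes through */
            tmp[1] = '\0';
        }
        size_t len = strlen(piece);
        if (n + len + 1 > outsize)   /* leave room for the final NUL */
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return (int)n;
}
```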

The snag is that whether the file has utf-8 or iso-8859-1 encoding, the text
the application receives from the parser seems to be always utf-8. I've
tried specifying "iso-8859-1" as the encoding to the XML_ParserCreate()
call, but this seems to have no effect (I guess the parameter actually
overrides the default (UTF-8) file encoding, rather than specifying the
encoding the client would like to see).
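For illustration, the sketch below shows why accented characters arrive as two bytes. Assuming the default (non-Unicode) build, where expat hands character data to handlers as UTF-8, a character such as 'é' (ISO-8859-1 byte 0xE9, code point U+00E9) is delivered as the byte pair 0xC3 0xA9, whatever encoding the file declares; the encoding argument to XML_ParserCreate() only tells expat how to decode the input document. latin1_to_utf8 is a hypothetical helper written for this example, not an expat function.

```c
#include <stddef.h>

/* Hypothetical helper: convert NUL-terminated ISO-8859-1 text to UTF-8.
 * Code points 0x00-0x7F stay one byte; 0x80-0xFF become two bytes,
 * 110000xx 10xxxxxx. E.g. 0xE9 ('é') -> 0xC3 0xA9.
 * Returns bytes written (excluding NUL), or -1 if `out` is too small. */
static int latin1_to_utf8(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    if (outsize == 0)
        return -1;
    for (const unsigned char *p = (const unsigned char *)in; *p; ++p) {
        if (*p < 0x80) {
            if (n + 2 > outsize)             /* 1 byte + NUL */
                return -1;
            out[n++] = (char)*p;
        } else {
            if (n + 3 > outsize)             /* 2 bytes + NUL */
                return -1;
            out[n++] = (char)(0xC0 | (*p >> 6));   /* lead byte */
            out[n++] = (char)(0x80 | (*p & 0x3F)); /* continuation byte */
        }
    }
    out[n] = '\0';
    return (int)n;
}
```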

The questions...
Is my understanding correct - does expat feed UTF-8 text to clients when
parsing?
Can expat be asked to feed clients iso-8859-1?
If the client must convert manually, are there any helper functions in
expat/xmltok?
If I use the unicode build of expat, does it feed utf-8, unicode or utf-16?
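If the client does have to convert manually, a hand-written routine along these lines would do, assuming UTF-8 input such as the (s, len) pair an expat character-data handler (registered with XML_SetCharacterDataHandler) receives; note that s is not NUL-terminated. utf8_to_latin1 is hypothetical, not an expat/xmltok function, and it replaces characters above U+00FF, which ISO-8859-1 cannot represent, with '?'.

```c
#include <stddef.h>

/* Hypothetical helper: convert `len` bytes of UTF-8 (not NUL-terminated)
 * to ISO-8859-1. Characters above U+00FF have no ISO-8859-1 form and are
 * replaced with '?'. Each UTF-8 sequence yields exactly one output byte,
 * so `out` needs at most len + 1 bytes. Returns bytes written. */
static size_t utf8_to_latin1(const char *s, int len, char *out)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *end = p + len;
    size_t n = 0;
    while (p < end) {
        unsigned c = *p;
        if (c < 0x80) {                                 /* ASCII */
            out[n++] = (char)c;
            p += 1;
        } else if ((c & 0xE0) == 0xC0 && p + 1 < end) { /* 2-byte sequence */
            unsigned cp = ((c & 0x1Fu) << 6) | (p[1] & 0x3Fu);
            out[n++] = cp <= 0xFF ? (char)cp : '?';
            p += 2;
        } else {                      /* 3/4-byte sequence or stray byte */
            out[n++] = '?';
            p += 1;
            while (p < end && (*p & 0xC0) == 0x80)      /* skip continuations */
                p++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Inside a handler, a buffer of len + 1 bytes is therefore always large enough for the converted text.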

Many thanks,
Steve Kearon
FineLine Software



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

Follow-Ups:
- Re: expat and encodings
  - From: James Clark <jjc@jclark.com>
