   RE: detecting character set of an XML doc

  • From: "Matthew Sergeant (EML)" <Matthew.Sergeant@eml.ericsson.se>
  • To: "'Dirk Germonpre'" <dirkg@tectrade.be>, xml-dev@ic.ac.uk
  • Date: Thu, 17 Jun 1999 12:44:34 +0200

> -----Original Message-----
> From:	Dirk Germonpre [SMTP:dirkg@tectrade.be]
> Hello,
> If I'm writing an XML tool, how can I detect what character set is used
> for an XML document? I've read that for UTF-16, an encoding signature
> (xFEFF) is used at the beginning of the document. Is there a different
> encoding signature for each character set?
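Not each "character set", but each family of encodings, yes. UTF-16 and
UCS-4 give themselves away through the byte order mark or the way the
leading "<?xml" is laid out in bytes. The ASCII-compatible encodings
(UTF-8, ISO 8859-x and friends) look identical at that point, so for those
you read the encoding declaration to tell them apart.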

> If so, where can I find documentation on this?

You can start with the XML spec: Appendix F ("Autodetection of Character
Encodings") contains some brief information. John Cowan also posted a C
decoder that detects the most common encodings, and you can find my Perl
Apache module on CPAN, which detects the character set and encoding.
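
To make the idea concrete, here is a minimal sketch in Python of the kind of
sniffing the spec's autodetection appendix describes. It is not John Cowan's
decoder or my Perl module. The function name, the signature table and the
choice of cp037 for EBCDIC are my own assumptions for illustration, and a
real tool would want to handle more cases.

```python
import re

# (leading bytes, codec name) pairs, loosely following the autodetection
# table in the XML 1.0 spec. Four-byte signatures come first so that a
# UCS-4 byte order mark is not mistaken for a UTF-16 one.
_SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),  # UCS-4 BOM, big-endian
    (b"\xff\xfe\x00\x00", "utf-32-le"),  # UCS-4 BOM, little-endian
    (b"\x00\x00\x00<", "utf-32-be"),     # "<" in UCS-4, no BOM
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),          # UTF-8 signature
    (b"\xfe\xff", "utf-16-be"),          # UTF-16 BOM, big-endian
    (b"\xff\xfe", "utf-16-le"),          # UTF-16 BOM, little-endian
    (b"\x00<\x00?", "utf-16-be"),        # "<?" in UTF-16, no BOM
    (b"<\x00?\x00", "utf-16-le"),
    (b"Lo\xa7\x94", "cp037"),            # "<?xm" in EBCDIC; code page is a guess
]

# Matches encoding="..." inside an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    for prefix, name in _SIGNATURES:
        if data.startswith(prefix):
            return name
    # ASCII-compatible family: the declaration itself names the encoding.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No signature and no declaration: XML defaults to UTF-8.
    return "utf-8"
```

You would call it on the first few dozen bytes of the file, e.g.
sniff_xml_encoding(open(path, "rb").read(128)), and then decode the document
with whatever it returns. For the UTF-16 and UCS-4 cases the result is only
the family; the encoding declaration, once decoded, should still be checked.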



