Re: Document encodings

Yes. A succession of features is examined, one after another, until a
definite result is determined.

 1) EXTERNAL: Information sent in the MIME header.
 2) BOM: Presence or absence of a Byte Order Mark (BOM), a Unicode
    signal that tells you whether 16- or 32-bit characters are in
    use, and their "endianness".
 3) FAMILY SIGNATURE: Presence of the expected codes for "<?xml" at the
    beginning of the file (enough to know whether 8-bit codes are
    used, and whether they are ASCII-based or EBCDIC-based).
 4) ENCODING: Knowing the family signature is enough to read the
    encoding parameter of the XML header.
 5) DEFAULT: Otherwise UTF-8 (which also encompasses ASCII).
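
The five steps above can be sketched roughly as follows. This is only an
illustration, not how any particular parser is written: the function name
detect_xml_encoding, its "external" parameter, and the choice of cp037 as
the stand-in for the EBCDIC family are all assumptions made for the
example. As a simplification, the encoding parameter is only read for the
8-bit families; for 16- and 32-bit data the detected family is returned
directly.

```python
import re
from typing import Optional

# Step 2: byte order marks. Longer marks come first so that the UTF-32
# little-endian BOM (FF FE 00 00) is not mistaken for UTF-16 (FF FE).
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# Step 3: how the start of "<?xml" looks in each family when no BOM is present.
_SIGNATURES = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"<?xm", "ascii"),              # ASCII-based 8-bit family
    (b"\x4c\x6f\xa7\x94", "cp037"),  # EBCDIC-based family (cp037 assumed)
]

# Step 4: the encoding parameter of the XML declaration.
_DECL = re.compile(
    r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_xml_encoding(data: bytes, external: Optional[str] = None) -> str:
    """Return a lower-case encoding name for the raw bytes of an XML entity."""
    # 1) EXTERNAL: e.g. the charset parameter of a MIME header.
    if external:
        return external.lower()
    # 2) BOM
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 3) FAMILY SIGNATURE
    for signature, family in _SIGNATURES:
        if data.startswith(signature):
            if family in ("ascii", "cp037"):
                # 4) ENCODING: the family is enough to read the declaration.
                head = data[:200].decode(family, errors="replace")
                match = _DECL.match(head)
                if match:
                    return match.group(1).lower()
                return "utf-8" if family == "ascii" else family
            return family
    # 5) DEFAULT
    return "utf-8"
```

For instance, calling detect_xml_encoding on the bytes of a file that
begins with <?xml version="1.0" encoding="ISO-8859-1"?> yields
"iso-8859-1", while a file with no BOM and no declaration falls through
to "utf-8".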

The important thing is that this is not guesswork. There is no scope for
one parser determining one encoding and another parser determining a
different one: every XML processor should be able to say either "Yes, I can
handle this entity" or "No, I cannot handle this entity".

All processors are required to support UTF-8 and UTF-16 encodings.
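
The "yes or no" answer can be pictured as a simple lookup once the
encoding name is known. The sketch below uses Python's codec registry as a
stand-in for "the encodings this processor supports"; the helper name
can_handle is hypothetical, and a real XML processor would keep its own
list of supported encodings.

```python
import codecs


def can_handle(encoding_name: str) -> bool:
    """Report whether this (pretend) processor supports the named encoding."""
    try:
        codecs.lookup(encoding_name)  # raises LookupError for unknown names
    except LookupError:
        return False
    return True
```

Under this stand-in, can_handle("UTF-8") and can_handle("UTF-16") are
both true, as the requirement above demands, while an unknown name such as
"x-no-such-encoding" gives a clean "no" rather than a guess.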

A few character sets are not entirely stable (see
http://www.w3.org/TR/japanese-xml/), but these are the exception.

Rick Jelliffe

----- Original Message -----
From: "Phil Ruelle" <philr@iplbath.com>
To: <xml-dev@lists.xml.org>
Sent: Friday, 6 July 2001 PM 04:16
Subject: Document encodings

> A quick question:
> How do parsers work out what encoding an XML document is in
> (i.e. how is it able to read the 'encoding' attribute of the
> declaration)?
> I'm guessing that all the encodings XML supports have a common
> 'root' so the XML declaration can always be read using the 'base'
> character set. Is this correct or am I way off the mark?
> Many thanks,
> Phil Ruelle