Re: [xml-dev] BOM and encodings questions
- From: richard@inf.ed.ac.uk (Richard Tobin)
- To: xml-dev@lists.xml.org
- Date: Thu, 8 Mar 2007 22:01:12 +0000 (GMT)
In article <B546C312A37C12438A22154026CDC7E011ED9B16@exchfive.olympus.f5net.com> you write:
>If an XML document starts with the FF FE BOM (UTF-16, little endian) but
>the encoding is set to "UTF-8" in the prolog, what is the expected
>behavior of the Parser?
The BOM says that the document is in UTF-16. If it isn't in UTF-16,
then it's broken at the encoding level, and this is a fatal error.
If it *is* in UTF-16, the encoding declaration is wrong. This is a fatal
error unless there was some external indication (e.g. from HTTP) that
the document is supposed to be in UTF-16.
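As a rough illustration of the rule above, here is a minimal Python sketch
of the check. It is not taken from any real parser: check_entity, family
and the result strings are made-up names, only the UTF-8 and UTF-16 BOMs
are handled, and "external" stands in for whatever a transport such as
HTTP reported.

```python
import codecs
import re

# Only the UTF-8 and UTF-16 BOMs are handled; UTF-32 is left out for brevity.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Picks the encoding name out of an XML declaration at the start of the text.
DECL_RE = re.compile(
    r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def family(name):
    """Treat utf-16, utf-16-le and utf-16-be as one family for comparison."""
    n = name.lower()
    return "utf-16" if n.startswith("utf-16") else n


def check_entity(data, external=None):
    """Compare the BOM, the actual bytes and the encoding declaration."""
    for bom, enc in BOMS:
        if data.startswith(bom):
            break
    else:
        return "no BOM: nothing to check here"
    try:
        # The whole entity must decode in the encoding the BOM signals.
        text = data[len(bom):].decode(enc)
    except UnicodeDecodeError:
        return "fatal: bytes are not in the encoding the BOM indicates"
    m = DECL_RE.match(text)
    if m is None or family(m.group(1)) == family(enc):
        return "ok"
    if external is not None and family(external) == family(enc):
        return "ok: external information overrides the declaration"
    return "fatal: encoding declaration contradicts the BOM"


# The case from the question: FF FE BOM, UTF-16 bytes, declaration says UTF-8.
doc = codecs.BOM_UTF16_LE + (
    '<?xml version="1.0" encoding="UTF-8"?><a/>'.encode("utf-16-le")
)
print(check_entity(doc))
print(check_entity(doc, external="UTF-16"))
```

Note that the declaration is read in the encoding the BOM indicates and
then compared with it; at no point does the decoder switch encodings
part-way through the entity.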
>I think that the parser should respect the BOM, read the prolog assuming
>it is encoded in UTF-16 little endian and then process the remainder of
>the XML document in UTF-8 as the prolog says.
No. Each XML entity must be in a single encoding. (The spec doesn't say
this explicitly, but it is clear that that's what's intended.)
>Is an XML parser expected to process a document in alternating
>encodings? I mean, is there a way to signal the parser that from a
>certain point on the encoding changes to some other encoding? If so,
>how?
An XML document can be made up of multiple entities which may have
different encodings. There's no way to mix encodings in a single
entity.
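To make the per-entity point concrete, here is a small Python sketch
(an illustration only, not how any particular parser does it). The helper
decode_entity and the two sample entities are invented for the example,
and it assumes ASCII-compatible encodings with no BOM, so that the
declaration can be read directly from the bytes.

```python
import re

# Matches the encoding name in an XML or text declaration at the very
# start of an entity. Assumes an ASCII-compatible encoding and no BOM.
DECL_RE = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def decode_entity(data, default="utf-8"):
    """Decode one whole entity using the single encoding it declares."""
    m = DECL_RE.match(data)
    enc = m.group(1).decode("ascii") if m else default
    return data.decode(enc)


# Document entity in UTF-8; it refers to an external entity "chap".
doc_entity = (
    '<?xml version="1.0" encoding="UTF-8"?><p>&chap; \u00fc</p>'
).encode("utf-8")

# External parsed entity in ISO-8859-1, with its own text declaration.
chap_entity = '<?xml encoding="ISO-8859-1"?>caf\u00e9'.encode("iso-8859-1")

# Each entity is decoded on its own; the results are combined only after
# decoding, as Unicode characters.
for entity in (doc_entity, chap_entity):
    print(decode_entity(entity))
```

The encoding is chosen once per entity and applied to all of its bytes,
which is why two entities can differ but one entity cannot change
encoding midway.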
>Is there a way to express the expected encoding of the XML document in
>the XML Schema? If so, how?
No. The schema is applied after the document has been parsed, by which
point the encoding has already been determined.
-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.