Re: [xml-dev] BOM and encodings questions


From: Philippe Poulard <Philippe.Poulard@sophia.inria.fr>
To: Shlomo Yona <S.Yona@F5.com>
Date: Thu, 08 Mar 2007 18:22:09 +0100

Shlomo Yona wrote:
> .1.
> 
> If an XML document starts with the FF FE BOM (UTF-16, little endian) but 
> the encoding is set to “UTF-8” in the prolog, what is the expected 
> behavior of the Parser?
> 
> I think that the parser should respect the BOM, read the prolog assuming 
> it is encoded in UTF-16 little endian and then process the remainder of 
> the XML document in UTF-8 as the prolog says.
> 
> Is this correct?

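I'm not sure, but FF FE is the UTF-16 little-endian BOM (UTF-8 has its 
own, EF BB BF), so the parser should decode the prolog as UTF-16 and 
then fail, because the declared encoding contradicts the detected one. 
If the bytes after the BOM were really UTF-8, the 6 bytes of "<?xml " 
would be read as 3 UTF-16 characters and the prolog would not even be 
recognized.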
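
To make the byte arithmetic concrete, here is a minimal sketch in Python 
(my choice of language, nothing in the thread depends on it). The names 
sniff_bom and misread_as_utf16le are made up for illustration. It only 
looks at raw bytes and is not how any particular parser is implemented.

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so it has to be tested before it.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def misread_as_utf16le(ascii_text: str) -> str:
    """Decode single-byte (ASCII/UTF-8) text as if it were UTF-16 LE."""
    return ascii_text.encode("utf-8").decode("utf-16-le")


if __name__ == "__main__":
    print(sniff_bom(b"\xff\xfe<\x00?\x00"))    # BOM says UTF-16 LE
    print(len(misread_as_utf16le("<?xml ")))   # 6 bytes collapse to 3 characters
```

The second call shows why the prolog is unreadable in the mixed case: 
each pair of UTF-8 bytes is consumed as one UTF-16 code unit.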

> 
> .2.
> 
> Is an XML parser expected to process a document in alternating 
> encodings? I mean, is there a way to signal the parser that from a 
> certain point on the encoding changes to some other encoding? If so, how?

The only case I know of is external entities: each one can declare its 
own encoding, which may differ from the document's. Within a single 
entity the encoding cannot change.
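
A small sketch of that case, assuming Python's xml.parsers.expat 
binding. The function parse_text, the system id "part.xml" and the two 
byte strings are invented for the example. The main document is UTF-8 
and the external entity is ISO-8859-1 with its own text declaration.

```python
import xml.parsers.expat

# Main document entity, encoded in UTF-8.
MAIN_DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE doc [<!ENTITY part SYSTEM "part.xml">]>\n'
    '<doc>\u00e9 &part;</doc>'
).encode("utf-8")

# External parsed entity, encoded in ISO-8859-1 and saying so itself.
PART = '<?xml version="1.0" encoding="ISO-8859-1"?>\u00e9'.encode("iso-8859-1")


def parse_text(main_doc: bytes, entities: dict) -> str:
    """Parse main_doc, resolving external entities from `entities`
    (system id -> raw bytes), and return the concatenated character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def resolve(context, base, system_id, public_id):
        # Each external entity gets its own sub-parser, which detects the
        # entity's encoding from the entity's own text declaration.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(entities[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(main_doc, True)
    return "".join(chunks)


if __name__ == "__main__":
    print(parse_text(MAIN_DOC, {"part.xml": PART}))
```

The same character reaches the application from two differently encoded 
byte sequences (C3 A9 in the document, E9 in the entity).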

> 
> .3.
> 
> Is there a way to express the expected encoding of the XML document in 
> the XML Schema? If so, how?

Too late: XML Schema works at the logical level, after the parser has 
already decoded the document.

I don't know why you want to force an incoming document to be encoded 
with a given encoding; let the parser do its job and fail normally if 
the encoding is not supported.

However, a SAX parser can supply information about the encoding of a 
document, so you can write a filter like this:

if encoding != THE_ENCODING
then fail_for_an_obscure_reason()
endif
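
A runnable version of that filter could look like the sketch below. It 
is only an illustration: it uses Python's xml.parsers.expat binding 
(XmlDeclHandler) instead of a SAX API, and EXPECTED_ENCODING, 
UnexpectedEncoding and check_encoding are hypothetical names. In Java 
SAX2 the corresponding hook would be Locator2.getEncoding(). The sketch 
only checks the declared encoding; a document without an encoding 
declaration passes through, which is a policy assumption on my part.

```python
import xml.parsers.expat

EXPECTED_ENCODING = "UTF-8"  # hypothetical policy value


class UnexpectedEncoding(Exception):
    """Raised when the declared encoding is not the expected one."""


def check_encoding(data: bytes, expected: str = EXPECTED_ENCODING) -> None:
    """Parse `data` and raise UnexpectedEncoding if its XML declaration
    names an encoding other than `expected` (compared case-insensitively)."""
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        # `encoding` is None when the declaration has no encoding attribute.
        if encoding is not None and encoding.upper() != expected.upper():
            raise UnexpectedEncoding(
                "declared %r, expected %r" % (encoding, expected)
            )

    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # exceptions raised in handlers propagate


if __name__ == "__main__":
    check_encoding(b'<?xml version="1.0" encoding="UTF-8"?><a/>')
    try:
        check_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')
    except UnexpectedEncoding as exc:
        print("rejected:", exc)
```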

-- 
Cordialement,

               ///
              (. .)
  --------ooO--(_)--Ooo--------
|      Philippe Poulard       |
  -----------------------------
  http://reflex.gforge.inria.fr/
        Have the RefleX !

Follow-Ups:
- Re: [xml-dev] BOM and encodings questions
  - From: "Pete Cordell" <petexmldev@tech-know-ware.com>
- RE: [xml-dev] BOM and encodings questions
  - From: "Shlomo Yona" <S.Yona@F5.com>

References:
- BOM and encodings questions
  - From: "Shlomo Yona" <S.Yona@F5.com>
