OASIS Mailing List Archives

 



Re: UTF-8 BOM



> > This issue should perhaps become part of "XML Blueberry"
> 
> The main problem with this is that it would mean that the legality of
> the first bytes of an entity would depend on whether there was a text
> declaration following them and what version number it contained, which
> seems the wrong way round. 

Today's XML spec requires the text/XML declaration to be the very first
thing in the document, with no leading characters.

The UTF-16 BOM is explicitly (in the XML spec) not part of document
data, which is why it doesn't affect that logic.


>     It should be possible to handle a BOM at a
> level below that at which the text declaration is processed. 

Works OK for UTF-16 today ... where the BOM is explicitly not
part of the document's data, so it's never before the text/xml decl.
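To make the "level below" idea concrete, here is a rough sketch of how a
reader might peel off a UTF-16 BOM before the declaration is ever looked
at. The function name strip_utf16_bom is made up for illustration and is
not taken from the spec or from any parser; the sketch also ignores
UTF-32, whose little-endian BOM begins with the same two bytes.

```python
import codecs

# Hypothetical helper, for illustration only.
def strip_utf16_bom(raw: bytes):
    """Return (encoding_hint, remaining_bytes).

    If the entity starts with a UTF-16 BOM, the BOM is consumed here and
    the caller sees the declaration at offset 0 of the remaining bytes.
    Assumption: UTF-32 is not considered at all.
    """
    if raw.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be", raw[2:]
    if raw.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le", raw[2:]
    return None, raw                          # no BOM: leave bytes untouched


entity = codecs.BOM_UTF16_LE + "<?xml version='1.0'?><doc/>".encode("utf-16-le")
hint, rest = strip_utf16_bom(entity)
text = rest.decode(hint)   # the declaration is now the first thing in the text
```

Since the BOM is gone before decoding, the "no leading characters" rule
for the declaration is checked against data that never contained it.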


>     (Of
> course, this can't really be done.  If you get encoding="iso-8859-1"
> after a UTF-8 BOM there's something wrong which ought to be reported.)

I guess I'm thinking that a UTF-8 BOM would be a "new feature" that's
an error today.  Hence it fits with the other backwards-problematic stuff
in Blueberry ... though it's a "new feature" that's encoding-specific.

It's already declared to be a fatal error if the declared encoding doesn't
match the actual one.  (Not that one can always detect that case, since
those actual encodings have so many synonyms to recognize.)
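As a sketch of the check being discussed, supposing a UTF-8 BOM were
accepted: the code below compares the BOM against the declared encoding,
folding synonyms through Python's own codec alias table. That table is
only a stand-in; it is not the IANA registry, and a real parser would
carry its own list. The name bom_agrees_with_decl and the simplified
regular expression are assumptions made for illustration.

```python
import codecs
import re

# Simplified pattern for the encoding pseudo-attribute; an assumption,
# not a full implementation of the XML grammar.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

# Hypothetical helper, for illustration only.
def bom_agrees_with_decl(raw: bytes):
    """Return None if there is no UTF-8 BOM, else True/False for agreement."""
    if not raw.startswith(codecs.BOM_UTF8):      # EF BB BF
        return None
    body = raw[len(codecs.BOM_UTF8):]
    match = _ENCODING_DECL.match(body)
    if match is None:
        return True                               # nothing declared, nothing to contradict
    declared = match.group(1).decode("ascii")
    try:
        canonical = codecs.lookup(declared).name  # folds aliases such as "UTF8"
    except LookupError:
        return False                              # unknown label: treat as a mismatch
    return canonical == "utf-8"


ok = bom_agrees_with_decl(
    codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF8"?><doc/>')
bad = bom_agrees_with_decl(
    codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="iso-8859-1"?><doc/>')
```

Here ok is True and bad is False. The synonym problem shows up in the
lookup step: any label missing from the alias table is reported as a
mismatch even if it really does name UTF-8.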

- Dave