Re: UTF-8 BOM
- From: David Brownell <david-b@pacbell.net>
- To: Richard Tobin <richard@cogsci.ed.ac.uk>, xml-dev@lists.xml.org
- Date: Tue, 03 Jul 2001 07:26:43 -0700
> > This issue should perhaps become part of "XML Blueberry"
>
> The main problem with this is that it would mean that the legality of
> the first bytes of an entity would depend on whether there was a text
> declaration following them and what version number it contained, which
> seems the wrong way round.
Today's XML spec requires the text/XML declaration to be the first thing
in the document, with no leading characters.
The UTF-16 BOM is explicitly (in the XML spec) not part of document
data, which is why it doesn't affect that logic.
> It should be possible to handle a BOM at a
> level below that at which the text declaration is processed.
Works OK for UTF-16 today ... where the BOM is explicitly not
part of the document's data, so it's never before the text/xml decl.
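To make that layering concrete, here is a rough sketch in Python. The
function names are made up for illustration and are not taken from any real
parser. It assumes the input is either UTF-16 with a BOM or something
UTF-8 compatible, and it ignores UTF-32 (whose little-endian BOM begins with
the same two bytes as the UTF-16 one).

```python
import codecs

# Hypothetical helper: remove a UTF-16 BOM at a level below the one that
# looks for the text/XML declaration.
def strip_utf16_bom(data: bytes):
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", data[len(codecs.BOM_UTF16_BE):]
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", data[len(codecs.BOM_UTF16_LE):]
    return None, data  # no UTF-16 BOM; bytes are passed through untouched

# Hypothetical check: is the declaration the first thing in the document data?
def starts_with_declaration(data: bytes) -> bool:
    encoding, body = strip_utf16_bom(data)
    # Assumption: without a UTF-16 BOM, treat the bytes as UTF-8.
    text = body.decode(encoding or "utf-8", errors="replace")
    return text.startswith("<?xml")
```

With this sketch a UTF-16 document passes, because its BOM is gone before
the declaration check runs. A document starting with the UTF-8 BOM bytes
fails, since nothing strips them and they decode to a leading U+FEFF.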
> (Of
> course, this can't really be done. If you get encoding="iso-8859-1"
> after a UTF-8 BOM there's something wrong which ought to be reported.)
I guess I'm thinking that a UTF-8 BOM would be a "new feature" that's
an error today. Hence it fits with the other backwards-problematic stuff
in Blueberry ... though it's a "new feature" that's encoding-specific.
It's already declared to be a fatal error if the declared encoding doesn't
match the actual one. (Not that one can always detect that case, since
those actual encodings have so many synonyms to recognize.)
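As a rough illustration of the synonym problem, the sketch below compares
a declared encoding name against a detected one by normalizing both through
Python's codec registry. The function name is hypothetical, and Python's
alias table is only a stand-in here: it is not the IANA registry, and a
real parser would carry its own list.

```python
import codecs

# Hypothetical helper: do two encoding labels name the same codec?
# Labels Python does not recognize are treated as a mismatch.
def same_encoding(declared: str, detected: str) -> bool:
    try:
        return codecs.lookup(declared).name == codecs.lookup(detected).name
    except LookupError:
        return False
```

For example, same_encoding("latin1", "ISO-8859-1") is true, while
same_encoding("iso-8859-1", "utf-8") is false, which is the case a parser
would want to report after seeing a UTF-8 BOM.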
- Dave