Re: UTF-8 BOM

From: David Brownell <david-b@pacbell.net>
To: Richard Tobin <richard@cogsci.ed.ac.uk>, xml-dev@lists.xml.org
Date: Tue, 03 Jul 2001 07:26:43 -0700

> > This issue should perhaps become part of "XML Blueberry"
> 
> The main problem with this is that it would mean that the legality of
> the first bytes of an entity would depend on whether there was a text
> declaration following them and what version number it contained, which
> seems the wrong way round. 

Today's XML spec requires text and XML declarations to be the first thing
in the entity, with no leading characters.

The UTF-16 BOM is explicitly (in the XML spec) not part of document
data, which is why it doesn't affect that logic.


>     It should be possible to handle a BOM at a
> level below that at which the text declaration is processed. 

Works OK for UTF-16 today ... where the BOM is explicitly not
part of the document's data, so it's never before the text/xml decl.
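
A rough sketch of that layering in Python, for illustration only. The
helper name strip_bom and the BOM table are assumptions made for this
example, not taken from any parser or from the XML spec. It peels off a
leading BOM before anything looks for the text/XML declaration, and it
treats a UTF-8 BOM the same way purely to show what the proposed
behavior might look like. UTF-32 is deliberately ignored.

```python
import codecs

# Hypothetical table for this sketch: BOM byte sequence -> encoding hint.
# UTF-32 is left out; its little-endian BOM begins with the UTF-16 LE one.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def strip_bom(data: bytes):
    """Return (encoding hinted by the BOM, or None; bytes after the BOM)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            # The BOM is dropped here, so the declaration logic above
            # this layer still sees "<?xml" as the very first thing.
            return name, data[len(bom):]
    return None, data
```

With this in place, the layer that reads the declaration never sees the
BOM, which is the property the UTF-16 case already has.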


>     (Of
> course, this can't really be done.  If you get encoding="iso-8859-1"
> after a UTF-8 BOM there's something wrong which ought to be reported.)

I guess I'm thinking that a UTF-8 BOM would be a "new feature" that's
an error today.  Hence it fits with the other backwards-problematic stuff
in Blueberry ... though it's a "new feature" that's encoding-specific.

It's already declared to be a fatal error if the declared encoding doesn't
match the actual one.  (Not that one can always detect that case, since
those actual encodings have so many synonyms to recognize.)
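
To make the synonym problem concrete, here is a small Python sketch.
The function name same_encoding is hypothetical, and it leans on
Python's own codec alias table via codecs.lookup, which is just one
registry and not the IANA list an XML processor would consult. It only
compares two encoding names, for instance one from a BOM or transport
header and one from the declaration.

```python
import codecs

def same_encoding(declared: str, actual: str) -> bool:
    """True if both names resolve to the same codec in Python's registry."""
    try:
        # codecs.lookup() normalizes aliases, e.g. "UTF8" and "utf-8".
        return codecs.lookup(declared).name == codecs.lookup(actual).name
    except LookupError:
        # Unknown to this registry: treated as a mismatch in this sketch,
        # though a real processor might not be able to tell either way.
        return False

print(same_encoding("latin1", "ISO-8859-1"))   # aliases of one codec
print(same_encoding("iso-8859-1", "utf-8"))    # a genuine mismatch
```

Any name missing from the registry falls into the "can't always detect"
bucket mentioned above.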

- Dave
