OASIS Mailing List Archives

 



RE: UTF-8 BOM



> From: David Brownell [mailto:david-b@pacbell.net]

<snip/>

> I guess I'm thinking that a UTF-8 BOM would be a "new feature" that's
> an error today.  Hence it fits with the other backwards-problematic
> stuff in Blueberry ... though it's a "new feature" that's
> encoding-specific.

Except that a UTF-8 BOM isn't really a new feature; it's just one that all
too many implementors overlook.

The XML 1.0 specification includes a non-normative appendix regarding
autodetection of character encodings. It quite explicitly mentions the UTF-8
BOM as one of the things a processor should look for
(http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing-no-ext-info).
Unlike the issue with Blueberry, this isn't something new that's been added
to Unicode since XML 1.0. It's just a failure of current implementations.
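
For illustration, here is a minimal Python sketch of the byte-order-mark part
of that autodetection. The names sniff_bom and BOM_TABLE are hypothetical and
not taken from any XML library, and the sketch covers only the BOM cases, not
the BOM-less byte patterns the appendix also lists. The returned names are
Python codec names, chosen here for convenience.

```python
# Hypothetical sketch: BOM signatures a processor could check first.
# Four-byte signatures come before two-byte ones, because the UCS-4
# little-endian BOM (FF FE 00 00) starts with the UTF-16 little-endian
# BOM (FF FE).
BOM_TABLE = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),  # UCS-4, big-endian
    (b"\xff\xfe\x00\x00", "utf-32-le"),  # UCS-4, little-endian
    (b"\xef\xbb\xbf", "utf-8"),          # UTF-8 BOM
    (b"\xfe\xff", "utf-16-be"),          # UTF-16, big-endian
    (b"\xff\xfe", "utf-16-le"),          # UTF-16, little-endian
]


def sniff_bom(data: bytes):
    """Return (codec_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, codec in BOM_TABLE:
        if data.startswith(bom):
            return codec, len(bom)
    return None, 0


if __name__ == "__main__":
    doc = b"\xef\xbb\xbf<?xml version='1.0'?><a/>"
    codec, skip = sniff_bom(doc)
    # Decode the document with the BOM bytes skipped.
    print(codec, doc[skip:].decode(codec))
```

A processor that skips the three UTF-8 BOM bytes this way sees the XML
declaration at the start of the decoded text instead of reporting an error.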