
Re: UTF-8 BOM

From: David Brownell <david-b@pacbell.net>
To: Tim Bray <tbray@textuality.com>, Rob Lugt <roblugt@elcel.com>, Michael Brennan <Michael_Brennan@allegis.com>, xml-dev@lists.xml.org
Date: Tue, 03 Jul 2001 19:49:50 -0700

||  == David Brownell
|   == Michael Brennan
|
|| it fits with the other backwards-problematic stuff
|| in Blueberry ... though it's a "new feature" that's encoding-specific.
| 
| Except that a UTF-8 BOM isn't really a new feature; it's just one that all
| too many implementors overlook.

It's a new feature added by E105, in the last batch of "errata" before the
2nd edition spec was published.  That was called a "clarification", but it
seems to me like a substantive change ... previously, BOM was discussed
exclusively (!!) in the UTF-16 context.  Though as Rob Lugt pointed out,
the relevant normative text (4.3.3) is unchanged.

That part of E105 sure seems like it matches the "Blueberry" goals
of aligning with Unicode, not "2nd edition" goals of removing (not
adding :) ambiguity.


>    == Tim Bray
> 
> Actually, I think that the UTF-8 BOM is a deeply stupid idea that
> serves no useful purpose in any imaginable universe.

That's where I'm coming from.  UTF-8 is the default encoding, and
the only way un-MIME-typed data would NOT be in UTF-8 is if
it has a UTF-16 BOM, or an XML (or text) declaration.  This change
wasn't necessary; thrashing infrastructure is bad (unless maybe you're
a company needing a stick to force customers to buy new software :).
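
To make that rule concrete, here is a rough Python sketch of the detection
order as I read it. The names sniff_encoding and _DECL_ENCODING are made up
for illustration and come from no real parser. The sketch ignores UTF-32 and
EBCDIC, only understands declarations written in an ASCII-compatible
encoding, and its first branch is the E105 addition being debated here.

```python
import codecs
import re

# Hypothetical helper: matches an XML (or text) declaration at the very
# start of the entity and captures its encoding pseudo-attribute, if any.
_DECL_ENCODING = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an entity that has no external (MIME) label."""
    # The case under discussion: a UTF-8 signature before anything else.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # The case the spec has always covered: a UTF-16 byte order mark.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # Otherwise an XML (or text) declaration may name the encoding.
    match = _DECL_ENCODING.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declared encoding: UTF-8 is the default.
    return "utf-8"
```

Without the first branch, a document that Notepad saved as UTF-8 falls
through to the default anyway, but a parser that has not been taught to skip
the three signature bytes then trips over them as junk before the first "<".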


>      We wouldn't
> be thinking about it were it not for the fact that MS Notepad happens
> to write one for UTF-8 documents.

So what's the next desired erratum ... somewhere in 4.3.3, it should
get updated to say that "for interoperability" any (real) UTF may have
a BOM?  Whereas right now it only says that UTF-16 "must" have
one, and requires otherwise that xml (or text) decls must appear
"at the beginning" (that is, where such a BOM could now be)?
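
The clash between "at the beginning" and a leading BOM can be shown in a few
lines of Python. This is only a sketch: declaration_at_start and its
allow_utf8_bom flag are hypothetical names, and the strict reading (flag off)
is my interpretation of the current wording, not something the spec states in
these terms.

```python
import codecs

# Whitespace bytes that may follow "<?xml" in a declaration.
_XML_SPACE = (b" ", b"\t", b"\r", b"\n")


def declaration_at_start(data: bytes, allow_utf8_bom: bool) -> bool:
    """Return True if an XML (or text) declaration opens the entity.

    With allow_utf8_bom=False the declaration must be the very first bytes.
    With allow_utf8_bom=True a UTF-8 signature may precede it.
    """
    if allow_utf8_bom and data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Require whitespace after "<?xml" so "<?xml-stylesheet" is not matched.
    return data[:5] == b"<?xml" and data[5:6] in _XML_SPACE
```

For the bytes EF BB BF followed by <?xml version="1.0"?>, the strict reading
returns False and the relaxed one returns True, which is the gap a 4.3.3
erratum would have to close.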

- Dave

Follow-Ups:
- Re: UTF-8 BOM
  - From: Richard Tobin <richard@cogsci.ed.ac.uk>

References:
- RE: UTF-8 BOM
  - From: Michael Brennan <Michael_Brennan@allegis.com>
- Re: UTF-8 BOM
  - From: Tim Bray <tbray@textuality.com>
