Re: UTF-8 BOM
- From: Rick Jelliffe <email@example.com>
- To: firstname.lastname@example.org
- Date: Wed, 04 Jul 2001 19:38:03 +0800
From: "Tim Bray" <email@example.com>
> Actually, I think that the UTF-8 BOM is a deeply stupid idea that
> serves no useful purpose in any imaginable universe. We wouldn't
> be thinking about were it not for the fact that MS Notepad happens
> to write one for UTF-8 documents.
I think what we are seeing is a clarification in layering. XML started
with various kinds of errors (WF, validity, "for compatibility", etc.).
Things like UTF-8 BOMs belong in entity management (like line-feed
handling, transcoding, and Unicode normalization) that should be
as transparent to XML as possible. XML does really well in this regard: the
XML-in-MIME RFCs and the use of Unicode have served us well, I think.
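
To make the layering concrete, here is a minimal sketch of what I mean by
handling the BOM in entity management, before the XML processor proper sees
any characters. The function names (strip_utf8_bom, load_entity) are made up
for illustration and are not taken from any real parser; the sketch assumes
the entity is already known to be UTF-8 and does not attempt encoding
detection or normalization.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (bytes EF BB BF) if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def load_entity(raw: bytes) -> str:
    """Hypothetical entity-management step, assuming UTF-8 input.

    Removes the BOM, decodes the bytes, and normalizes line ends to LF,
    so the layer above never has to know a BOM was there.
    """
    text = strip_utf8_bom(raw).decode("utf-8")
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

With something like this in place, a document saved by Notepad and the same
document saved without a BOM hand identical characters to the parser.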
One of the nice things about hierarchical markup is that it reduces the
times when character indexing and line counting are useful or significant.
So we don't need to freak out if a dumb transcoder makes a character out of
a signal (as in the BOM case), as we might have to if character- or
byte-indexing were the basis of our systems.