
Re: UTF-8 BOM



 From: "Tim Bray" <tbray@textuality.com>

> Actually, I think that the UTF-8 BOM is a deeply stupid idea that
> serves no useful purpose in any imaginable universe.  We wouldn't
> be thinking about it were it not for the fact that MS Notepad happens
> to write one for UTF-8 documents.

Yes.

I think what we are seeing is a clarification in layering.  XML started
with various kinds of errors (WF, validity, "for compatibility", etc.)

Things like UTF-8 BOMs belong in entity management (like line-feed
handling, transcoding, and Unicode normalization) that should be
as transparent to XML as possible. XML does really well in this regard: the
XML-in-MIME RFCs and the use of Unicode have served us well, I think.
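
To make the layering point concrete, here is a rough Python sketch of
dropping a UTF-8 BOM in the entity-management layer, before any XML
processing sees the text. The helper name strip_utf8_bom and the sample
document are made up for illustration; this is one way it could be done,
not a description of how any particular parser does it.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper: it stands in for the entity-management layer.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample entity as a BOM-writing editor might save it (made-up content).
raw = codecs.BOM_UTF8 + b"<doc>hello</doc>"

# The layer above only ever sees the text, never the signature.
text = strip_utf8_bom(raw).decode("utf-8")
```

Python's standard "utf-8-sig" codec does the same thing while decoding:
raw.decode("utf-8-sig") skips the signature when it is present and
otherwise behaves like plain UTF-8.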

One of the nice things about hierarchical markup is that it reduces the
times when character indexing and line counting are useful or significant.
So we don't need to freak out if a dumb transcoder makes a character out of
a signal (as in the BOM case), as we might have to if character- or
byte-indexing were the basis of our systems.

Cheers
Rick Jelliffe