
Re: UTF-8 BOM

From: Rick Jelliffe <ricko@allette.com.au>
To: xml-dev@lists.xml.org
Date: Wed, 04 Jul 2001 19:38:03 +0800

 From: "Tim Bray" <tbray@textuality.com>

> Actually, I think that the UTF-8 BOM is a deeply stupid idea that
> serves no useful purpose in any imaginable universe.  We wouldn't
> be thinking about it were it not for the fact that MS Notepad happens
> to write one for UTF-8 documents.

Yes.

I think what we are seeing is a clarification in layering.  XML started
with various kinds of errors (WF, validity, "for compatibility", etc.)

Things like UTF-8 BOMs belong in entity management (like line-feed
handling, transcoding, and Unicode normalization), which should be
as transparent to XML as possible. XML does really well in this regard: the
XML-in-MIME RFCs and the use of Unicode have served us well, I think.
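
To make the layering concrete, here is a rough sketch of what an
entity-management step might do before the XML layer sees any text.
The function name decode_entity is made up for illustration, and the
sketch assumes the entity is already known to be UTF-8; it only drops
a leading BOM and applies XML 1.0-style line-end handling.

```python
import codecs


def decode_entity(raw: bytes) -> str:
    """Hypothetical entity-management step: bytes in, clean text out.

    Assumes the entity is UTF-8. A leading BOM is treated as an
    encoding signature and dropped, and CR LF or lone CR line ends
    are translated to LF before the markup layer sees the text.
    """
    # Drop the UTF-8 signature (EF BB BF) if the entity starts with it.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode("utf-8")
    # Line-end handling: CR LF first, then any remaining lone CR.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

With something like this in front of the parser, a document saved with
a BOM and one saved without it look the same to everything above.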

One of the nice things about hierarchical markup is that it reduces the
number of times when character indexing and line counting are useful or
significant. So we don't need to freak out if a dumb transcoder makes a
character out of a signal (as in the BOM case), as we might have to if
character- or byte-indexing were the basis of our systems.
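
A small illustration of the "character out of a signal" case, using
Python's standard "utf-8" and "utf-8-sig" codecs. The sample document
and the names naive and aware are invented for the example.

```python
# A tiny UTF-8 document that starts with a BOM (EF BB BF).
raw = b"\xef\xbb\xbf<doc><p>hi</p></doc>"

# A transcoder that ignores the signature turns it into the
# character U+FEFF at the front of the text.
naive = raw.decode("utf-8")

# A transcoder that knows about the signature consumes it.
aware = raw.decode("utf-8-sig")

# Character offsets differ by one between the two decodings...
naive_offset = naive.index("<p>")
aware_offset = aware.index("<p>")

# ...but everything after the stray character is identical, so the
# markup itself is unchanged.
same_markup = naive.lstrip("\ufeff") == aware
```

Anything that addresses content by character offset sees a shift of
one; anything that addresses it by element structure sees no change.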

Cheers
Rick Jelliffe

References:
- RE: UTF-8 BOM
  - From: Michael Brennan <Michael_Brennan@allegis.com>
- Re: UTF-8 BOM
  - From: Tim Bray <tbray@textuality.com>
