
Re: UTF-8 BOM

From: David Brownell <david-b@pacbell.net>
To: xml-dev@lists.xml.org
Date: Mon, 09 Jul 2001 12:18:45 -0700

I happened to notice that canonical XML would also need to change
if a UTF-8 BOM becomes legal.   From section 2.1 of that spec
(http://www.w3.org/TR/xml-c14n):

* The XPath data model represents data using UCS characters. Implementations
* MUST use XML processors that support UTF-8 and UTF-16 and translate to
* the UCS character domain. For UTF-16, the leading byte order mark is treated
* as an artifact of encoding and stripped from the UCS character data (subsequent
* zero width non-breaking spaces appearing within the UTF-16 data are not
* removed) [UTF-16, Section 3.2].
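A minimal Python sketch of the section 2.1 rule quoted above, with the analogous UTF-8 handling that would be needed if a UTF-8 BOM became legal. Only a *leading* BOM is dropped as an encoding artifact; any later U+FEFF survives as a zero width no-break space. The function names and the `utf8_bom_is_legal` switch are hypothetical, and treating BOM-less UTF-16 as big-endian is an assumption. This is not taken from any c14n implementation.

```python
import codecs

ZWNBSP = "\ufeff"  # U+FEFF: a BOM at offset 0, a zero width no-break space elsewhere


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16, stripping only a leading BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # Assumption: with no BOM present, read the bytes as big-endian.
    return data.decode("utf-16-be")


def decode_utf8(data: bytes, utf8_bom_is_legal: bool) -> str:
    """Decode UTF-8; utf8_bom_is_legal is a hypothetical switch for the proposed change."""
    if utf8_bom_is_legal and data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Any remaining EF BB BF sequences decode to U+FEFF and are kept as data.
    return data.decode("utf-8")


if __name__ == "__main__":
    doc16 = codecs.BOM_UTF16_LE + ("<a>" + ZWNBSP + "</a>").encode("utf-16-le")
    print(repr(decode_utf16(doc16)))  # leading BOM gone, inner U+FEFF kept
    doc8 = codecs.BOM_UTF8 + b"<a/>"
    print(repr(decode_utf8(doc8, utf8_bom_is_legal=True)))
    print(repr(decode_utf8(doc8, utf8_bom_is_legal=False)))
```

Under today's rules (switch off) the UTF-8 BOM shows up as a U+FEFF character in the data, which is exactly why the c14n text would need a matching UTF-8 clause.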

I'd have to look at XPath to see if that would need revision too.

- Dave


----- Original Message ----- 
From: "Richard Tobin" <richard@cogsci.ed.ac.uk>
To: <xml-dev@lists.xml.org>
Sent: Wednesday, July 04, 2001 4:00 AM
Subject: Re: UTF-8 BOM


> >| Except that a UTF-8 BOM isn't really a new feature; it's just one that all
> >| too many implementors overlook.
> 
> >It's a new feature added by E105, in the last batch of "errata" before the
> >2nd edition spec was published.  That was called a "clarification", but it
> 
> Actually it was added by E44 (http://www.w3.org/XML/xml-19980210-errata#E44)
> which is described as "substantive".  E105 just amended that.
> 
> E44 was back in the days of the XML Syntax Group (before the XML Core
> Group existed).  You'd have to go back through their archives to work
> out whether they realised that the non-normative text that they were
> adding didn't correspond to anything in the normative part of the
> spec.
> 
> -- Richard
> 
> ------------------------------------------------------------------
> The xml-dev list is sponsored by XML.org, an initiative of OASIS
> <http://www.oasis-open.org>
> 
> The list archives are at http://lists.xml.org/archives/xml-dev/
> 
> To unsubscribe from this elist send a message with the single word
> "unsubscribe" in the body to: xml-dev-request@lists.xml.org
