Re: UTF-8 BOM
- From: David Brownell <firstname.lastname@example.org>
- To: email@example.com
- Date: Mon, 09 Jul 2001 12:18:45 -0700
I happened to notice that Canonical XML would also need to change
if a UTF-8 BOM becomes legal. From section 2.1 of that spec:
* The XPath data model represents data using UCS characters. Implementations
* MUST use XML processors that support UTF-8 and UTF-16 and translate to
* the UCS character domain. For UTF-16, the leading byte order mark is treated
* as an artifact of encoding and stripped from the UCS character data (subsequent
* zero width non-breaking spaces appearing within the UTF-16 data are not
* removed) [UTF-16, Section 3.2].
I'd have to look at XPath to see if that would need revision too.
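
To make the asymmetry concrete, here is a rough Python sketch of what
a decoder would do if the UTF-8 BOM were treated the same way as the
UTF-16 one: strip only the leading mark and leave any later U+FEFF
alone. The function name decode_xml_bytes is made up for illustration,
and the sketch assumes the input is UTF-8 or UTF-16 only; it ignores
the encoding declaration and every other encoding.

```python
import codecs


def decode_xml_bytes(data: bytes) -> str:
    """Decode UTF-8 or UTF-16 bytes, dropping only a leading BOM.

    Hypothetical helper: assumes the input is UTF-8 or UTF-16 and does
    not consult the XML encoding declaration.
    """
    # UTF-8 BOM (EF BB BF): the case the quoted text does not mention.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # UTF-16 BOMs: the case section 2.1 already describes.
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM at all: assume UTF-8.
    return data.decode("utf-8")


if __name__ == "__main__":
    doc = "<a>\ufeffx</a>"  # U+FEFF inside the content must survive
    with_bom = codecs.BOM_UTF8 + doc.encode("utf-8")
    print(decode_xml_bytes(with_bom) == doc)
```

The explicit "utf-16-le" and "utf-16-be" codecs are used because they
do not consume a BOM themselves, so a U+FEFF that appears later in the
data is kept as a zero width non-breaking space, as the quoted text
requires.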
----- Original Message -----
From: "Richard Tobin" <firstname.lastname@example.org>
Sent: Wednesday, July 04, 2001 4:00 AM
Subject: Re: UTF-8 BOM
> >| Except that a UTF-8 BOM isn't really a new feature; it's just one that all
> >| too many implementors overlook.
> >It's a new feature added by E105, in the last batch of "errata" before the
> >2nd edition spec was published. That was called a "clarification", but it
> Actually it was added by E44 (http://www.w3.org/XML/xml-19980210-errata#E44)
> which is described as "substantive". E105 just amended that.
> E44 was back in the days of the XML Syntax Group (before the XML Core
> Group existed). You'd have to go back through their archives to work
> out whether they realised that the non-normative text that they were
> adding didn't correspond to anything in the normative part of the
> -- Richard