- From: John Cowan <firstname.lastname@example.org>
- To: XML Dev <email@example.com>
- Date: Mon, 01 Mar 1999 14:09:46 -0500
Timothaeus Bray scripsit:
> [D]id you know the BOM was legal in UTF-8?
The BOM isn't just a BOM; it's also the ZWNBSP (zero-width
no-break space; no, I do not know how to pronounce that
acronym) character, U+FEFF, and is interpreted as a BOM only at the
beginning of UCS-2 or UTF-16 documents. Not to worry; the character is
as near to a no-op as Unicode allows.
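A minimal sketch of the rule above, assuming Python's standard `codecs` module: U+FEFF counts as a byte order mark only when it is the first character of the byte stream, and any later U+FEFF is passed through as an ordinary zero-width no-break space. The function name `decode_with_bom` is hypothetical, the UTF-8 case is included only because the quoted question raises it, and UCS-4/UTF-32 is deliberately not handled.

```python
import codecs

# Leading byte sequences that mark U+FEFF as a byte order mark.
# Only UTF-8 and UTF-16 are handled; UCS-4/UTF-32 is deliberately left out.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Strip one leading BOM and decode; any later U+FEFF is kept as ZWNBSP."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # No BOM at the start: fall back to the caller's default encoding.
    return data.decode(default)


if __name__ == "__main__":
    body = "a\ufeffb"  # interior U+FEFF: a zero-width no-break space, not a BOM
    for enc in ("utf-8", "utf-16-be", "utf-16-le"):
        raw = "\ufeff".encode(enc) + body.encode(enc)
        print(enc, decode_with_bom(raw) == body)
```

Only the first U+FEFF is consumed; the interior one survives decoding unchanged, which is harmless precisely because the character is close to a no-op.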
> And of course by the fact that Unicode/10646 is a moving target.
Only sort of. 8859-1 is theoretically a moving target too, except
that all the slots are full; CP 1252 is a moving target that has
just moved (by adding the euro at 0x80). In all these cases, characters
can be added (in principle) but not moved or deleted (any more).
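To illustrate "added but not moved", here is a small sketch that compares Python's built-in `cp1252` and `latin-1` codecs byte by byte. The helper name `differing_bytes` is hypothetical, and the check reflects Python's current codec tables rather than any particular historical revision of CP 1252.

```python
def differing_bytes(start, end):
    """Return byte values in [start, end) that cp1252 and Latin-1 decode differently.

    Bytes that Python's cp1252 table leaves undefined (e.g. 0x81) are skipped.
    """
    diffs = []
    for b in range(start, end):
        try:
            cp = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # slot still empty in cp1252
        if cp != bytes([b]).decode("latin-1"):
            diffs.append(b)
    return diffs


if __name__ == "__main__":
    print(hex(ord(b"\x80".decode("cp1252"))))   # the euro sign
    print(hex(ord(b"\x80".decode("latin-1"))))  # a C1 control in Latin-1
    print(differing_bytes(0x00, 0x80), differing_bytes(0xA0, 0x100))
```

The two encodings agree on 0x00 to 0x7F and 0xA0 to 0xFF; CP 1252's additions, including the euro at 0x80, all sit in the 0x80 to 0x9F range that Latin-1 leaves to control codes.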
> In practice,
> I've never actually seen anything outside of the BMP, but the
> experts agree they're showing up real soon now.
Not until Unicode 4.0, unless someone wants to use the private-use
planes 15 and 16.
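For reference, a character outside the BMP, including one in private-use planes 15 and 16, needs a surrogate pair in UTF-16. The sketch below shows the standard arithmetic; the function names `plane` and `to_surrogates` are hypothetical.

```python
def plane(cp: int) -> int:
    """Unicode plane number: 0 is the BMP, 15 and 16 are the private-use planes."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return cp >> 16


def to_surrogates(cp: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is not outside the BMP")
    offset = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return high, low


if __name__ == "__main__":
    for cp in (0xF0000, 0x10FFFD):
        hi, lo = to_surrogates(cp)
        print(f"U+{cp:X}: plane {plane(cp)}, surrogates {hi:04X} {lo:04X}")
```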
> How to get it in? Something like &#x10333; I expect.
Exactly so. Or the equivalent decimal NCR. Two NCRs representing
the surrogates separately would be erroneous under both the
Unicode/10646 and the XML definitions.
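A quick check with Python's `xml.etree.ElementTree` (backed by expat) shows the difference. It takes U+10333 as the example code point, which is an assumption because the archived quote shows only a rendered glyph. XML 1.0's `Char` production excludes the surrogate range #xD800 to #xDFFF, so a parser should reject the two-reference form as not well-formed. The helper `parse_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

CP = 0x10333  # example code point outside the BMP (assumed, see above)

hex_ncr = f"&#x{CP:X};"  # hexadecimal character reference
dec_ncr = f"&#{CP};"     # decimal character reference


def parse_text(fragment: str) -> str:
    """Parse a fragment inside a dummy <a> element and return its text."""
    return ET.fromstring(f"<a>{fragment}</a>").text


if __name__ == "__main__":
    for ref in (hex_ncr, dec_ncr):
        print(ref, parse_text(ref) == chr(CP))
    # The surrogate halves as two separate references are not legal XML characters.
    try:
        parse_text("&#xD800;&#xDF33;")
    except ET.ParseError as err:
        print("rejected:", err)
```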
John Cowan http://www.ccil.org/~cowan firstname.lastname@example.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
xml-dev: A list for W3C XML Developers. To post, mailto:email@example.com
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:firstname.lastname@example.org the following message;
To subscribe to the digests, mailto:email@example.com the following message;
List coordinator, Henry Rzepa (mailto:firstname.lastname@example.org)