
Re: XML Blueberry

At 5:18 PM +0800 6/24/01, Rick Jelliffe wrote:
>  From: "John Cowan" <jcowan@reutershealth.com>
>>  All Unicode 3.1 code points, including the unassigned ones, are already
>>  part of the XML document character set.  (The trivial exceptions are
>>  most of the C0 control characters, the surrogate space, and U+FFFE/FF.)
>>  The issue here is the implicit NAMECHAR and NMSTCHAR declarations,
>>  if I remember my SGML 8-letterisms correctly.
>The XML Second Edition only references Unicode 3.0, not 3.1.
>According to the Unicode.org site, Unicode 3.1 "adds a large number of coded

All true. Nonetheless, XML 1.0 (as well as XML 1.0 second edition) 
does allow Unicode 3.1 characters in element content and attribute 
values, as well as all characters that may be defined in the future 
and code points that may never be defined. Production 2 is 
normative here:

[2]    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | 
                [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and 
FFFF. */

Note especially [#x10000-#x10FFFF]. That's all the characters past 
the basic multilingual plane.
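
For anyone who wants to check a particular code point against 
production 2, here is a minimal sketch in Python. The function name 
is_xml_char is my own invention for illustration, not part of any XML 
library, and it tests only the Char production; it says nothing about 
which characters are allowed in names.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches XML 1.0 production [2] Char.

    is_xml_char is a hypothetical helper name used only for this sketch.
    """
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # BMP below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD        # rest of the BMP, minus FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF     # everything past the BMP
    )


# U+20000 lies past the BMP (CJK Extension B, added in Unicode 3.1).
print(is_xml_char(0x20000))
# A surrogate code point and U+FFFE are excluded by the production.
print(is_xml_char(0xD800), is_xml_char(0xFFFE))
```

Note that the check never consults a Unicode character database: 
whether a code point has been assigned yet makes no difference to 
production 2.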

| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |