OASIS Mailing List Archives

Re: Character &#0;



And, of course, not to forget why &#0; is not allowed.  XML is a text notation by
which text can be encoded for sending over the WWW.  &#0; is a NULL
character which, if used in C/C++ strings for example, will represent the
end of a string.  All the control characters which are used for data transmission
and control purposes (e.g. ^S, ^Q, ^Z, ^C) are inappropriate for the obvious reasons.

If you did allow control characters, then you would need yet another layer to encode
the XML for transmission (i.e. XML could no longer be sent as MIME type
text/xml).

Why not just allow the numeric character reference and ban the direct character?
Well, that would change the nature of character references: at the moment
you can always convert a character reference into a direct character and back
(I mean characters in normal data content.)   
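
To make the point about character references concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree (my choice for illustration, not something from the original thread). It checks that a reference to an allowed character yields the same content as the direct character, and that a conforming parser refuses &#0; outright. The helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# A reference to an allowed character and the direct character
# produce the same data content after parsing.
by_reference = ET.fromstring("<a>&#65;</a>").text
direct = ET.fromstring("<a>A</a>").text
assert by_reference == direct

# The reference to character 0 is rejected, just as the direct
# character would be, so the two forms stay interchangeable.
print(parses("<a>&#65;</a>"))
print(parses("<a>&#0;</a>"))
```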

So NULLs are binary data, not text. To represent binary data in XML you need
to encode it. XML Schema's Datatypes provides two binary types to help with
this.
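
As a rough sketch of what encoding it looks like in practice, the Python below carries a byte string containing a NULL through an XML document as base64 text, and also shows the hex form. In the published XML Schema Part 2 Recommendation the two binary types are named base64Binary and hexBinary; the element name `blob` and the sample payload are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# Sample binary payload with an embedded NULL byte (illustrative only).
payload = b"abc\x00def"

# Encode the bytes as base64 text and place the text in an element.
elem = ET.Element("blob")
elem.text = base64.b64encode(payload).decode("ascii")
doc = ET.tostring(elem, encoding="unicode")

# The serialized document is plain text: no NULL appears in it.
assert "\x00" not in doc

# The receiver parses the document and decodes the text back to bytes.
decoded = base64.b64decode(ET.fromstring(doc).text)
assert decoded == payload

# Hex encoding is the other common option: two hex digits per byte.
hex_text = payload.hex()
assert bytes.fromhex(hex_text) == payload
```

Base64 is the more compact of the two (roughly 4 characters per 3 bytes against 2 characters per byte for hex); hex is easier to read by eye.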

Cheers
Rick Jelliffe