   RE: [xml-dev] Specifying Character Sets


> > I am working on a small schema language for an XML language that I
> > will be using in an open source program. In this schema I am defining
> > a text data type. I want the schema developer using my schema language
> > to have the option of specifying the character set of the text data type.
>
> A given XML document is only in one character set. To support multiple
> character sets you'll have to do something like base64-encode the content.

I read the question differently (though people often use "character set" to
mean "character encoding", so I might be wrong). XML allows the Unicode
character set (or some version of it). You may want in a schema to restrict
the user to a subset of the characters in that character set, for example
the subset of characters defined in iso-8859-1, or the subset defined in
iso-8859-2, or some subset of your own choosing such as [A-Z0-9.,-].
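
As a rough sketch of that last case, a hand-picked repertoire could be
expressed in XML Schema as a pattern facet on a string type. The type name
RestrictedText is made up for illustration, and I am assuming the intended
set is uppercase A-Z, the digits, and the three punctuation characters:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type name; allows only A-Z, 0-9, '.', ',' and '-' -->
  <xs:simpleType name="RestrictedText">
    <xs:restriction base="xs:string">
      <!-- The hyphen is escaped so it is not read as a range operator -->
      <xs:pattern value="[A-Z0-9.,\-]*"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

The trailing * matters: a pattern has to match the whole value, so without
it only a single character would be accepted.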

There are international names for character encodings such as iso-8859-1
(search for IANA register of character sets). They define the encodings of
the characters, which you aren't interested in, but in doing so they also
define the repertoire of characters (that is, the character set in its
strict meaning).

I would think that a more useful approach, however, is to use the names of
blocks of characters defined in Unicode, which are available for use in XML
Schema regular expressions, for example <xs:pattern value="\p{IsHebrew}*"/>
limits you to characters in the Hebrew block, code points U+0590 to U+05FF.
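
Put into a complete type definition, that might look something like the
sketch below. The type names are invented for illustration. The second type
is only an approximation of the iso-8859-1 repertoire, built from the two
Unicode blocks that cover U+0000 to U+00FF (less whatever XML itself
forbids):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical name: text drawn only from the Hebrew block -->
  <xs:simpleType name="HebrewText">
    <xs:restriction base="xs:string">
      <xs:pattern value="\p{IsHebrew}*"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical name: roughly the iso-8859-1 repertoire,
       as the union of two Unicode blocks -->
  <xs:simpleType name="Latin1Text">
    <xs:restriction base="xs:string">
      <xs:pattern value="[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

Note that the Hebrew block on its own contains no space or ASCII
punctuation, so a real schema would probably want to add those to the
character class.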

Michael Kay

