RE: [xml-dev] Your XML documents may use different sets of characters, depending on which implementer you select?

On Tue, 2011-05-17 at 10:49 -0400, Costello, Roger L. wrote:
[...]
> The following statements apply to "data" not to "markup" (i.e.,
> element names, attribute names).
> 
> 1. Except for unpaired surrogate codepoints and a few control
> characters, you can use any character you want in XML documents.

In particular, codepoint 0 is not allowed.
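As a rough illustration, the `Char` production of XML 1.0 can be written as a range check. The function name is made up for this sketch, and it covers XML 1.0 only; XML 1.1 permits more control characters, by reference, but still excludes codepoint 0.

```python
def is_xml10_char(codepoint: int) -> bool:
    """Return True if the codepoint matches the XML 1.0 Char production."""
    return (
        codepoint in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= codepoint <= 0xD7FF        # everything up to the surrogates
        or 0xE000 <= codepoint <= 0xFFFD      # skips surrogates, U+FFFE, U+FFFF
        or 0x10000 <= codepoint <= 0x10FFFF   # supplementary planes
    )


print(is_xml10_char(0x0))     # False: codepoint 0 is not allowed
print(is_xml10_char(0xD800))  # False: surrogate codepoint
print(is_xml10_char(0x41))    # True: 'A'
```

The check is purely numeric: it never asks whether Unicode has assigned a character to the codepoint.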

> 2. The characters don't have to be defined in the Unicode
> specification.

The codepoints do not have to have Unicode characters associated with
them.

> 
> 3. For characters that don't have a visual representation or aren't in
> the Unicode character set, you can use them via XML's character
> entity mechanism, e.g., ■

You can do that with any allowed character, and you can also include the
character directly.
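A small sketch of that equivalence, using Python's standard-library parser; the element name `a` and the sample character U+263A are arbitrary choices for the example:

```python
import xml.etree.ElementTree as ET

# The same character, once as a numeric character reference
# and once included directly in the document text.
by_reference = ET.fromstring("<a>&#x263A;</a>")
direct = ET.fromstring("<a>\u263A</a>")

# After parsing, the two documents carry identical character data.
print(by_reference.text == direct.text)
```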

> 
> 4. Implementers of XML applications are free to choose which version
> of Unicode they will support. Thus, one implementer of an XML Schema
> validator may choose to support Unicode 2.0, while another implementer
> of an XML Schema validator may choose to support Unicode 2.1. One
> implementer of an XSLT processor may choose to support Unicode 2.0,
> while another implementer of an XSLT processor may choose to support
> Unicode 2.1.

Or the version of Unicode understood may depend on the operating
environment, e.g. on the Java VM in use.
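The same is true of other runtimes. As a sketch, Python reports the version of the Unicode database it was built with, and what it says about a codepoint depends on that version. The helper name `is_assigned` is invented for this example.

```python
import sys
import unicodedata

# The Unicode version comes from the runtime, not from the XML document.
print("Python", sys.version_info[:2], "uses Unicode", unicodedata.unidata_version)


def is_assigned(ch: str) -> bool:
    """True if this runtime's Unicode database assigns the codepoint.

    Unassigned codepoints are reported with general category "Cn".
    """
    return unicodedata.category(ch) != "Cn"


print(is_assigned("A"))
```

A codepoint that one runtime reports as unassigned may be reported as assigned by a newer one.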
> 
> 5. In XML applications that use regular expressions (e.g. XML Schema,
> XSLT), be careful about using regexes that contain regex categories
> such as Nd. The characters in those regex categories may vary
> depending on which version of Unicode an implementer supports. Thus,
> your application may execute without errors with one vendor's tool and
> fail on another.

That may be what you want, it turns out.  "When our system is upgraded
our schema is ready for it"...
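To see the effect outside a schema processor, here is a sketch that asks the runtime's Unicode database whether a character is in category Nd. Python's `re` module has no `\p{Nd}` escape, so `unicodedata` stands in for the regex category. The helper name and the sample characters are assumptions for the example.

```python
import unicodedata


def is_decimal_digit(ch: str) -> bool:
    """True if this runtime's Unicode database puts ch in category Nd."""
    return unicodedata.category(ch) == "Nd"


samples = {
    "ASCII digit seven": "7",
    "Arabic-Indic digit three": "\u0663",
    # Assumption: U+1E950 is a digit added in a later Unicode version
    # (Adlam, Unicode 9.0 to the best of my knowledge), so the answer
    # for it depends on the runtime.
    "U+1E950": "\U0001E950",
}

for label, ch in samples.items():
    print(label, is_decimal_digit(ch))
```

The same input can therefore pass on an upgraded system and fail on an older one, with no change to the schema.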

> 6. CREPDL is a technology that allows you to precisely define the
> universe of characters that you want to allow in your XML documents.

You can also do this with an XSD facet.
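The facet meant here is presumably `pattern`. The sketch below embeds a hypothetical simple type (the name `PrintableAscii` and the character range are invented for the example) and emulates the facet with Python's `re.fullmatch`, since the standard library has no XSD validator. A real schema processor applies its own regex dialect, so treat this only as an approximation.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical type restricting content to printable ASCII characters.
XSD_FRAGMENT = f"""
<xs:simpleType name="PrintableAscii" xmlns:xs="{XS}">
  <xs:restriction base="xs:string">
    <xs:pattern value="[ -~]*"/>
  </xs:restriction>
</xs:simpleType>
"""

# Read the pattern facet's value out of the schema fragment.
pattern = ET.fromstring(XSD_FRAGMENT).find(f".//{{{XS}}}pattern").get("value")


def allowed(text: str) -> bool:
    """Emulate the facet: XSD patterns must match the whole value."""
    return re.fullmatch(pattern, text) is not None


print(allowed("plain text"))
print(allowed("caf\u00e9"))
```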

Liam


-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://www.fromoldbooks.org/
Occasional blog: http://www.barefootliam.org/
The barefoot typographer




