OASIS Mailing List Archives


Re: [xml-dev] Your XML documents may use different sets of characters,depending on which implementer you select?

On 17/05/2011 15:49, Costello, Roger L. wrote:
> Hi Folks,
> Excellent feedback!  Thanks!
> Here's a summary of what I've learned. Please tell me where I err.
> The following statements apply to "data" not to "markup" (i.e.,
> element names, attribute names).
> 1. Except for unpaired surrogate codepoints and a few control
> characters, you can use any character you want in XML documents.
> 2. The characters don't have to be defined in the Unicode
> specification.
> 3. For characters that don't have a visual representation or aren't
> in the Unicode character set, you can use them  via XML's character
> entity mechanism, e.g., &#xFFED;

&#xFFED; does not use XML's entity mechanism; it is a numeric character 
reference, not an entity reference. Also, whether you enter a 
character directly or via a reference is totally unrelated to whether it 
has a Unicode description or visual representation.
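
A minimal sketch of that distinction, in Python with the standard library's xml.etree.ElementTree (my choice for illustration, nothing from the thread depends on it). It assumes an expat-based parser, as CPython ships, and the helper name is_well_formed is made up for this example. It shows that a character typed directly and the same character written as a numeric character reference reach the application as identical data, and that a reference to an excluded control character is still rejected:

```python
import xml.etree.ElementTree as ET

# The same character, once typed directly and once as a numeric
# character reference.
direct = ET.fromstring("<d>\uffed</d>").text
by_ref = ET.fromstring("<d>&#xFFED;</d>").text

# The parser hands the application identical data either way.
same_data = direct == by_ref


def is_well_formed(doc):
    """Hypothetical helper: True if the parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# A reference does not widen the set of allowed characters: U+0001 is
# excluded by XML 1.0 however it is written.
control_ok = is_well_formed("<d>&#x1;</d>")

print(same_data, control_ok)
```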

> 4. Implementers of XML applications are free to choose which version
> of Unicode they will support. Thus, one implementer of an XML Schema
> validator may choose to support Unicode 2.0, while another
> implementer of an XML Schema validator may choose to support Unicode
> 2.1. One implementer of an XSLT processor may choose to support
> Unicode 2.0, while another implementer of an XSLT processor may
> choose to support Unicode 2.1.

It may be a user choice or configurable rather than an implementer choice, 
and since we're at Unicode 6 by now one would hope that they don't force 
version 2, but...

> 5. In XML applications that use regular expressions (e.g. XML Schema,
> XSLT), be careful about using regexes that contain regex categories
> such as Nd. The characters in those regex categories may vary
> depending on which version of Unicode an implementer supports. Thus,
> your application may execute without errors with one vendor's tool
> and fail on another.

It may do that anyway.
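
As a rough illustration of the version dependence in point 5, here is a Python sketch (Python is only a stand-in; XML Schema and XSLT processors have their own regex engines). It reports which Unicode data version the running interpreter bundles and how many characters that version puts in category Nd. The function name decimal_digits is invented for this example, and the count is deliberately not stated because it differs between Unicode versions:

```python
import re
import sys
import unicodedata


def decimal_digits():
    """Every code point this runtime's Unicode tables classify as Nd."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Nd"
    ]


digits = decimal_digits()
print("Unicode data version:", unicodedata.unidata_version)
print("Nd characters known to this runtime:", len(digits))

# For str patterns, Python's \d follows category Nd, so whether a given
# string matches depends on the tables bundled with the interpreter.
arabic_indic = "\u0660\u0661"  # ARABIC-INDIC DIGIT ZERO and ONE
print(re.fullmatch(r"\d+", arabic_indic) is not None)
```

Running the same script under two interpreters built against different Unicode versions is a cheap way to see whether a category-based pattern is at risk.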
> 6. CREPDL is a technology that allows you to precisely define the
> universe of characters that you want to allow in your XML documents.

I don't think CREPDL helps here, as its regexp syntax is explicitly 
copied from XSD's and so character classes depend on Unicode in the same 
way. In fact I don't just think that. The CREPDL spec says that explicitly:

Case 2: The semantics of regular expressions depends on the Unicode
version. Different conformant CREPDL processors may behave very
differently. For example, one may report "in", while another, "not-in".

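For contrast, a repertoire written as explicit code-point ranges does not change meaning when the Unicode tables do. The Python sketch below is not CREPDL syntax and is not taken from the spec. The ranges in ALLOWED_RANGES and the function name disallowed are hypothetical, picked only to show the idea:

```python
# Hypothetical allow-list of inclusive code-point ranges, chosen only
# for illustration: printable ASCII, Latin-1 letters and symbols, and
# the Arabic-Indic digits.
ALLOWED_RANGES = [(0x20, 0x7E), (0xA0, 0xFF), (0x0660, 0x0669)]


def disallowed(text):
    """Return the characters of text that fall outside every range."""
    return [
        ch
        for ch in text
        if not any(lo <= ord(ch) <= hi for lo, hi in ALLOWED_RANGES)
    ]


print(disallowed("abc 123"))
print(disallowed("a\u0966"))  # U+0966 is outside the listed ranges
```

The check compares raw code points only, so it never consults category data and gives the same answer under any Unicode version.
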
> /Roger

