
Re: [xml-dev] Your XML documents may use different sets of characters, depending on which implementer you select?

From: David Carlisle <davidc@nag.co.uk>
To: "Costello, Roger L." <costello@mitre.org>
Date: Tue, 17 May 2011 16:09:11 +0100

On 17/05/2011 15:49, Costello, Roger L. wrote:
> Hi Folks,
>
> Excellent feedback!  Thanks!
>
> Here's a summary of what I've learned. Please tell me where I err.
>
> The following statements apply to "data" not to "markup" (i.e.,
> element names, attribute names).
>
> 1. Except for unpaired surrogate codepoints and a few control
> characters, you can use any character you want in XML documents.
>
> 2. The characters don't have to be defined in the Unicode
> specification.
>
> 3. For characters that don't have a visual representation or aren't
> in the Unicode character set, you can use them via XML's character
> entity mechanism, e.g., &#xffed;

&#xffed; does not use XML's entity mechanism; it is a numeric character
reference, not an entity reference. Also, whether you enter a
character directly or via a reference is totally unrelated to whether it
has a Unicode description or visual representation.
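To make the distinction concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree (an XML 1.0 parser built on expat). The helpers text_of and is_well_formed are hypothetical, introduced only for this illustration. It shows that &#xffed; is resolved with no declaration at all and yields the same character as typing it directly, that an entity reference such as &sq; only works once the entity is declared, and that the parser enforces XML 1.0's Char production (U+0001 is rejected) but does not care whether a code point is assigned in Unicode.

```python
import xml.etree.ElementTree as ET


def text_of(doc):
    """Parse a small document and return the root element's text."""
    return ET.fromstring(doc).text


def is_well_formed(doc):
    """Return True if the parser accepts doc, False if it raises ParseError."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Numeric character reference: no declaration needed; the parser replaces it
# with the code point it names (U+FFED).
by_ref = text_of('<a>&#xffed;</a>')
# The same character typed directly into the document.
literal = text_of('<a>\uffed</a>')
print(by_ref == literal == '\uffed')  # True

# Entity reference: only works once the entity has been declared.
declared = text_of('<!DOCTYPE a [<!ENTITY sq "&#xffed;">]><a>&sq;</a>')
print(declared == '\uffed')               # True
print(is_well_formed('<a>&sq;</a>'))      # False: sq is undeclared

# The parser does not check Unicode assignment: U+0378 is an unassigned
# code point, yet a reference to it is accepted...
print(is_well_formed('<a>&#x378;</a>'))   # True
# ...but it does enforce XML 1.0's Char production, which excludes U+0001.
print(is_well_formed('<a>&#x1;</a>'))     # False
```

Note that this reflects XML 1.0; XML 1.1 permits most C0 controls when written as character references, so a 1.1 processor would treat the last case differently.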

>
> 4. Implementers of XML applications are free to choose which version
> of Unicode they will support. Thus, one implementer of an XML Schema
> validator may choose to support Unicode 2.0, while another
> implementer of an XML Schema validator may choose to support Unicode
> 2.1. One implementer of an XSLT processor may choose to support
> Unicode 2.0, while another implementer of an XSLT processor may
> choose to support Unicode 2.1.

It may be a user choice or a configuration setting rather than an
implementer choice, and since we're at Unicode 6 by now one would hope
that they don't force version 2, but...

>
> 5. In XML applications that use regular expressions (e.g. XML Schema,
> XSLT), be careful about using regexes that contain regex categories
> such as Nd. The characters in those regex categories may vary
> depending on which version of Unicode an implementer supports. Thus,
> your application may execute without errors with one vendor's tool
> and fail on another.

It may do that anyway.
>
> 6. CREPDL is a technology that allows you to precisely define the
> universe of characters that you want to allow in your XML documents.
>

I don't think CREPDL helps here, as its regexp syntax is explicitly
copied from XSD's, and so its character classes depend on the Unicode
version in the same way. In fact I don't just think that; the CREPDL
spec says so explicitly:

Case 2: The semantics of regular expressions depends on the Unicode
version. Different conformant CREPDL processors may behave very
differently. For example, one may report "in", while another, "not-in".
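The version dependence is easy to reproduce. Below is a minimal sketch in Python, assuming CPython, whose unicodedata module exposes both the interpreter's current Unicode database and a frozen Unicode 3.2.0 copy (unicodedata.ucd_3_2_0). The names CURRENT, OLD and in_nd are hypothetical, and the "in"/"not-in" result only mimics the wording of the quoted case; it is not CREPDL's API. NKO DIGIT ZERO (U+07C0) is category Nd in current Unicode but was unassigned in 3.2.0, so the two "processors" disagree about the same character.

```python
import re
import unicodedata

# Two hypothetical "processors" that differ only in the Unicode database they
# consult. unicodedata itself is the interpreter's current UCD; ucd_3_2_0 is a
# frozen copy of Unicode 3.2.0 that CPython keeps for IDNA/stringprep.
CURRENT = unicodedata
OLD = unicodedata.ucd_3_2_0


def in_nd(ch, ucd):
    """Hypothetical membership test for category Nd: report "in" or "not-in"."""
    return "in" if ucd.category(ch) == "Nd" else "not-in"


nko_zero = "\u07c0"  # NKO DIGIT ZERO, a decimal digit added in Unicode 5.0

print(OLD.unidata_version, in_nd(nko_zero, OLD))          # 3.2.0 not-in
print(CURRENT.unidata_version, in_nd(nko_zero, CURRENT))  # version varies; "in"

# Python's own re module has the same dependency: for str patterns, \d matches
# any character the interpreter's UCD classifies as a decimal digit...
print(bool(re.fullmatch(r"\d", nko_zero)))            # True
# ...unless the class is pinned down explicitly, here with re.ASCII ([0-9]).
print(bool(re.fullmatch(r"\d", nko_zero, re.ASCII)))  # False
```

Explicit code point ranges such as [0-9] do not consult the Unicode property tables, so they behave identically whichever Unicode version a processor implements; category escapes like \p{Nd} do not.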

> /Roger

David

Follow-Ups:
- Re: [xml-dev] Your XML documents may use different sets of characters, depending on which implementer you select?
  - From: John Cowan <cowan@mercury.ccil.org>

References:
- Your XML documents may use different sets of characters, depending on which implementer you select?
  - From: "Costello, Roger L." <costello@mitre.org>
- Re: [xml-dev] Your XML documents may use different sets of characters, depending on which implementer you select?
  - From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
- RE: [xml-dev] Your XML documents may use different sets of characters, depending on which implementer you select?
  - From: "Costello, Roger L." <costello@mitre.org>
- RE: [xml-dev] Your XML documents may use different sets of characters, depending on which implementer you select?
  - From: "Costello, Roger L." <costello@mitre.org>
