Re: [xml-dev] Your XML documents may use different sets of characters, depending on which implementer you select?
- From: David Carlisle <davidc@nag.co.uk>
- To: "Costello, Roger L." <costello@mitre.org>
- Date: Tue, 17 May 2011 16:09:11 +0100
On 17/05/2011 15:49, Costello, Roger L. wrote:
> Hi Folks,
>
> Excellent feedback! Thanks!
>
> Here's a summary of what I've learned. Please tell me where I err.
>
> The following statements apply to "data" not to "markup" (i.e.,
> element names, attribute names).
>
> 1. Except for unpaired surrogate codepoints and a few control
> characters, you can use any character you want in XML documents.
>
> 2. The characters don't have to be defined in the Unicode
> specification.
>
> 3. For characters that don't have a visual representation or aren't
> in the Unicode character set, you can use them via XML's character
> entity mechanism, e.g., &#xffed;
&#xffed; does not use XML's entity mechanism: it is a numeric character
reference, not an entity reference. Also, whether you enter a character
directly or via a reference is totally unrelated to whether it has a
Unicode description or a visual representation.
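A minimal sketch of that point, assuming Python's standard
xml.etree.ElementTree parser as a stand-in for any conforming XML
parser (the element name "a" is only illustrative): the numeric
character reference and the literal character should yield the same
text once parsed.

```python
import xml.etree.ElementTree as ET

# The same character, written once as a numeric character reference
# and once directly as a literal character in the document text.
via_reference = ET.fromstring("<a>&#xffed;</a>").text
direct = ET.fromstring("<a>\uffed</a>").text

# After parsing, the application sees one and the same character;
# how it was written in the source is no longer visible.
same = via_reference == direct
print(same, hex(ord(via_reference)))
```

The choice between the two spellings is a serialization detail and says
nothing about whether the character is assigned in Unicode or has a
glyph.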
>
> 4. Implementers of XML applications are free to choose which version
> of Unicode they will support. Thus, one implementer of an XML Schema
> validator may choose to support Unicode 2.0, while another
> implementer of an XML Schema validator may choose to support Unicode
> 2.1. One implementer of an XSLT processor may choose to support
> Unicode 2.0, while another implementer of an XSLT processor may
> choose to support Unicode 2.1.
It may be a user choice or configurable rather than an implementer choice,
and since we're at Unicode 6 by now, one would hope that they don't force
version 2, but...
>
> 5. In XML applications that use regular expressions (e.g. XML Schema,
> XSLT), be careful about using regexes that contain regex categories
> such as Nd. The characters in those regex categories may vary
> depending on which version of Unicode an implementer supports. Thus,
> your application may execute without errors with one vendor's tool
> and fail on another.
It may do that anyway.
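As a rough illustration of point 5, here is a sketch that assumes
Python's unicodedata module plays the role of a tool's built-in Unicode
tables (the helper name chars_in_category is hypothetical). It lists
the characters the running interpreter places in category Nd; the count
is tied to whichever Unicode version that interpreter ships, so two
runtimes with different tables may disagree.

```python
import sys
import unicodedata

def chars_in_category(category):
    """Return every character this runtime assigns to `category`."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    ]

nd = chars_in_category("Nd")

# Both values depend on the Unicode data bundled with this interpreter.
print("Unicode data version:", unicodedata.unidata_version)
print("Characters in Nd:", len(nd))
```

A regex that relies on \p{Nd} inherits the same dependency, which is
why listing the allowed characters explicitly is the more portable
option when exact behaviour matters.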
>
> 6. CREPDL is a technology that allows you to precisely define the
> universe of characters that you want to allow in your XML documents.
>
I don't think CREPDL helps here, as its regexp syntax is explicitly
copied from XSD's, so its character classes depend on the Unicode version
in the same way. In fact, I don't just think that: the CREPDL spec says
so explicitly:
Case 2: The semantics of regular expressions depends on the Unicode
version. Different conformant CREPDL
processors may behave very differently. For example, one may report
"in", while another, "not-in".
> /Roger
David