RE: [xml-dev] US-ASCII characters versus XML characters ... why such a huge discrepancy?

Ken Holman wrote:

> XML Schema does not constrain character sets.

I don't understand this. If I have a pattern facet that specifies any of the 28 characters not supported by XML, for example:

    <xs:simpleType name="characters">
        <xs:restriction base="xs:string">
            <xs:pattern value="[&#1;-&#127;]*" />
        </xs:restriction>
    </xs:simpleType>

then I get this error:

    Character reference &#1; is an invalid XML character

I conclude that XML Schema does constrain character sets -- it constrains the character set to that of XML. No?
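
A minimal sketch of where this error likely comes from, assuming the schema document is itself read as XML 1.0: the reference &#1; is rejected when the schema file is parsed, before any pattern facet is interpreted. The helper name parses_as_xml10 and the two sample fragments are illustrative only; Python's standard ElementTree parser stands in here for whatever parser the validator uses.

```python
import xml.etree.ElementTree as ET

XS = 'xmlns:xs="http://www.w3.org/2001/XMLSchema"'

def parses_as_xml10(document: str) -> bool:
    """Return True if the text is accepted by Python's XML 1.0 parser (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# The pattern from the message: &#1; is not an XML 1.0 character,
# so the schema fragment itself is not well-formed.
with_control = f'<xs:pattern {XS} value="[&#1;-&#127;]*"/>'

# Starting the range at tab (&#9;) keeps the fragment well-formed.
with_tab = f'<xs:pattern {XS} value="[&#9;-&#127;]*"/>'

print(parses_as_xml10(with_control))
print(parses_as_xml10(with_tab))
```

Under this reading, the restriction comes from the XML 1.0 syntax used to write the schema rather than from a character-set rule in XML Schema itself.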

> Use XML 1.1 and you get all characters except NUL.

Do any XML Schema validators support XML 1.1?

/Roger

-----Original Message-----
From: G. Ken Holman [mailto:g.ken.holman@gmail.com] On Behalf Of G. Ken Holman
Sent: Monday, October 01, 2012 10:24 AM
To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] US-ASCII characters versus XML characters ... why such a huge discrepancy?

At 2012-10-01 13:59 +0000, Costello, Roger L. wrote:
>Below is a table that shows the US-ASCII characters (decimal value) 
>in the left column and the right column indicates whether the 
>character is allowed in XML documents.
>
>Questions:
>
>1. Why does XML not support many of the US-ASCII characters?

Because it is rooted in SGML which is a text-based data description 
language and the characters allowed by XML 1.0 are characters 
typically found in raw text files (that is, text files that do not 
have control characters for directives such as printer control).

Who needs more than CR, LF and tab when typing raw text?

>Of the 127 US-ASCII characters, 28 characters are not allowed in XML 
>documents; that is, 22% of the US-ASCII characters are not supported by XML.

Ref:

     http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Char
     http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Char
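
The figures quoted above can be checked against the Char production in the XML 1.0 reference. The following is a small sketch, not a normative implementation; the function name is_xml10_char is made up for illustration, and the count deliberately covers code points 1 through 127 (US-ASCII without NUL), as the question does.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Code points 1..127: US-ASCII without NUL, as counted in the question.
ascii_without_nul = range(1, 128)
disallowed = [cp for cp in ascii_without_nul if not is_xml10_char(cp)]

print(len(disallowed))                                      # 28
print(round(100 * len(disallowed) / len(ascii_without_nul)))  # 22
print([hex(cp) for cp in disallowed])
```

The excluded code points are 0x01-0x08, 0x0B, 0x0C and 0x0E-0x1F; DEL (0x7F) is permitted by XML 1.0.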

>2. I am creating an XML Schema for an RFC that allows all 127 
>US-ASCII characters.

XML Schema does not constrain character sets.

To express constraints on an XML document use:

   ISO/IEC 19757-7:2009. Information technology -- Document Schema Definition
   Languages (DSDL) -- Part 7: Character Repertoire Description Language

But CREPDL does not *add* anything to XML, it only constrains the 
characters allowed by XML.

>What should I do for the 28 US-ASCII characters that are supported 
>by the RFC but not supported by XML?

Use XML 1.1 and you get all characters except NUL.

One of my clients needed control characters in a text result in order 
to control a legacy agate printing system for sports scores in 
newspapers (the tiny print of baseball box scores for 
example).  Thankfully, NUL is not one of the control characters 
needed.  I wrote XSLT that produced all of the needed control 
characters and the results are made public here:

http://sportsmlt.svn.sourceforge.net/viewvc/sportsmlt/2.0/support/sportsmlt2-character.xsl?&view=markup

The top-most stylesheet fragment is here:

http://sportsmlt.svn.sourceforge.net/viewvc/sportsmlt/2.0/sportsmlt2.xsl?view=markup

The HTML documentation for the stylesheet library is here, produced 
by my XSLStyle documentation methodology:

http://sportsmlt.svn.sourceforge.net/viewvc/sportsmlt/2.0/sportsmlt2.html

Using the above stylesheets, if I want hex 0x01 to be represented in 
my XML or my XSLT I use the character reference &#xffd1;.  As a sequence 
of 8 characters it is not itself an invalid Unicode character; 
rather, it is only represented as such in memory.  The raw invalid 
Unicode character never shows up in any file, which would then make 
the Unicode file invalid.
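
One reading of the paragraph above is a stand-in scheme: a legal character is written in the files, and it is swapped for the control character only when the final text is produced. The sketch below illustrates that idea in Python rather than XSLT and is not taken from the stylesheets; the table contains only the single pair mentioned in the message (U+FFD1 for 0x01), and the names PLACEHOLDER_TO_CONTROL and restore_controls are hypothetical.

```python
# Only the pair given in the message is listed; any further entries
# would be assumptions about the stylesheets, so none are added here.
PLACEHOLDER_TO_CONTROL = {0xFFD1: 0x01}

def restore_controls(text: str) -> str:
    """Replace each stand-in character with the control character it represents."""
    return text.translate(PLACEHOLDER_TO_CONTROL)

# Text as it might appear after parsing a file that contains &#xffd1;.
parsed_text = "A\uffd1B"

# The control character exists only in the in-memory result.
output = restore_controls(parsed_text)
print([hex(ord(ch)) for ch in output])
```

The files on disk then contain only characters that XML 1.0 permits, while the program that writes the final output sees the control character.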

I hope this helps.

. . . . . . . . . . Ken

--
Contact us for world-wide XML consulting and instructor-led training
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm
Crane Softwrights Ltd.            http://www.CraneSoftwrights.com/x/
G. Ken Holman                   mailto:gkholman@CraneSoftwrights.com
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers:    http://www.CraneSoftwrights.com/legal



