Re: [xml-dev] Brain Teaser: Element Author is of type xsd:string, what's an illegal value of Author?

Re:
> 3. These are the characters which are not legal, regardless of whether
> XML 1.0 or XML 1.1 is used:
>
> #x0  [#xD800-#xDFFF] [#xFFFF-#xFFFFF] [#x110000-#xFFFFFF] #xFFFE #xFFFF

My eyes may be deceiving me, but I think most of the range [#xFFFF-#xFFFFF] 
_is_ legal for both XML 1.0 and XML 1.1.

The little bit [#xFFFE-#xFFFF] is illegal, but the rest are in the range 
[#x10000-#x10FFFF].

So, assuming a 32-bit value is used to represent a Unicode character 
(UCS-4?), I make the illegal ones:

#x0  [#xD800-#xDFFF] #xFFFE #xFFFF [#x110000-#xFFFFFFFF]

So I think Oxygen XML is right on all counts.
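
For anyone who wants to check this mechanically, here is a minimal sketch. 
The names ILLEGAL_IN_BOTH, is_illegal_in_both and char_ref_value are made up 
for illustration; the ranges are the ones listed above and assume a 32-bit 
character value.

```python
import re

# Illegal in both XML 1.0 and XML 1.1, assuming a 32-bit character value.
# Ranges are inclusive and copied from the corrected list above.
ILLEGAL_IN_BOTH = [
    (0x0, 0x0),
    (0xD800, 0xDFFF),
    (0xFFFE, 0xFFFF),
    (0x110000, 0xFFFFFFFF),
]


def is_illegal_in_both(cp: int) -> bool:
    """True if cp falls in one of the ranges illegal in both XML versions."""
    return any(lo <= cp <= hi for lo, hi in ILLEGAL_IN_BOTH)


def char_ref_value(ref: str) -> int:
    """Return the numeric value of a hexadecimal reference such as '&#xFFFE;'."""
    match = re.fullmatch(r"&#x([0-9A-Fa-f]+);", ref)
    if match is None:
        raise ValueError(f"not a hexadecimal character reference: {ref!r}")
    return int(match.group(1), 16)


# Roger's four test values.
for ref in ["&#x0;", "&#xFFFE;", "&#xFFFF0;", "&#xFFFF1;"]:
    cp = char_ref_value(ref)
    verdict = "illegal" if is_illegal_in_both(cp) else "not in the illegal set"
    print(f"{ref:<12} {verdict}")
```

The first two values come out illegal and the last two do not, which is 
the same split Oxygen XML reported.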

Pete.
--
=============================================
Pete Cordell
Tech-Know-Ware Ltd
for XML to C++ data binding visit
http://www.tech-know-ware.com/lmx
(or http://www.xml2cpp.com)
=============================================

----- Original Message ----- 
From: "Costello, Roger L." <costello@mitre.org>
To: <xml-dev@lists.xml.org>
Sent: Friday, February 09, 2007 5:06 PM
Subject: RE: [xml-dev] Brain Teaser: Element Author is of type xsd:string, 
what's an illegal value of Author?


Thanks Michael and Noah.

I would like to summarize.


1. This is the set of legal characters in XML 1.0

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* excluding the surrogate blocks, FFFE, and FFFF. */


2. This is the set of legal characters in XML 1.1

[#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* excluding the surrogate blocks, FFFE, and FFFF. */
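
The two sets in items 1 and 2 can be written as simple predicates. This is
only a sketch: the function names are hypothetical, and the ranges are
copied directly from the two lists above.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp is in the XML 1.0 set listed in item 1."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """True if code point cp is in the XML 1.1 set listed in item 2."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Code points accepted by the 1.1 list but not by the 1.0 list.
only_in_11 = [
    cp for cp in range(0x110000)
    if is_xml11_char(cp) and not is_xml10_char(cp)
]
```

With these definitions, only_in_11 holds just the control characters below
#x20 other than #x9, #xA and #xD; every other code point is treated the
same way by both lists.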


3. These are the characters which are not legal, regardless of whether
XML 1.0 or XML 1.1 is used:

#x0  [#xD800-#xDFFF] [#xFFFF-#xFFFFF] [#x110000-#xFFFFFF] #xFFFE #xFFFF


4. Suppose element Author is declared to be of type xsd:string.  In
each example below the Author element has invalid content:

    <Author>&#x0;</Author>
    <Author>&#xFFFE;</Author>
    <Author>&#xFFFF0;</Author>
    <Author>&#xFFFF1;</Author>


5. I tested this with XML Spy and Oxygen XML.  Here are the results:

                                  XML Spy         Oxygen XML
---------------------------------------------------------------
    <Author>&#x0;</Author>         Valid            Invalid
    <Author>&#xFFFE;</Author>      Valid            Invalid
    <Author>&#xFFFF0;</Author>     Valid            Valid
    <Author>&#xFFFF1;</Author>     Valid            Valid

XML Spy incorrectly validates the data in all four cases.

Oxygen XML correctly validates the data in two cases, and incorrectly
validates the data in the other two cases.

Is the above summary correct?

/Roger



-----Original Message-----
From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
Sent: Friday, February 09, 2007 10:48 AM
To: Michael Kay
Cc: Costello, Roger L.; xml-dev@lists.xml.org
Subject: RE: [xml-dev] Brain Teaser: Element Author is of type
xsd:string, what's an illegal value of Author?

Michael Kay writes:


> There's a W3C note on handling XML 1.1 with XML Schema:
>
> http://www.w3.org/TR/xml11schema10/
>
> and it recommends (see the last line of the note) that with that
> combination, the definitions of the built-in types should be
> "stretched" to accommodate the characters allowed in XML 1.1. With
> that strategy, there will never be an invalid instance of xs:string.

Indeed.  That's about the best we could do without appearing to make a
retroactive incompatible change to Schema 1.0.  With Schema 1.1, such
incompatibilities are less of an issue.  Note that the latest public
working draft of Schema 1.1 says [1]:

"[XML Schema: Datatypes] defines some datatypes which depend on
definitions in [XML 1.1] and [XML-Namespaces 1.1]; those definitions,
and therefore the datatypes based on them, vary between version 1.0
([XML 1.0], [XML-Namespaces 1.0]) and version 1.1 ([XML 1.1],
[XML-Namespaces 1.1]) of those specifications. In any given
schema-validity-assessment episode, the choice of the 1.0 or the 1.1
definition of those datatypes is implementation-defined."

The working draft for Schema 1.1 Datatypes provides more details [2]:

"This specification defines some datatypes which depend on definitions
in [XML] and [Namespaces in XML]; those definitions, and therefore the
datatypes based on them, vary between version 1.0 ([XML 1.0],
[Namespaces in XML 1.0]) and version 1.1 ([XML], [Namespaces in XML])
of those specifications. In any given use of this specification, the
choice of the 1.0 or the 1.1 definition of those datatypes is
implementation-defined.

"Conforming implementations of this specification may provide either
the 1.1-based datatypes or the 1.0-based datatypes, or both. If both
are supported, the choice of which datatypes to use in a particular
assessment episode should be under user control.

Note:  When this specification is used to check the datatype validity
of XML input, implementations may provide the heuristic of using the
1.1 datatypes if the input is labeled as XML 1.1, and using the 1.0
datatypes if the input is labeled 1.0, but this heuristic should be
subject to override by users, to support cases where users wish to
accept XML 1.1 input but validate it using the 1.0 datatypes, or
accept XML 1.0 input and validate it using the 1.1 datatypes."
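
The heuristic in that note, together with the user override, might look
something like the sketch below. It is an illustration only, not taken
from any real validator: the function names and the SUPPORTED tuple are
hypothetical, and the declaration parsing is deliberately simplified (it
expects the XML declaration at the very start of the text and ignores
byte order marks and encodings).

```python
import re
from typing import Optional

# Hypothetical: the datatype versions this imaginary implementation offers.
SUPPORTED = ("1.0", "1.1")


def declared_xml_version(document: str) -> str:
    """Read the version from the XML declaration; assume 1.0 if there is none."""
    match = re.match(r"""<\?xml\s+version\s*=\s*["']([^"']+)["']""", document)
    return match.group(1) if match else "1.0"


def choose_datatype_version(document: str, override: Optional[str] = None) -> str:
    """Pick the 1.0- or 1.1-based datatypes for one assessment episode.

    The default follows the label on the input; an explicit override from
    the user always wins, as the note recommends.
    """
    version = override if override is not None else declared_xml_version(document)
    if version not in SUPPORTED:
        raise ValueError(f"unsupported datatype version: {version!r}")
    return version
```

For example, choose_datatype_version('<?xml version="1.1"?><Author/>')
follows the label and picks the 1.1 datatypes, while passing
override="1.0" for the same input validates it against the 1.0 datatypes
instead.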

Regarding "string" in particular, the Datatypes draft says [3]:

"It is implementation-defined whether an implementation of this
specification supports the Char production from [XML], or that from
[XML 1.0], or both. See Dependencies on Other Specifications (§1.3)."

So I think we've been quite careful with the details there, and Schema
1.1 can indeed be used to validate as strings the characters that
Roger mentions.  Note that there are some characters that are not
allowed in either XML 1.0 or XML 1.1 infosets, and thus are disallowed
even in Schema 1.1 strings.  I believe I'm correct that NUL (0x0) is
one of these.

Noah

[1] http://www.w3.org/TR/xmlschema11-1/#intro1.1
[2] http://www.w3.org/TR/xmlschema11-2/#intro-relatedWork
[3] http://www.w3.org/TR/xmlschema11-2/#string
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------




