
URIs and information typing



Using namespace-qualified identifiers (QNames) for type identification 
seems to introduce some significant difficulties while only saving a few 
keystrokes.  This proposal suggests using bare URIs rather than QNames to 
improve interoperability and extensibility.

[I've long been a critic of the (lack of) URI structure, notably on XML-URI 
last summer and on various IETF lists.  While I still have plenty of 
reservations about URI structure and syntax, the basic idea is more and 
more intriguing, and I'm probably going to have to eat a few of my past 
words in making this proposal.]

At present, the typing mechanism in W3C XML Schema is both extremely 
extensible and deeply constrained.  W3C XML Schema Datatypes [1] provides a 
family of primitive datatypes and mechanisms for extending them through 
facets for defining atomic types, while W3C XML Schema Structures [2] 
allows developers to create molecules from these sets of atoms.

Types, whether built-in or created by the designer, are assigned names 
which are referenced with namespace-qualified names (type="QName"). Types 
have a URI component, which an application must derive from the namespace 
declarations in the document.  They also have a local name, separate from 
the URI component, which identifies the particular type in the list of 
types associated with that namespace URI.  Prefixes are used as an 
abbreviation mechanism.

This creates interesting problems for XML Schemas on several levels.  The 
first problem is caused by the use of namespace prefixes within attribute 
values, which requires applications to maintain additional information 
about prefix-namespace mapping.  This is certainly allowed by the 
Namespaces in XML spec [3], but it extends the capability provided there, 
and such support isn't entirely "natural" to some views of the namespace 
specification.
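
To make the bookkeeping concrete, here is a minimal Python sketch of what an application has to do to resolve a prefix found inside an attribute value. The sample document, its element names, and the namespace URIs are illustrative assumptions, not taken from any real schema; the point is only that the parser reports prefix bindings and the application must keep its own stack of them.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative document; element names and namespace URIs are assumptions.
DOC = b"""<order xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <note type="xsd:string"/>
  <item xmlns:my="http://simonstl.com/dt/">
    <sku type="my:simonSKU"/>
  </item>
</order>"""


def resolve_type_attributes(xml_bytes):
    """Return (element name, namespace URI, local name) for each type="prefix:local"."""
    in_scope = []   # stack of (prefix, uri) pairs, innermost binding last
    resolved = []
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            in_scope.append(payload)      # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            in_scope.pop()                # the binding goes out of scope
        else:
            # The parser leaves attribute values alone, so the prefix in
            # type="..." has to be looked up by the application itself.
            qname = payload.get("type", "")
            prefix, sep, local = qname.partition(":")
            if sep:
                uri = next((u for p, u in reversed(in_scope) if p == prefix), None)
                resolved.append((payload.tag, uri, local))
    return resolved


if __name__ == "__main__":
    for element, uri, local in resolve_type_attributes(DOC):
        print(element, uri, local)
```

None of this stack handling would be needed if the attribute value carried the full identifier directly.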

The second problem may not appear to be a problem when type structures are 
viewed entirely within the context of W3C XML Schema.  Defining a type 
requires the use of W3C XML Schema syntax, and the inclusion of that 
declaration within the schema so that both its namespace URI and its 
local name can be assimilated into the larger schema.  This creates a 
barrier to other schema approaches which choose to rely on W3C XML Schema 
Datatypes for convenience and interoperability reasons.

RELAX [4], for instance, uses W3C XML Schema Datatypes within RELAX 
descriptions, but restricts users to the built-in types defined within that 
specification.  This allows RELAX developers to focus on RELAX, without 
having to harness RELAX implementations to W3C XML Schema implementations 
which can process W3C XML Schema type declarations.  It also allows RELAX 
to avoid the URI+local name issues involved in W3C XML Schema processing, 
as it relies solely on the name portion of the datatypes.

Although RELAX has chosen the (human-friendly) approach of relying on the 
names of built-in datatypes, I'd like to suggest that a slightly different 
approach might be simpler, far more extensible, and still workable.  Rather 
than rely on a combination of a namespace URI and a local name to identify 
types, the use of a bare URI would allow processors to include data typing 
information created in a number of different frameworks without mandating 
the use of a particular syntax for information type definition.

For example, I might create a datatype defining a 'simonSKU' identified by 
the URI http://simonstl.com/dt/simonSKU. At that location I'd have a RDDL 
[5] document, which would provide a human-readable description as well as 
links to a W3C XML Schema definition of the data type, perhaps a Perl 
regular expression which can be used to check my SKU, a Java class which 
can be used to check it, etc. There could also be some RDF around 
describing relationships between this type and other types, or additional 
properties of the type like creator, projects in which it's used, etc.
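
As a rough sketch of how a processor might consume such a type, the Python below keys validators on the bare URI alone. The SKU format (three capital letters, a hyphen, four digits) and the names `VALIDATORS` and `check` are hypothetical, invented for illustration; the message above does not define what a simonSKU looks like. In practice the regular expression would be one of the resources found via the RDDL document rather than hard-coded.

```python
import re

SIMON_SKU = "http://simonstl.com/dt/simonSKU"

# Hypothetical SKU format, assumed only for this example.
_SKU_PATTERN = re.compile(r"[A-Z]{3}-\d{4}")

# Validators are keyed by the bare URI; no prefix or namespace context is needed.
VALIDATORS = {
    SIMON_SKU: lambda value: _SKU_PATTERN.fullmatch(value) is not None,
}


def check(type_uri, value):
    """Return True or False if the type URI is known, or None if it is not."""
    validator = VALIDATORS.get(type_uri)
    if validator is None:
        return None   # unknown type: the caller decides how to proceed
    return validator(value)


if __name__ == "__main__":
    print(check(SIMON_SKU, "ABC-1234"))
    print(check(SIMON_SKU, "not a sku"))
    print(check("http://example.org/dt/unknown", "anything"))
```

Because the key is an opaque string, the same lookup works whether the definition behind the URI is a W3C XML Schema type, a regular expression, or a Java class.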

It would be my responsibility to make sure all of these things worked 
consistently, of course (and maybe a testing resource in RDDL would be 
cool), but applications could use my datatype processing as appropriate, 
and humans could have a full set of documentation as well.

I'm well aware that this approach would involve potentially substantial 
changes in both W3C XML Schema and RELAX to implement, so I'm not exactly 
expecting it to happen.  (RDF Schema [6] already uses a similar URI-based 
approach.) It may well have been considered and rejected at a prior 
date.  I suspect it isn't necessary to meet the requirements of W3C XML 
Schema within its own worldview, but might simplify the implementation of 
certain aspects of W3C XML Schema and provide future extensibility in new 
directions.

Also, URIs could point quite easily to locations within a single W3C XML 
Schema document - this doesn't require schema fragmentation, so long as 
only a single processing context is needed.
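
One way to picture this, assuming types are addressed with plain fragment identifiers (the schema URL and type names below are hypothetical), is a processor that strips the fragment to find which document to load, so that several types share one fetch and one processing context:

```python
from urllib.parse import urldefrag


def group_by_document(type_uris):
    """Map each schema document URI to the fragments (type names) requested from it."""
    documents = {}
    for uri in type_uris:
        document, fragment = urldefrag(uri)   # split off the "#fragment" part
        documents.setdefault(document, []).append(fragment)
    return documents


if __name__ == "__main__":
    # Hypothetical type URIs pointing into a single schema document.
    types = [
        "http://simonstl.com/dt/schema.xsd#simonSKU",
        "http://simonstl.com/dt/schema.xsd#simonPrice",
    ]
    for document, fragments in group_by_document(types).items():
        print(document, fragments)
```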

This approach might also simplify future projects which handle type 
information as metadata, not necessarily as part of a validation process.

[1] - XML Schema Part 2: Datatypes 
(http://w3.org/TR/2000/CR-xmlschema-2-20001024/)
[2] - XML Schema Part 1: Structures 
(http://w3.org/TR/2000/CR-xmlschema-1-20001024/)
[3] - Namespaces in XML (http://w3.org/TR/1999/REC-xml-names-19990114)
[4] - RELAX, Regular Language description for XML (http://www.xml.gr.jp/relax/)
[5] - Resource Directory Description Language (http://www.rddl.org)
[6] - Resource Description Framework Schema 
(http://w3.org/TR/2000/CR-rdf-schema-20000327)

Simon St.Laurent
Associate Editor
O'Reilly and Associates