- To: xml-dev@lists.xml.org
- Subject: Some random noise on rational type systems for XML
- From: Amelia A. Lewis <amyzing@talsever.com>
- Date: Tue, 6 May 2003 18:24:51 -0400
- Organization: The Mysthical World of Talsever!
The intrusion of the W3C XML Schema type system into core XPath/XSLT
struck me sufficiently to cause me to want to think about type systems
and XML again. So here's *that* old permathread again ....
I think one of the worst problems with W3C XML Schema's types is that
they do not represent a system. This leads me to ask: is there a
universal type system? Answer: apparently not. Types are an imposition
of external categories onto information, in order to make that
information more amenable to manipulation.
If that's so (you don't have to agree, of course), then there are two
criteria for evaluating a type system: completeness and
comprehensibility. To be a little more precise: a type system ought to
either fully represent common categories of data, or ought to have
mechanisms for extensions that do so: completeness. A type system ought
to have a relatively small number of primitives and a relatively small
number of rules for creating more; the rules for
derivation/extension/restriction ought to be consistent as well:
comprehensibility.
Note that I'm leaving "complex types" (structured types) out of this
discussion. W3C XML Schema and RELAX NG (RNG) both do a Pretty Good Job[tm] of
establishing a means for creation of structures. I want to focus on
"value types", the things that are represented in text and attribute
nodes in XML.
The most commonly encountered programming languages these days start
from registers, and base their types on packing the largest possible
amount of information into the smallest number of bits. This is not
necessarily the best solution for XML.
First principle: the XML ur-type is "string". Everything in XML can be
represented as a string (MUST be representable as a string). It can
therefore be manipulated as a string--truncated, concatenated,
case-transformed, etc. Possibly not *meaningfully* from the perspective
of the data author, but always *possibly*. Note that "string" is
actually a subset of Unicode (which subset depends upon whether you want
XML Classic (1.0) or New XML (1.1)).
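To make "a subset of Unicode" concrete, here is a minimal Python sketch
that checks a string against the Char production of XML 1.0 or XML 1.1.
The names XML_1_0_RANGES, XML_1_1_RANGES, and is_xml_string are made up
for illustration; only the code point ranges come from the two
Recommendations.

```python
# Code point ranges of the Char production in each XML version.
XML_1_0_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]
XML_1_1_RANGES = [
    (0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]


def is_xml_string(text, ranges):
    """Return True if every character of text falls in one of the ranges."""
    return all(
        any(low <= ord(ch) <= high for low, high in ranges)
        for ch in text
    )


if __name__ == "__main__":
    # U+0001 is excluded by XML 1.0 but admitted by XML 1.1.
    print(is_xml_string("\x01", XML_1_0_RANGES))
    print(is_xml_string("\x01", XML_1_1_RANGES))
```

Anything that passes this check is a "string" in the ur-type sense, and
every other type discussed here is a further restriction of it.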
Take a quick look at W3C XML Schema, and let's let that inform some
initial discussion. Throw out the twenty-five derived types; they
should never have been normative (only the rules to derive them need to
be normative). That still leaves us nineteen. Lessee ... well,
eighteen, because we've defined string as the ur-type. Okay, drop
another seven, by collapsing all the date types into one conceptual
date. Lose another two by folding decimal, float, and double into a
single number. Combine *binary (hexBinary and base64Binary) into a
single type (it can have an "encoding" attribute, which allows the
addition of things like yEnc, if you're so inclined). Drop NOTATION.
WTF is anyURI doing as a primitive? Clear influence of the
Church of the Holy and Universal
Thingy-that-Identifies-a-Thingy-with-Identity. Hmm, that should leave
us about six types (all of which are strings):
boolean
binary [octet-stream]
number
date
duration
Hmm. We're missing one. Ah, that's it: QName. Question: does XML need
a pointer type? Which would, of course, be represented as a string. If
so, it might include, for instance, QName, XPath expressions, and URIs.
Let's say that there's an abstract pointer, maybe.
Six types. Even I can remember that.
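For anyone who wants to check the arithmetic, the collapse can be
written down as a table. This is only a sketch: the dictionary name, the
two sentinel values, and the label "pointer" for the QName slot (per the
"maybe" above) are assumptions for illustration, and anyURI is treated
as dropped from the primitives rather than folded in.

```python
UR_TYPE = "(ur-type)"   # string itself, the base of everything
DROPPED = "(dropped)"   # no longer a primitive in this scheme

# The nineteen W3C XML Schema primitives and where each one ends up.
COLLAPSE = {
    "string": UR_TYPE,
    "boolean": "boolean",
    "decimal": "number", "float": "number", "double": "number",
    "duration": "duration",
    "dateTime": "date", "time": "date", "date": "date",
    "gYearMonth": "date", "gYear": "date", "gMonthDay": "date",
    "gDay": "date", "gMonth": "date",
    "hexBinary": "binary", "base64Binary": "binary",
    "QName": "pointer",
    "anyURI": DROPPED,
    "NOTATION": DROPPED,
}


def collapsed_types(mapping):
    """Return the distinct surviving types, sorted by name."""
    return sorted(
        {t for t in mapping.values() if t not in (UR_TYPE, DROPPED)}
    )


if __name__ == "__main__":
    print(len(COLLAPSE))              # 19
    print(collapsed_types(COLLAPSE))  # six names
```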
Now, there's an interesting thing that happens when you start passing
information around and storing it here and there. The SQL people
encountered this, and found a solution, which made them heretics in the
eyes of the relational true believers. The problem is that whenever you
have a thing that has a value, it is often useful to be able to say
"don't know", "not specified", "undefined", "null", or "nil". W3C XML
Schema introduces a mechanism for this (xsi:nil). So, umm, why? XML already has
a way to say nothing. Say nothing. The empty string. No data. Not
specified. Presumably a schema need only specify "not nullable" to
prevent this appearing, but by default, a specification of "true|false"
as permitted values for boolean also includes "" (otherwise known as the
Pilate option).
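A minimal sketch of that default, in Python. The function name and the
nullable flag are hypothetical; the point is only that the empty string
is accepted unless the schema says "not nullable".

```python
def is_valid_boolean(text, nullable=True):
    """Validate a boolean lexical value; "" means "no value given"."""
    if text == "":
        # The empty string is the null, allowed unless forbidden.
        return nullable
    return text in ("true", "false")


if __name__ == "__main__":
    print(is_valid_boolean("true"))
    print(is_valid_boolean(""))
    print(is_valid_boolean("", nullable=False))
```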
Now, who gets to decide what's an Authentic First Class Genuine Type and
what's a Shoddy Knockoff? W3C XML Schema's answer is to set up an
authoritative agency. Not sure why; it's not the Web Way. Let a
Hundred Points of Type blossom! Implementors of validating XML parsers
can respond to user demand. "Support the sstl geographic types
library!" Or they can design the silly things so that users can plug in
validation modules. RNG already has a mechanism for specifying type
libraries.
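The plug-in idea might look something like the following Python sketch:
a registry keyed by (library, type name), into which anyone can drop a
validator. Everything named here is hypothetical, including the
"urn:example:geo" library and its "latitude" type; it is not the API of
any real validator.

```python
from typing import Callable, Dict, Tuple

Validator = Callable[[str], bool]

# (library identifier, type name) -> validation function
_REGISTRY: Dict[Tuple[str, str], Validator] = {}


def register_type(library: str, name: str, validator: Validator) -> None:
    """Plug a validator for one type of one library into the registry."""
    _REGISTRY[(library, name)] = validator


def validate(library: str, name: str, text: str) -> bool:
    """Validate text against a registered type; unknown types are an error."""
    try:
        validator = _REGISTRY[(library, name)]
    except KeyError:
        raise LookupError(f"no validator for {name!r} in {library!r}")
    return validator(text)


def _is_latitude(text: str) -> bool:
    """Hypothetical geographic type: a number between -90 and 90."""
    try:
        value = float(text)
    except ValueError:
        return False
    return -90.0 <= value <= 90.0


register_type("urn:example:geo", "latitude", _is_latitude)

if __name__ == "__main__":
    print(validate("urn:example:geo", "latitude", "47.6"))
    print(validate("urn:example:geo", "latitude", "123"))
```

The parser core never needs to know what a latitude is; it only needs to
know where to look one up.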
Note also that not all the primitive types have to be actually *usable*.
We can define the base octet-stream type to be "abstract", so that it
has to have a derivation in order to know how that octet-stream is being
represented as a string.
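In Python terms, an "abstract" primitive is just an abstract base class.
The sketch below assumes two derivations, hex and base64; the class
names are invented for illustration, and a yEnc derivation would slot in
the same way.

```python
import base64
from abc import ABC, abstractmethod


class OctetStream(ABC):
    """Abstract primitive: cannot be used until an encoding is chosen."""

    @abstractmethod
    def decode(self, text: str) -> bytes:
        """Turn the string representation back into octets."""


class HexOctetStream(OctetStream):
    def decode(self, text: str) -> bytes:
        return bytes.fromhex(text)


class Base64OctetStream(OctetStream):
    def decode(self, text: str) -> bytes:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(text, validate=True)


if __name__ == "__main__":
    print(HexOctetStream().decode("4869"))
    print(Base64OctetStream().decode("SGk="))
```

Trying to instantiate OctetStream directly raises TypeError, which is
the "has to have a derivation" rule enforced by the language.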
That gets us to the point of wondering about principles for derivation
of types. If, after all, we have a generic "number" type, we prolly
*do* want some rules (that are small in number and consistent) to
specify, either in a type library definition or in a schema instance,
that the number MUST have a range that fits into (coincidentally) a
sixteen-bit register using ones-complement notation.
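As a taste of what one such rule could look like, here is a Python
sketch of derivation by range restriction. The helper restrict_range is
hypothetical, and it checks the range only: whether the number is also
an integer would be a separate rule. In ones-complement, sixteen bits
cover -(2**15 - 1) through 2**15 - 1.

```python
from decimal import Decimal, InvalidOperation


def restrict_range(minimum, maximum):
    """Derive a validator from the generic number by bounding its range."""
    low, high = Decimal(minimum), Decimal(maximum)

    def validator(text):
        try:
            value = Decimal(text)
        except InvalidOperation:
            return False
        # Reject NaN and infinities before comparing.
        return value.is_finite() and low <= value <= high

    return validator


BITS = 16
ONES_COMPLEMENT_MAX = 2 ** (BITS - 1) - 1  # 32767

# A number restricted to what a sixteen-bit ones-complement register holds.
fits_register = restrict_range(-ONES_COMPLEMENT_MAX, ONES_COMPLEMENT_MAX)

if __name__ == "__main__":
    print(fits_register("32767"))
    print(fits_register("-32768"))
```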
Heh. But this is already too long, and besides, I *enjoy*
cliff-hangers, so let's just Tune In Next Week for Another Bland Episode
....
Amy!
--
Amelia A. Lewis amyzing {at} talsever.com
So what is love then? Is it dictated or chosen? Does it sing like the
hymns of a thousand years or is it just pop emotion? And if it ever was
here and it left does it mean it was never true?
-- Emily Saliers