
   Re: [xml-dev] What are the characteristics of a good type system for XML


In a message dated 13/05/2003 04:37:12 GMT Daylight Time, amyzing@talsever.com writes:

On Mon, 12 May 2003 16:19:13 EDT
AndrewWatt2000@aol.com wrote:
>I guess it is fairly easy to complain about the deficiencies of W3C
>XML Schema datatypes.
>
>What characteristics would a good type system for XML have?

Hmm.  You liked the song so much that you want an encore?  Or you don't
like my inability to carry a tune and so don't listen?  *laugh*


Hi Amy,

No, nothing particularly like that. I just wanted to ask a question with a clean sheet.

My approach when I see problems like the one currently under discussion is sometimes to go back to square one, try to get a better handle on the set of problems that we are trying to solve, and ask what we are trying to do. Maybe you took that as read.

At the moment <ignorance_admission>I am not totally convinced about the clarity of the definition of the set of problems that we (collectively) are trying to solve.</ignorance_admission>

When I see comments like those from a member of the Working Group that the Atlantic has almost been crossed, I, in my quiet heretical way, allow the question "And how long have you imagined that setting sail across the Atlantic would get you to India?" to waft through my mind. Columbus arrived somewhere interesting, but it wasn't where he thought he was going.


Slogan version: complete, consistent, comprehensible.



<disclaimer>Comments from here on are very much the late night musings of a neophyte.</disclaimer>

My late night list of headline wants was:

1. Easy to understand
2. Practical
3. Modular / Layered
4. Facilities to derive new types


Definition list version:

Complete: the system must be able to represent the types that XML
authors want to use.  It should not privilege certain use cases over
others without providing an escape to allow XML processors to change the
privileges.


Yep. Modular / layered.

  That is, it should provide a mechanism for defining new
"primitive" types.

Yep. My point 4.

Consistent: the system must be rules based.  The rules must logically
follow one another.  A good start might be to restrict the
characterization of types to that relevant to XML (that is, to the realm
of data transmission and storage, excluding data manipulation).


Isn't that possibly favouring one use case over another?

  (I also
think that a good start is to make xmlstring the ur-type, such that all
"primitive" types are constrained from that starting point; clearly a
debatable position)


Interesting. In my late night whiteboard ruminations I got to the same spot. Fundamentally all XML data is string data (no surprise there) but to treat it as fundamentally string data and work from there is, seen from a W3C XML Schema perspective, a little radical perhaps.


Comprehensible: the system must be amenable to short explanation.  If it
takes more than half an hour, a whiteboard, and audience of greater than
average attention and intelligence, then it isn't comprehensible.


Agreed. You may very well not get it all across in 30 minutes, but you would certainly want that "Ah, I see what this is about" moment in that time frame.

Of course, I guess there could be those who consider that anything that can be grasped in 30 minutes is nowhere near "sophisticated" enough. :)



I suppose I could provide a still-more-detailed version, but I'd rather
pursue my own agenda (is that full disclosure, or the magician's wave
distracting attention?).


If I knew what you were trying to say, I could respond. :)

From my perspective: the base type is xmlstring, a sequence of a subset
of unicode characters (excluding C1, C0 except HT CR LF, the character
'<', and the character '&' except as the start of an escape sequence).



So, you are basically applying a regular expression to a string?

Fundamentally isn't that what types in string-based data are about? Or, in W3C XML Schema jargon, isn't that what the lexical space is about?

Base type validation is equivalent to well-formedness checking for text
nodes, then.
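
A minimal sketch of the xmlstring base type as described above, written as a regular expression over the lexical space. It is an illustration only, not anything proposed in the thread: the names `XMLSTRING` and `is_xmlstring` are hypothetical, "escape sequence" is assumed to mean an XML entity or character reference, and the pattern for entity names is deliberately simplified.

```python
import re

# Assumption: an "escape sequence" is an entity reference (&name;) or a
# character reference (&#65; or &#x41;). The name pattern is simplified.
_REFERENCE = r"&(?:[A-Za-z_][\w.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);"

# Any character except: C0 controls other than HT, LF, CR; C1 controls;
# '<'; and a bare '&'.
_PLAIN_CHAR = r"[^\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F<&]"

XMLSTRING = re.compile(rf"(?:{_PLAIN_CHAR}|{_REFERENCE})*")


def is_xmlstring(text: str) -> bool:
    """Return True if the whole of text lies in the xmlstring lexical space."""
    return XMLSTRING.fullmatch(text) is not None
```

Under these assumptions, `is_xmlstring("a &amp; b")` holds while `is_xmlstring("a & b")` and `is_xmlstring("1 < 2")` do not, which is the same test a well-formedness check applies to a text node.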


This "base type validation" is in the lexical space?

The rules for defining a "primitive" type (which further
constrains the valid lexical space defined by xmlstring) is to define
the algorithm used for validation.  A primitive type definition SHOULD
also include the algorithm preferred for comparison/collation (or
possibly a choice of algorithms).

This gives enough to define type libraries, RNG style, and to do some
basic work on the types defined (to sort, that is).
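
One way to picture a primitive type as "a validation algorithm plus a preferred comparison algorithm", collected into a publishable type library, is sketched below. Everything here is an assumption made for illustration: the `PrimitiveType` and `TypeLibrary` classes, the `urn:example:types` identifier, and the choice of a decimal type are hypothetical and do not come from the thread or from any existing specification.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable, Dict, Iterable, List


@dataclass(frozen=True)
class PrimitiveType:
    name: str
    validate: Callable[[str], bool]   # constrains the lexical space
    sort_key: Callable[[str], Any]    # preferred comparison/collation


class TypeLibrary:
    """A named collection of primitive types (hypothetical design)."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self._types: Dict[str, PrimitiveType] = {}

    def register(self, ptype: PrimitiveType) -> None:
        self._types[ptype.name] = ptype

    def sort(self, type_name: str, values: Iterable[str]) -> List[str]:
        ptype = self._types[type_name]
        values = list(values)
        for value in values:
            if not ptype.validate(value):
                raise ValueError(f"{value!r} is not a valid {type_name}")
        return sorted(values, key=ptype.sort_key)


_DECIMAL = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")

lib = TypeLibrary("urn:example:types")  # hypothetical library identifier
lib.register(PrimitiveType(
    name="decimal",
    validate=lambda s: _DECIMAL.fullmatch(s) is not None,
    sort_key=Decimal,
))
```

With this sketch, `lib.sort("decimal", ["10", "9.5", "-1"])` orders the strings numerically rather than lexically, and nothing about arithmetic or coercion needs to be defined to get that far.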

The further definition of types defines permitted operations on two
instances of the same type and what each operation does, and coercion of
one type to another.  I would suggest that both of these areas are out
of scope for pure-XML type definition (though clearly of interest to
something like XQuery, and arguably of interest to XPath and XSLT).
This suggests that it might be nice to have a pluggable model for types
in XPath, XSLT, and XQuery, since the definition of "completeness"
implies that a simple authoritative list ("Types that Amy thinks are
Important enough to Include"; substitute random authoritative body for
'Amy' and see if it makes you feel better) is not the best solution.


Agreed. That comes under my modular / layered heading.

If your list doesn't include provision for the geographical location types
that Simon needs, expect Simon to ignore your types, whether system or
collection.  Multiply by all the people interested in areas that aren't
on your radar.


Yes, that's why some modularisation / layering is, in my view, the only likely long-term solution.

I would suggest that a standard set of derivations (similar to, but
perhaps a little more systematic than W3C XML Schema's 'facets'), and a
standard set of combinators (please more than the extremely weak 'list'
defined by W3C XML Schema) would also be important, so that primitive
types may become derived types.
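
A rough sketch of what a slightly richer set of combinators might look like, treating each type as a validator function over strings. The combinator names (`list_of`, `union_of`, `tuple_of`) and the latitude/longitude example are hypothetical illustrations, not part of W3C XML Schema or of any proposal in this thread.

```python
import re
from typing import Callable

Validator = Callable[[str], bool]


def list_of(item: Validator) -> Validator:
    """Whitespace-separated list in which every token satisfies item."""
    return lambda text: all(item(token) for token in text.split())


def union_of(*alternatives: Validator) -> Validator:
    """Value is valid if at least one alternative accepts it."""
    return lambda text: any(alt(text) for alt in alternatives)


def tuple_of(separator: str, *parts: Validator) -> Validator:
    """Fixed number of fields, each checked by its own validator."""
    def check(text: str) -> bool:
        fields = text.split(separator)
        return len(fields) == len(parts) and all(
            part(field) for part, field in zip(parts, fields)
        )
    return check


def is_integer(text: str) -> bool:
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


def is_decimal(text: str) -> bool:
    return re.fullmatch(r"[+-]?[0-9]+(?:\.[0-9]+)?", text) is not None


# Hypothetical derived types built only from primitives and combinators.
integer_list = list_of(is_integer)
max_occurs = union_of(is_integer, lambda text: text == "unbounded")
lat_long = tuple_of(",", is_decimal, is_decimal)
```

Here `lat_long("51.5,-0.12")` is accepted and `lat_long("51.5")` is rejected, which hints at how a geographical location type could be derived without being on anyone's authoritative list.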


Maybe this is just another heretical thought, but aren't facets just tightening up the regular expression a little?

Now, if we had named and hierarchical regular expressions .... :)
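
To make the "named and hierarchical regular expressions" idea concrete, here is a minimal sketch in which each named type carries a pattern and an optional base, and a value must satisfy its own pattern and every ancestor's. The registry, the `define` and `matches` functions, and the three example types are assumptions made for illustration only.

```python
import re
from typing import Dict, Optional, Tuple

# name -> (base name or None, compiled pattern); hypothetical registry
_PATTERNS: Dict[str, Tuple[Optional[str], re.Pattern]] = {}


def define(name: str, pattern: str, base: Optional[str] = None) -> None:
    """Register a named pattern, optionally derived from an existing one."""
    if base is not None and base not in _PATTERNS:
        raise KeyError(f"unknown base type {base!r}")
    _PATTERNS[name] = (base, re.compile(pattern))


def matches(name: str, text: str) -> bool:
    """True if text satisfies the named pattern and all of its ancestors."""
    current: Optional[str] = name
    while current is not None:
        base, pattern = _PATTERNS[current]
        if pattern.fullmatch(text) is None:
            return False
        current = base
    return True


# Each derivation only tightens what its base already allows.
define("integer", r"[+-]?[0-9]+")
define("nonNegativeInteger", r"\+?[0-9]+", base="integer")
define("percentage", r"\+?(?:100|[0-9]{1,2})", base="nonNegativeInteger")
```

In this reading a facet is simply one more pattern in the chain: `matches("percentage", "42")` succeeds, while `matches("percentage", "101")` and `matches("nonNegativeInteger", "-5")` fail.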

  Or perhaps the system would just define
how derivation and combination algorithms may be published and
identified.

Note: this set of requirements is radically different from the
requirements that the W3C XML Schema working group operated under.


Fair comment.

They had to support DTD types.  No such requirement here.  It'll get defined
if it's important.  They had to support types common in common
programming languages and in common non-XML data storage formats.  No
such requirement here.  It'll get defined if it's important, and the
mapping will be up to the users of the language or the folks writing the
bridge to the non-XML format.


It's interesting to explore where W3C "requirements" come from. Somehow they often tend to be rather complete and firm when they are first made available for public discussion.

  The requirements here are complete[able],
consistent, comprehensible.  They give a possibility of a long-term
solution,


I agree that that is what is needed.

I can't help wondering whether, when W3Columbus set off across the W3C XML Schema Atlantic, they genuinely expected to reach India. Just as the historical Columbus didn't quite get his orientation right on his intrepid voyage, I can't help wondering whether W3C XML Schema will take us somewhere novel and interesting but fail to reach the intended goal. To reach India we will, after a period of excitement at having discovered a New (W3C XML Schema) World, have to set off in a different direction altogether.

> and if implemented in a way that allows real completeness

Which implies a better layered, more modular design than W3C XML Schema datatypes exhibit.

(which implies that almost anyone can publish a type library, and the
alleged invisible hand of the "market" will straighten it out), provide
a means to *discover* a standard before setting it in stone.

Amy!


I think we agree on some points. For example, that W3C XML Schema datatypes are not a long-term solution, in part because of poor layering / modularisation in design.

I guess we are taking slightly different pragmatic approaches to XSLT / XPath / XQuery. I see them as "happening anyway", unless the "revolution" gathers a momentum that I don't expect. The new specs have, for me, lots of positive aspects and I can thole W3C XML Schema, while at the same time expecting it to be a "temporary" solution. Of course, things that are viewed as temporary can last a very long time. Henry VIII broke off from the Catholic Church so he could get remarried, and that "temporary" situation still persists. :) I wonder why things theological popped into mind when W3C XML Schema is being discussed? <grin/>

Andrew Watt




 


Copyright 2001 XML.org. This site is hosted by OASIS