   Re: [xml-dev] What are the characteristics of a good type system for XML?

Heylas, Andrew,

On Tue, 13 May 2003 04:21:16 EDT
AndrewWatt2000@aol.com wrote:
> In a message dated 13/05/2003 04:37:12 GMT Daylight Time, 
> amyzing@talsever.com writes:
> > Slogan version: complete, consistent, comprehensible.
> 
> <disclaimer>Comments from here on are very much the late night musings
> of a neophyte.</disclaimer>
> 
> My late night list of headline wants was:
> 
> 1. Easy to understand
> 2. Practical
> 3. Modular / Layered
> 4. Facilities to derive new types

I'm not quite certain what "practical" means in context.  Perhaps "most
commonly used types are already defined"?

> > Consistent: the system must be rules based.  The rules must
> > logically follow one another.  A good start might be to restrict the
> > characterization of types to that relevant to XML (that is, to the
> > realm of data transmission and storage, excluding data
> > manipulation).
> 
> Isn't that possibly favouring one use case over another?

I don't *think* so.  I could be wrong, of course.

>   (I also
> > think that a good start is to make xmlstring the ur-type, such that
> > all "primitive" types are constrained from that starting point;
> > clearly a debatable position)
> 
> Interesting. In my late night whiteboard ruminations I got to the same
> spot. Fundamentally all XML data is string data (no surprise there)
> but to treat it as fundamentally string data and work from there is,
> seen from a W3C XML Schema perspective, a little radical perhaps.

It seems to me that the core XML 1.0 spec provides a definition of
base-xml-string validation, in its well-formedness constraints for text
and attribute nodes.  That is, base type validation is equivalent to
well-formedness for text and attribute nodes.

My sense is that when the spec authors begin talking about "value
space", then the discussion may already have strayed out of XML's yard and
onto a busy street.  Is it important to be able to manipulate types? 
Sure.  Is it something that XML can do?  No.  But as Bob Foster points
out elsewhere in this thread, it is something that transformation and
query languages can do.

Perhaps this means more modularization in type definitions: simple
definitions for validation/equality, plus more detail on how this can be
plugged into [some language that allows type libraries to be plugged
in].

> Of course, I guess there could be those that consider that anything
> that can be grasped in 30 minutes is nowhere near "sophisticated"
> enough. :)

*laugh*  Fish on them, then.

> > I suppose I could provide a still-more-detailed version, but I'd
> > rather pursue my own agenda (is that full disclosure, or the
> > magician's wave distracting attention?).
> 
> If I knew what you were trying to say, I could respond. :)

Just a warning that I was about to derail the discussion more toward my
own ends than directly to the goal of responding to your questions. 
Either I was admitting it, or by saying that I was doing so, I was
distracting attention from my digression.  I'm not sure which, myself.

> > From my perspective: the base type is xmlstring, a sequence of a
> > subset of unicode characters (excluding C1, C0 except HT CR LF, the
> > character '<', and the character '&' except as the start of an escape
> > sequence). 
> 
> So, you are basically applying a regular expression to a string? 

Well, you certainly *could* do that as a regular expression (which is a
form of algorithm), but I don't see why you'd *have* to.  You could also
do it with BNF (as in XML 1.0 spec), or even do a full unicode lookup
table, decorating each position with a boolean "permitted" or "not
permitted".
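
As a rough illustration of the lookup-style option, here is a minimal
sketch in Python. The function names are hypothetical, and it assumes
that "escape sequence" means an entity or character reference such as
&amp; or &#x3C;. The reference pattern is deliberately simplified (ASCII
names only), so treat it as an approximation of the XML 1.0 rules, not a
replacement for them.

```python
import re

# Assumption: an "escape sequence" is an entity or character reference.
# Simplified: entity names are limited to ASCII letters, digits, '.', '-',
# '_' and ':'; XML 1.0 allows a much wider set of name characters.
_REFERENCE = re.compile(r"&(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][A-Za-z0-9._:-]*);")


def is_permitted_char(ch: str) -> bool:
    """Per-character check: the boolean "permitted" / "not permitted" lookup."""
    cp = ord(ch)
    if ch in "\t\n\r":           # HT, LF, CR are the permitted C0 controls
        return True
    if cp < 0x20:                # every other C0 control is excluded
        return False
    if 0x80 <= cp <= 0x9F:       # C1 controls are excluded
        return False
    return ch not in "<&"        # '&' is only legal as the start of a reference


def is_xmlstring(text: str) -> bool:
    """True if text is a sequence of permitted characters and references."""
    i = 0
    while i < len(text):
        if text[i] == "&":
            match = _REFERENCE.match(text, i)
            if match is None:    # bare '&' that does not start a reference
                return False
            i = match.end()
            continue
        if not is_permitted_char(text[i]):
            return False
        i += 1
    return True
```

With this sketch, is_xmlstring("a &amp; b") holds while
is_xmlstring("a & b") and is_xmlstring("a < b") do not. The same
predicate could equally be generated from a BNF grammar or from a
precomputed table; nothing here depends on regular expressions beyond
recognizing the reference syntax.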

> Fundamentally isn't that what types in string-based data are about?
> Or, in W3C XML Schema jargon, isn't that what the lexical space is
> about?

We clearly need more than just lexical space in order to do equality and
comparison testing.  For the base xmlstring, though, lex is all.
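
To make the distinction concrete, here is a small sketch in Python with
hypothetical function names. It assumes a derived "integer" type whose
lexical form is an optional sign followed by decimal digits; that
definition is an illustration only, not one taken from any spec.

```python
import re

# Assumed lexical form for an illustrative integer type.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_is_valid(lexical: str) -> bool:
    """Validation looks only at the lexical form."""
    return _INTEGER_LEXICAL.fullmatch(lexical) is not None


def integer_equal(a: str, b: str) -> bool:
    """Equality needs a rule beyond the lexical form: compare the numbers."""
    if not (integer_is_valid(a) and integer_is_valid(b)):
        raise ValueError("not a valid integer lexical form")
    return int(a) == int(b)


def xmlstring_equal(a: str, b: str) -> bool:
    """For the base xmlstring, equality is plain character-by-character identity."""
    return a == b
```

Here "007" and "7" are both valid integers and integer_equal treats them
as the same, while xmlstring_equal treats them as different strings.
That extra rule is exactly what a derived type has to supply and the
base type does not.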

> > I would suggest that a standard set of derivations (similar to, but
> > perhaps a little more systematic than W3C XML Schema's 'facets'),
> > and a standard set of combinators (please more than the extremely
> > weak 'list' defined by W3C XML Schema) would also be important, so
> > that primitive types may become derived types.
> 
> Maybe this is just another heretical thought but aren't facets just 
> tightening up the regular expression a little?

I think that facets are.  If you'd like, though, I can offer an example
of the use of the concepts of "atom" and "composed types" (primitive
types composed from atoms) to show more clearly what I mean.  You can
use such a thing, for instance, to provide a single type that can
substitute for W3C XML Schema's dateTime, date, gYearMonth, and gYear
types (all the "time period" types), but is more expressive.  However,
defining recurring periods (unifying time, gMonth, gMonthDay, and
whatever the other privileged one is) requires setting up a
composed-from-atoms abstract base type, and then defining certain sorts
of special derivations that can be applied to it to create concrete
types (and in so doing, you end up realizing just how incredibly
poverty-stricken the recurrent dates in WXS are, btw).
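
One possible reading of the "atoms" idea for time periods, as a minimal
Python sketch: a period is a left-anchored run of atoms, and you may
stop after any of them. The atom table, the separators, and the name
parse_period are all assumptions made for illustration, not a worked-out
design. Time zones are left out, and the day is not checked against the
month.

```python
import re
from typing import Optional

# Hypothetical atoms: (name, separator expected before the atom, pattern).
ATOMS = [
    ("year",   "",  r"-?[0-9]{4,}"),
    ("month",  "-", r"0[1-9]|1[0-2]"),
    ("day",    "-", r"0[1-9]|[12][0-9]|3[01]"),
    ("hour",   "T", r"[01][0-9]|2[0-3]"),
    ("minute", ":", r"[0-5][0-9]"),
    ("second", ":", r"[0-5][0-9]"),
]


def parse_period(lexical: str) -> Optional[dict]:
    """Parse a period made of a leading run of atoms; None if invalid."""
    values = {}
    pos = 0
    for name, separator, pattern in ATOMS:
        if pos == len(lexical):          # input ended on an atom boundary
            break
        if not lexical.startswith(separator, pos):
            return None
        pos += len(separator)
        match = re.compile(pattern).match(lexical, pos)
        if match is None:
            return None
        values[name] = int(match.group())
        pos = match.end()
    # Reject empty input and anything left over after the last atom.
    if not values or pos != len(lexical):
        return None
    return values
```

Under these assumptions "2003", "2003-05", "2003-05-13", and
"2003-05-13T04:21:16" are all instances of the one type, and so is
"2003-05-13T04", which none of the four WXS types can express.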

> Now, if we had named and hierarchical regular expressions .... :)

I think that chasing the regex hare may be misleading you from the
algorithmically defined types fox hunt.

> It's interesting to explore where W3C "requirements" come from.
> Somehow they often tend to be rather complete and firm when they are
> first made available for public discussion.

Actually, for W3C XML Schema, many of the requirements were perfectly
understandable.  They were building on work that had gone before, so
they *had* to be able to do certain things.  DTD types: required.  As
much expressiveness as XDR: required.  Some form of inheritance, such as
defined in SOX: required.

I think the term is "second system effect."  That's what I see, looking
at WXS.  I think that we can gain valuable lessons from discussing it,
and deciding *what* *went* *wrong*.  That is, I'm not out to needle
folks for the fun of it.  I want to see if we can't figure out a way to
define types in a more minimalist and more complete fashion by examining
the problems that those who tried before ended up stuck with.

> I think we agree on some points. For example, that W3C XML Schema
> datatypes are not a long-term solution, in part because of poor
> layering / modularisation in design.

Yup.

> I guess we are taking slightly different pragmatic approaches to XSLT
> / XPath / XQuery. I see them as "happening anyway", unless the
> "revolution" gathers a 

Oh, I think so too.  I think it *might* be possible to convince the
committee that it ought to abandon the complete reliance on WXS by
pointing out some possible solutions, but given the work that the
working group has already put in, I suspect that opening such an
architectural issue would be nearly impossible, at this point.  Just bad
timing, maybe.  RNG provided a different way to think about types (even
if some of us were fed up with WXS types before, we had great difficulty
in expressing why, or what an alternative might look like; the
pluggability of types in RNG is a broadening of the horizons).  But
XPath/XSLT/XQuery already had a path laid out, and is more interested in
solving the particular problems found on that path than in finding a
different path (even if the alternate path is shorter).

Amy!
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
Boxing is a lot like ballet, except that they don't dance, there isn't
any music, and they hit each other.




 
