xml-dev - Re: [xml-dev] limits of the generic


To: Jeni Tennison <jeni@jenitennison.com>
Subject: Re: [xml-dev] limits of the generic
From: Uche Ogbuji <uche.ogbuji@fourthought.com>
Date: Sat, 28 Sep 2002 14:05:46 -0600
Cc: Jonathan Robie <jonathan.robie@datadirect-technologies.com>, xml-dev@lists.xml.org
In-reply-to: Message from Jeni Tennison <jeni@jenitennison.com> of "Sat, 28 Sep 2002 17:18:27 BST." <197177143298.20020928171827@jenitennison.com>
Sender: uche.ogbuji@fourthought.com

> Uche,
> 
> > It's neither baroque nor difficult for implementors either. The part
> > of XPath 1.0 data typing that has the most coercion rules is the
> > definition of EqualityExpression (i.e. interpretation of "=" and
> > "!=".) The entire implementation of this is less than 100 lines of
> > Python code, and very clear Python code, I think. In addition, about
> > half of this code is shared with the implementation of
> > RelationalExpression.
> >
> > I would guess that the equivalent implementation in XPath 2.0 would
> > run well past 1000 lines of Python code. This is just part of the
> > reason why I don't plan to implement XPath 2.0. The net gain for
> > such staggering complexity is frightfully minimal. Jonathan's
> > examples, batting as hard as he can for data types, underscore this
> > all too well.
> 
> I do agree with you that XPath 2.0's data type handling leaves much to
> be desired, and that many of the data types that are supported in
> XPath 2.0 (because they're built-in to W3C XML Schema) are not
> particularly useful for the transformations that we've been used to.
> 
> However, I don't think that means that data types are altogether
> useless.

I didn't say they are.  They are useful in *certain* portions of the 
processing pipeline.  This is why I advocate that data typing occupy a 
separate layer from the basics of XPath.  This is easily done through 
declarations which create axioms that can be processed separately.  I have 
long wanted to propose an EXSLT module for generic constraint processing that 
would accommodate data type processing.  I'll see what I can do to finally hack 
out some time to do so.

Data types are fine.  They just don't belong in the core.


> Examples that spring to mind are:
> 
> 1. How to compare whether two elements have the same name. In XPath
>    1.0 you have to compare the local-name() and namespace-uri() to get
>    a namespace-aware comparison. In XPath 2.0, because the names are
>    QNames, you can just compare the QNames.

This has nothing to do with data typing, IMO.  The idea of expanded name 
equivalence predates WXS, and is already omnipresent in XPath 1.0 (i.e. name 
tests in the form of QNames).  The fact that this cannot be done in 
EqualityExpressions is an oversight that can be corrected without dragging in 
all of WXS data types.  Just add a function expanded-name() (abbreviate it, 
perhaps), that takes an optional QName or a node set.  If it's given a QName, 
that's what it expands.  If it's given a node set, it uses the expanded name of 
the first node of that node set in document order.  If used without an 
argument, "." is assumed.  Then you can simply do:

expanded-name() = ns:foo

in your expressions, and the problem is solved.  No?

Actually, I'd say such an exsl:expanded-name() would be a handy addition to 
the EXSLT core module.
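
To make the comparison concrete, here is a rough Python sketch of what such 
a function would compute.  Everything in it is an assumption for 
illustration: the name expanded_name, the explicit ns_bindings mapping 
(standing in for the in-scope namespace declarations), and the tuple 
representation are hypothetical, and this is not existing EXSLT or 4Suite 
code.

```python
from typing import Mapping, Optional, Tuple

# An expanded name is a (namespace URI, local name) pair.
ExpandedName = Tuple[Optional[str], str]


def expanded_name(qname: str, ns_bindings: Mapping[str, str]) -> ExpandedName:
    """Resolve a QName such as 'ns:foo' against prefix-to-URI bindings."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # As in XPath 1.0 name tests, an unprefixed name is in no namespace.
        return (None, qname)
    if prefix not in ns_bindings:
        raise KeyError(f"undeclared namespace prefix: {prefix!r}")
    return (ns_bindings[prefix], local)


# Different prefixes bound to the same URI give equal expanded names.
same = (expanded_name("ns:foo", {"ns": "http://example.com/ns"})
        == expanded_name("x:foo", {"x": "http://example.com/ns"}))
```

The point is that the comparison reduces to plain tuple equality once the 
prefixes are resolved, with no type system involved.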


> 2. How to compare whether two date-times are the same when they use
>    different timezones. In XPath 1.0 this would involve some serious
>    work -- you'd have to bug out to XSLT for it. In XPath 2.0, because
>    dateTimes have their own data type, you can just compare them.

EXSLT date-times module provides for this without needing to infect the core.
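
As an illustration of how little machinery a library-level comparison 
needs, here is a minimal Python sketch.  It is not the EXSLT API; the 
function names parse_datetime and same_instant are hypothetical, and it 
assumes ISO 8601 strings that carry an explicit offset.

```python
from datetime import datetime


def parse_datetime(text: str) -> datetime:
    """Parse an ISO 8601 date-time that carries an explicit UTC offset."""
    # Older Python versions reject a trailing 'Z', so normalize it first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        raise ValueError("expected a timezone offset: " + text)
    return value


def same_instant(a: str, b: str) -> bool:
    """True if both strings name the same instant, whatever their offsets."""
    # Timezone-aware datetimes compare by their UTC instant.
    return parse_datetime(a) == parse_datetime(b)
```

For example, same_instant("2002-09-28T14:05:46-06:00", 
"2002-09-28T21:05:46+01:00") is true, since both are 20:05:46 UTC.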


> 3. Similarly, how to compare whether two durations are the same.

I think the subsequent discussion of this one closes the case  ;-)

I can't help musing: Jonathan called XPath 1.0 Baroque.  I actually would call 
it Attic, and one of the most Attic technologies I've used lately (even more 
so than Python, which lies between Doric and Ionic).  The situation I see with 
xs:duration is not even Baroque or Byzantine, nor is it Saracen.  These would 
imply order even in the complexity.  It's more Surrealistic.  Maybe inspired 
by Frank Lloyd Wright?  (I hope no one from Michigan is lurking about  ;-) )


> You can derive from these examples that I consider the data typing to
> be most useful for structured data types rather than for those that
> could be compared as strings if the canonical lexical representation
> were used.

And I see all the cases you mention better handled by explicit checks rather 
than built-in coercion rules.


> Also, I do think that someone who's gone to the trouble of creating a
> W3C XML Schema schema for their markup language is going to expect
> that the data types they specified within their schema for the
> elements/attributes will be used in the document, so that they won't
> have to do:
> 
>   <xsl:for-each select="item">
>     <xsl:sort select="@num" data-type="number" />
>                             ^^^^^^^^^^^^^^^^^^
>     ...
>   </xsl:for-each>

But there is a middle ground.  Just one possible way to spell it:

  <xsl:for-each select="item">
    <xsl:sort select="@num" data-type="schema-derived()" />
    ...
  </xsl:for-each>

Which has the nice advantage that the user can also do:

  <xsl:for-each select="item">
    <xsl:sort select="@num" data-type="schema-derived() or 'number'" />
    ...
  </xsl:for-each>

Meaning "use the schema-derived type if available, and fall back to number if 
it isn't".

Can't you see how blindly relying on the schema type makes an ugly dependency 
between the XSLT and the instance document?  By handling it declaratively, as I 
suggest, the above example is *much* safer for modularization and reuse.  This 
is the advantage of generic approaches.
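
The fallback logic itself is tiny.  Here is a hedged Python sketch of it, 
outside XSLT altogether: CONVERTERS and sort_values are hypothetical names, 
and schema_type stands in for whatever a schema-derived() lookup would 
return (None when no schema information is available).

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical converters keyed by sort data-type name.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "number": float,
    "text": str,
}


def sort_values(values: Iterable[str],
                schema_type: Optional[str],
                fallback: str = "number") -> List[str]:
    """Sort by the schema-derived type if it is known, else by the fallback."""
    type_name = schema_type if schema_type in CONVERTERS else fallback
    return sorted(values, key=CONVERTERS[type_name])
```

With no schema type, sort_values(["10", "9", "100"], None) sorts 
numerically; with schema_type="text" the same list sorts as strings.  The 
stylesheet states its expectation explicitly and keeps working when the 
schema is absent.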


> if they've already specified that the 'num' attribute is an integer
> within the schema. This seems the most persuasive argument for
> including support for W3C XML Schema data types in XPath 2.0.

Doesn't persuade me one bit.



> But it would be great if we could reduce the complexity of the
> data-type support in XPath 2.0 as well. Do you have any suggestions
> about which parts are the most difficult to implement and how they
> might be made simpler?

This seems obvious to me.  The fact that I have to support all the comparison 
rules for all the WXS data types in order to implement EqualityExpr is more 
than enough to bloat the Python function 10x.  They can be made simpler by not 
being there at all.  Why do I have to support one committee's arbitrary motley 
of data types that may have nothing to do with any conceivable needs of my 
users?

At least using generic approaches, I can plug in the details in a modular 
fashion, as they are called for.
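
To sketch what I mean by plugging in details, here is a simplified Python 
illustration.  It is not the 4Suite code: it covers only the XPath 1.0 
coercion rules for booleans, numbers and strings (node sets are left out), 
Python's float() is more permissive than XPath's number(), and the names 
TYPED_EQUALITY, register_type and equals are hypothetical.

```python
from typing import Callable, Dict, Optional, Union

Primitive = Union[bool, int, float, str]


def to_boolean(value: Primitive) -> bool:
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value == value and value != 0  # NaN and zero are false
    return len(value) > 0


def to_number(value: Primitive) -> float:
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(value.strip())
    except ValueError:
        return float("nan")


def core_equals(a: Primitive, b: Primitive) -> bool:
    """XPath 1.0 style '=': booleans win, then numbers, then strings."""
    if isinstance(a, bool) or isinstance(b, bool):
        return to_boolean(a) == to_boolean(b)
    if isinstance(a, (int, float)) or isinstance(b, (int, float)):
        return to_number(a) == to_number(b)
    return a == b


# Typed comparisons live outside the core and are registered on demand.
TYPED_EQUALITY: Dict[str, Callable[[str, str], bool]] = {}


def register_type(name: str, compare: Callable[[str, str], bool]) -> None:
    TYPED_EQUALITY[name] = compare


def equals(a: Primitive, b: Primitive, as_type: Optional[str] = None) -> bool:
    """Use a plugged-in comparison only when the caller asks for one."""
    if as_type is not None:
        return TYPED_EQUALITY[as_type](a, b)
    return core_equals(a, b)
```

A user who needs integer semantics registers them, e.g. 
register_type("integer", lambda a, b: int(a) == int(b)), and asks for them 
with equals("007", "7", as_type="integer").  Everyone else pays nothing.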


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Apache 2.0 API - http://www-106.ibm.com/developerworks/linux/library/l-apache/
Python&XML column: Tour of Python/XML - http://www.xml.com/pub/a/2002/09/18/py.html
Python/Web Services column: xmlrpclib - http://www-106.ibm.com/developerworks/webservices/library/ws-pyth10.html

Follow-Ups:
- OT: Frank Lloyd Wright (was Re: [xml-dev] limits of the generic)
  - From: Matt Gushee <mgushee@havenrock.com>
- Re: [xml-dev] limits of the generic
  - From: Arjun Ray <aray@nyct.net>
- Re: [xml-dev] limits of the generic
  - From: Jeni Tennison <jeni@jenitennison.com>

References:
- Re: [xml-dev] limits of the generic
  - From: Jeni Tennison <jeni@jenitennison.com>
