xml-dev - RE: [xml-dev] W3C's five new XQuery/Xpath2 working drafts

To: "Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>, xml-dev@lists.xml.org
Subject: RE: [xml-dev] W3C's five new XQuery/Xpath2 working drafts - Still missing Updates
From: Jonathan Robie <jonathan.robie@softwareag.com>
Date: Thu, 27 Dec 2001 09:52:06 -0500
In-reply-to: <9A4FC925410C024792B85198DF1E97E4021BD6C0@softwareag.com>

At 05:20 PM 12/26/2001 -0700, Champion, Mike wrote:
>Ahhh ... I see the argument now.  But this assumes not only a strongly-typed
>programming language (and the last I checked, a very substantial percentage
>of Web programming is done in Javascript, Perl, Python, etc.), but also a
>schema-centric conception of the role of XML in web software.

XQuery is a strongly typed query language, and its type system is based on 
XML Schema. Even if you use a strongly typed programming language like 
Java, its type system is *not* based on XML Schema, and a lot of errors 
can occur when converting between the type systems. The idea is to do your 
XML to XML transformations - and eventually updates - directly in XML, 
using only one type system.

This gives you true XML-centric programming, and the same kind of type 
safety for XML that Java provides for its native type system. If one of 
your programs contains an error, and it replaces a customer with a coffee 
cup, this error should be caught if the schema for an invoice does not 
allow a coffee cup where a customer should occur. That should depend on the 
schema, not on having bug-free programs. And it is not enough to give the 
programmer the ability to validate if desired -- in buggy programs, people 
have often forgotten to do something that would have prevented the 
destruction of your data.
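
A minimal sketch of the coffee-cup case, in Python rather than XQuery: the replacement is refused because the content model forbids it, not because the caller remembered to validate. The `INVOICE_SCHEMA` dict and the `replace_child` helper are hypothetical stand-ins for a real XML Schema and a schema-aware update; this is not how XQuery itself does it.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: parent element name -> allowed child names.
# A real system would derive this from an XML Schema, not a hand-written dict.
INVOICE_SCHEMA = {"invoice": {"customer", "item", "total"}}


def replace_child(parent, old, new, schema):
    """Replace `old` with `new` under `parent`, refusing children the schema forbids."""
    allowed = schema.get(parent.tag, set())
    if new.tag not in allowed:
        raise TypeError(f"<{new.tag}> is not allowed inside <{parent.tag}>")
    index = list(parent).index(old)
    parent.remove(old)
    parent.insert(index, new)


invoice = ET.fromstring(
    "<invoice><customer>Acme</customer><total>10</total></invoice>"
)
customer = invoice.find("customer")

# The buggy program tries to swap the customer for a coffee cup.
try:
    replace_child(invoice, customer, ET.Element("coffee-cup"), INVOICE_SCHEMA)
    rejected = False
except TypeError:
    rejected = True
```

The check lives inside the update operation itself, so a program that forgets to validate still cannot destroy the invoice.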

>The "schema"
>might be an RDBMS DDL description of tables, class definitions in C++ or
>Java, or one of the XML schema languages, but in any event the only way to
>catch programming errors with type systems is to put "types" at the very
>core of a software architecture.  That is one (good, and well-established)
>way of doing it, but not by any means the only, or even the dominant (AFAIK)
>way of doing it.

I think the dominant way of processing XML is to use XML as a transport 
mechanism, and use "real" systems like databases to guard mission-critical 
data. And one of the reasons that XML is relegated to transport is the 
"Perl and duct tape" approach used to program with XML.

Type safety and real XML-centric programming are needed to put XML at the 
core of a software architecture. In this approach, everything looks like a 
nail. To the XML view, an RDBMS data dictionary is just an XML schema, and 
the tables are simply XML data. Everything in between is done by very 
intelligent data integration software.

>As the "T-word" thread here a few months ago
>http://lists.xml.org/archives/xml-dev/200108/msg01042.html indicated, "type"
>is a bit of a dirty word in some perfectly respectable circles. I am
>struggling with the concepts and terminology here (I took a stab at this in
>my post http://lists.xml.org/archives/xml-dev/200112/msg00356.html and
>didn't get much followup).  The various distinctions among classes, types,
>and domains make my head hurt, and I don't claim to have any deep insights.

I am very much afraid that I could ruin my vacation by getting too deeply 
involved in this debate, but let me try a quick stab. I reserve the right 
to duck out if it gets too much traffic ;->

If we have named descriptions of a set of instances, and instances of those 
descriptions associated with the names, I think we have types. In other 
words, DTDs or schemas, plus XML instances, already give us types. The 
process of testing whether the instances are valid members of these 
descriptions is called validation.

However, XML does not have a type system. Since XML was not designed for 
manipulation, we do not have a type system that describes the result of 
various operations on an XML instance. XQuery, though, is all about 
operations on XML instances, so we need to understand these operations in 
terms of the underlying XML types. That means we need a type system.

We haven't been very systematic about types in the XML world. As a result, 
everyone has their own conception of types in XML, and it gets increasingly 
difficult to say anything about types as each word we might want to use 
gets reserved for some purpose in some community. Most people confuse the 
XML types with types in their favorite system, e.g. relational or 
object-oriented. The RDF and XML communities also think of types somewhat 
differently. So the whole discussion of types in XML has been confusing and 
frustrating for many people, and different communities have had a hard time 
communicating because of differing fundamental assumptions about how types 
work. I think we need greater clarity on types in XML, and I hope this will 
be one result of the XML Query work. I personally think that RELAX-NG and 
the XML Schema Formal Description have also contributed to clarity in these 
areas.

Is it valid to multiply an integer by an element whose content is an 
integer? Is it valid to multiply an integer by an element whose content 
is a string? Is it valid to multiply an integer by an element whose 
content is a URL? If we have introduced a system that manipulates instances 
of types, we need a type system that tells us which operations are valid on 
a particular instance.
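
The three questions can be made concrete with a small Python sketch. It assumes each element carries a declared type name taken from its schema; the `NUMERIC_TYPES` table and the `multiply` and `try_multiply` helpers are hypothetical, and the rule shown (only numeric content may be multiplied) is one plausible answer, not a statement of what the XQuery drafts specify.

```python
from decimal import Decimal

# Hypothetical set of declared types for which multiplication is defined.
NUMERIC_TYPES = {"xs:integer", "xs:decimal"}


def multiply(factor, content, declared_type):
    """Multiply an integer by element content if the declared type permits it."""
    if declared_type not in NUMERIC_TYPES:
        raise TypeError(f"cannot multiply xs:integer by {declared_type}")
    return factor * Decimal(content)


def try_multiply(factor, content, declared_type):
    """Return the product, or the type error message if the operation is invalid."""
    try:
        return multiply(factor, content, declared_type)
    except TypeError as err:
        return str(err)


integer_case = try_multiply(2, "21", "xs:integer")
string_case = try_multiply(2, "abc", "xs:string")
url_case = try_multiply(2, "http://example.org/", "xs:anyURI")
```

The decision is made from the declared type alone, before the content is ever parsed as a number, which is the sense in which a type system says which operations are valid.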

If I perform a query on an XML message to create XML content to be stored 
in a database, a type system can guarantee that the result will conform to 
the schema of the database. I think we need to have that kind of guarantee. 
Whether or not we have a type system, we do have types, and if the wrong 
kind of data is generated, we hope it will be rejected. When possible, I 
would like to have that guarantee statically so that I know it will 
*always* conform to the schema of the database, because testing is never 
exhaustive, and my tests just might miss important cases that cause 
problems. This is always advantageous, but it is especially advantageous 
when we add updates to XQuery.
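
A rough Python sketch of what "statically" means here: compare the type a query is declared to produce with the type the database accepts, without running the query on any data. The dictionaries describing types and the `SUBTYPE_OF` table are simplified, hypothetical stand-ins; real static typing over XML Schema types is far richer than this.

```python
# Hypothetical derivation table: derived type -> base type.
# (In XML Schema, xs:integer is derived from xs:decimal.)
SUBTYPE_OF = {"xs:integer": "xs:decimal"}


def is_subtype(actual, expected):
    """True if `actual` is `expected` or is derived from it."""
    while actual is not None:
        if actual == expected:
            return True
        actual = SUBTYPE_OF.get(actual)
    return False


def statically_conforms(result_type, target_type):
    """Check a query's declared result type against the target, using no instance data."""
    if result_type.keys() != target_type.keys():
        return False
    return all(
        is_subtype(result_type[name], target_type[name]) for name in target_type
    )


# Hypothetical database schema for a <book> row.
database_type = {"title": "xs:string", "price": "xs:decimal"}

ok_query = {"title": "xs:string", "price": "xs:integer"}
bad_query = {"title": "xs:string", "price": "xs:string"}
incomplete_query = {"title": "xs:string"}
```

Because the check uses only types, a passing result covers every input the query could ever see, which testing on sample data cannot promise.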

>Nevertheless, there is an  alternative conception of the role of XML in
>software development that puts well-formed but informally-described  XML
>documents or messages deep in the architecture rather than treating XML as a
>serialization syntax for objects.  In this approach, XPath and XSLT can be
>seen simply as convenient ways of adding or extracting information from
>chunks of XML. I see that XPath 2.0 now supports FLWR expressions, and hence
>joins, the lack of which was  the major limitation of XPath 1.0 for anything
>I ever tried to do.   I'd be inclined to say that XPath 2.0 is enough to
>"declare victory" so that the W3C can focus on some sort of XML update
>syntax, and then XQuery as currently defined.

XPath 2.0 already contains the vast majority of XQuery. The main 
differences involve element construction and strong typing.

But queries also need to be able to construct instances, especially if you 
have updates. Suppose I want to add a new book to a bibliography, before an 
existing book. I would like to be able to do an update like this:

update
   let $b := document("data/xmp-data.xml")//book[title="TCP/IP Illustrated"]
   insert
         <book year="1997">
             <title>Java in a Nutshell</title>
             <author><last>Flanagan</last><first>David</first></author>
             <publisher>O'Reilly</publisher>
             <price>29.95</price>
         </book>
   before $b

Without element construction, we can't do that. Also, I need element 
constructors when I replace existing data:

update
   let $b := document("data/xmp-data.xml")//book[title="TCP/IP Illustrated"]
   replace $b
   with
         <book year="1997">
             <title>Java in a Nutshell</title>
             <author><last>Flanagan</last><first>David</first></author>
             <publisher>O'Reilly</publisher>
             <price>29.95</price>
         </book>

So we need element constructors, which are the major dividing line between 
XQuery and XPath 2.0.
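
For comparison, here is roughly what the two updates amount to when done by hand outside the query language, sketched with Python's standard `xml.etree.ElementTree`. The sample `<bib>` document is made up (the contents of `data/xmp-data.xml` are not shown here), and `find_book`, `insert_before`, and `replace` are hypothetical helpers. Note that the new `<book>` still has to be constructed somewhere, and that nothing in this version checks it against a schema.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for data/xmp-data.xml.
bib = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title></book>"
    "<book><title>Data on the Web</title></book>"
    "</bib>"
)


def new_book():
    """Construct the <book> element used in both update examples."""
    return ET.fromstring(
        "<book year='1997'>"
        "<title>Java in a Nutshell</title>"
        "<author><last>Flanagan</last><first>David</first></author>"
        "<publisher>O'Reilly</publisher>"
        "<price>29.95</price>"
        "</book>"
    )


def find_book(root, title):
    # Only looks at direct children of the root, unlike the // path in the queries.
    for book in root.findall("book"):
        if book.findtext("title") == title:
            return book
    return None


def insert_before(root, target, new):
    """Insert `new` immediately before `target` among the children of `root`."""
    root.insert(list(root).index(target), new)


def replace(root, target, new):
    """Put `new` in the position currently held by `target`."""
    index = list(root).index(target)
    root.remove(target)
    root.insert(index, new)


insert_before(bib, find_book(bib, "TCP/IP Illustrated"), new_book())
titles_after_insert = [book.findtext("title") for book in bib]

replace(bib, find_book(bib, "TCP/IP Illustrated"), new_book())
titles_after_replace = [book.findtext("title") for book in bib]
```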

Now suppose this document is governed by a schema. Should this update be 
allowed if the new element does not conform to the schema? If the schema 
specifies default attributes, should the new instance contain those 
attributes? What is the type information associated with the new instance?

If this update is really modifying your mission-critical data, I think you 
probably want to ensure that updates are not creating invalid data. If you 
are just using XML as a transport, then you can use XPath 2.0 to identify 
nodes, and modify them with whatever tools you prefer.

>I'd be very interested in a reality check -- Am I the only XML developer
>still living in the loosely-typed or non-typed Dark Ages?  Does anyone else
>see XPath 2.0 as meeting the most pressing real-world business requirements
>that the XQuery folks have been working on?

Since XPath 2.0 is largely the same language, containing most of XQuery, 
with significant overlap among the editors and generated from the same 
grammar, I don't think we should view this as competition to XQuery. So 
your question is basically whether updates require element construction or 
a type system. I have addressed this above.

>What percentage of real-world
>XML programming  errors can be caught by the XQuery type system?

Having written quite a few queries, and several function libraries, I would 
say that a significant number of errors can be caught by the type system. 
An XML Schema defines structure and data types at roughly the same level as 
the data dictionary of a relational database. I think that most people who 
have programmed relational databases find that the errors caught by the 
database management system, based on the data dictionary, are significant.

Jonathan

Follow-Ups:
- Re: [xml-dev] W3C's five new XQuery/Xpath2 working drafts - Still missing Updates
  - From: Paul T <pault12@pacbell.net>
- Re: [xml-dev] W3C's five new XQuery/Xpath2 working drafts - Still missing Updates
  - From: "Dare Obasanjo" <kpako@yahoo.com>
