xml-dev - Re: [xml-dev] Datatypes - it's in the contract



To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Datatypes - it's in the contract
From: Mike Champion <mc@xegesis.org>
Date: Sat, 28 Sep 2002 16:28:38 -0400
In-reply-to: <5.1.0.14.0.20020928074203.022ebec8@ncmail.datadirect-technologies.com>

9/28/2002 7:50:50 AM, Jonathan Robie <jonathan.robie@datadirect-technologies.com> wrote:

>
>If the applications that use this data require data of the appropriate 
>type, and we want validation to be able to determine whether the contract 
>is being followed, then we have to allow data types to be declared.

I think there are a number of problems with taking such types all that
seriously for real *XML*-centric applications, even accepting the 
(pretty reasonable!) argument that the schema should define a contract
between producers and consumers of data.   (I wouldn't quarrel with using 
types extensively in OO programming languages, nor in exploiting SQL types
in SQL-centric programs; I simply think that XML has other use cases and
design patterns than these technologies.  Disagree?  That's another thread!)

First, a schema that handled your example data in a truly useful way
would be non-trivial at best (or some non-trivial code would be needed
to preprocess data to meet it).

Think of instances such as
<person>
<ssn>123-456-789</ssn>
<name>THX-1135</name>
<children>3.0</children>
</person> 

<person>
<ssn>123 456    789</ssn>
<name>[none of your business]</name>
<children>three</children>
</person>
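
As a rough illustration of the kind of preprocessing code meant here, the Python sketch below normalizes the two instances above into the same values. The function names, the tiny number-word table, and the choice to return None for anything unparseable are all assumptions made for this example; nothing here comes from a schema language or from the original application.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny lookup table for spelled-out counts.
NUMBER_WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def normalize_ssn(text):
    """Keep only the digits, so '123-456-789' and '123 456    789' agree."""
    return re.sub(r"[^0-9]", "", text)


def normalize_children(text):
    """Accept '3', '3.0' or 'three'; return None for anything else."""
    value = text.strip().lower()
    if value in NUMBER_WORDS:
        return NUMBER_WORDS[value]
    try:
        number = float(value)
    except ValueError:
        return None
    # Reject fractions, negatives, NaN and infinity.
    return int(number) if number.is_integer() and number >= 0 else None


def normalize_person(xml_text):
    """Parse one <person> element and return its normalized fields."""
    person = ET.fromstring(xml_text)
    return {
        "ssn": normalize_ssn(person.findtext("ssn", default="")),
        "name": person.findtext("name", default="").strip(),
        "children": normalize_children(person.findtext("children", default="")),
    }
```

Even this small sketch has to make policy decisions (what counts as a number word, what to do with "lots") that a datatype declaration alone does not settle.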

Second, think of data that simply can't be validated by syntax.  For example:

<prime-number-public-key>120349812304897210349876786238746</prime-number-public-key>
<customer-id>666-1313-0000</customer-id>

Ain't no way a schema validator is going to enforce the contract that those be
valid prime numbers, customer-ids, etc.  If some procedural code has to be invoked
to do that anyway, how much more trouble is it to have the procedural code check
to see that the syntax is correct ... or to write validation code that doesn't worry
about variant syntaxes such as 66613130000 or 666 1313 0000 or 6 6 6 1 3 1 3 0 0 0 0
ad infinitum... not to mention "six six six one three one three zero zero zero zero".
If that data comes from humans, &deity; only knows how many creative ways people can
find to enter meaningful but syntactically invalid data, and I for one would find it
vastly easier to write code to validate a reasonable range of these than to 
put this stuff in an XML schema.  Ultimately, a human is going to have to look
at the input in some significant percentage of the cases, and systems designers
have to figure out where to draw the line between trying to write code (either procedural
or declarative queries/schemas) to handle the weird cases and simply punting to 
a human.  (Ahem, the option of "we don't want your money until you enter the data to
our exacting standards" appeals to nerds a LOT more than it appeals to Pointy Haired
Bosses!).
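
To make the comparison concrete, here is a minimal Python sketch of the procedural route: one function that tolerates the variant customer-id syntaxes, and one that tests primality. The names, the digit-word table, the assumed 11-digit id length (taken from the 666-1313-0000 example), and the use of a fixed-base Miller-Rabin test are all assumptions made for illustration. For very large numbers the test only reports "probably prime".

```python
import re

# Hypothetical mapping for spelled-out digits.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize_customer_id(text):
    """Return the bare digit string, or None to signal 'punt to a human'."""
    digits = []
    for token in re.findall(r"[a-z]+|[0-9]", text.lower()):
        if token.isdigit():
            digits.append(token)
        elif token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
        else:
            return None  # a word we do not understand
    # Assumed format: 11 digits, as in 666-1313-0000.
    return "".join(digits) if len(digits) == 11 else None


def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin with fixed bases; only 'probable' for very large n."""
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # base a witnesses that n is composite
    return True
```

Incidentally, the sample key above ends in 6, so it is even and fails the check at once, which no amount of lexical validation would have revealed.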

So sure, a "contract" specifying the format of data is useful (more useful for design,
negotiation, and debugging purposes than for run-time validation IMHO).  Doing as
much as feasible at the syntactic level with regular expressions / schema / etc. makes
a lot of sense in *many* circumstances, so sure, people should be encouraged to 
use these features WHEN they solve their problems "out of the box."  But in 
many (most?) real-world situations there is no XML formalism to define the
contractual constraints appropriately, and the contract must include natural 
language descriptions, references to mathematical concepts ("primeness"), 
database relationships ("the customer id must exist in the database, the
customer record it identifies must match the information supplied in the order").
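
For what it's worth, the database half of such a contract might be checked along these lines. This is only a sketch: the customers table, its columns, the case-insensitive name comparison, and the in-memory SQLite database standing in for a real customer store are all hypothetical.

```python
import sqlite3


def order_matches_customer(conn, customer_id, name):
    """True only if the id exists and its record matches the supplied name."""
    row = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row is not None and row[0].strip().lower() == name.strip().lower()


# Hypothetical in-memory database standing in for the real customer store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers VALUES (?, ?)", ("66613130000", "THX-1135"))
```

The constraint lives in the data store, not in the document, so no amount of schema typing on the order itself can express it.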

The complaint, basically, is that a vastly disproportionate amount of the W3C's effort
has been spent moving from what would be an "80%" solution (roughly what one can
do with RELAX NG, perhaps) to a "90%" solution (maybe 95% ... let's not quibble ...
it's very significantly under 100%).  This relatively small increase in the actual
practical effectiveness of the strongly-typed approach over a more weakly-typed
approach does not justify, in the opinion of many who post here, the immense 
amount of complexity it has added to WXS and XQuery, the difficulty that complexity has
caused implementers and end users, not to mention the years added to the time it
takes to get the specs to Recommendation status.

So, few would disagree that "it's in the contract".  Lots would disagree that
the amount of effort/complexity added to XML++ to validate the "contract" with
schema-based mechanisms is worth the cost.

Follow-Ups:
- Re: [xml-dev] Datatypes - it's in the contract
  - From: Jonathan Robie <jonathan.robie@datadirect-technologies.com>
- Re: [xml-dev] Datatypes - it's in the contract
  - From: "Rick Jelliffe" <ricko@allette.com.au>

References:
- Datatypes - it's in the contract
  - From: Jonathan Robie <jonathan.robie@datadirect-technologies.com>
