9/28/2002 7:50:50 AM, Jonathan Robie <email@example.com> wrote:
>If the applications that use this data require data of the appropriate
>type, and we want validation to be able to determine whether the contract
>is being followed, then we have to allow data types to be declared.
I think there are a number of problems with taking such types all that
seriously for real *XML*-centric applications, even accepting the
(pretty reasonable!) argument that the schema should define a contract
between producers and consumers of data. (I wouldn't quarrel with using
types extensively in OO programming languages, nor in exploiting SQL types
in SQL-centric programs; I simply think that XML has other use cases and
design patterns than these technologies. Disagree? That's another thread!)
First, a schema that handled your example data in a truly useful way
would be non-trivial at best (or some non-trivial code would be needed
to preprocess data to meet it).
Think of instances such as
<ssn>123 456 789</ssn>
<name>[none of your business]</name>
Second, think of data that simply can't be validated by syntax. For example:
Ain't no way a schema validator is going to enforce the contract that those be
valid prime numbers, customer-ids, etc. If some procedural code has to be invoked
to do that anyway, how much more trouble is it to have the procedural code check
to see that the syntax is correct ... or to write validation code that doesn't worry
about variant syntaxes such as 6661313000 or 666 1313 000 or 6 6 6 1 3 1 3 0 0 0 0
ad infinitum... not to mention "six six six one three one three zero zero zero"
If that data comes from humans, &deity; only knows how many creative ways people can
find to enter meaningful but syntactically invalid data, and I for one would find it
vastly easier to write code to validate a reasonable range of these than to
put this stuff in an XML schema. Ultimately, a human is going to have to look
at the input in some significant percentage of the cases, and systems designers
have to figure out where to draw the line between trying to write code (either procedural
or declarative queries/schemas) to handle the weird cases and simply punting to
a human. (Ahem, the option of "we don't want your money until you enter the data to
our exacting standards" appeals to nerds a LOT more than it appeals to Pointy Haired
Bosses.)
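To make the "write code to validate a reasonable range of these" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name normalize_digits, the WORD_DIGITS table, and the choice of separators (whitespace, hyphens, dots) are mine, not part of any schema language or library. It accepts the variant spellings mentioned above and returns None for anything it can't make sense of, which is the "punt to a human" case.

```python
import re

# Hypothetical table for spelled-out digits (an assumption for this sketch).
WORD_DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize_digits(raw):
    """Collapse a human-entered number into a plain digit string.

    Returns None when the input contains anything that is neither a run
    of ASCII digits nor a spelled-out digit, so the caller can hand the
    record to a human instead of guessing.
    """
    # Split on whitespace, hyphens, and dots (assumed separators).
    tokens = re.split(r"[\s\-.]+", raw.strip().lower())
    digits = []
    for token in tokens:
        if not token:
            continue  # empty piece left over from leading/trailing separators
        if re.fullmatch(r"[0-9]+", token):
            digits.append(token)
        elif token in WORD_DIGITS:
            digits.append(WORD_DIGITS[token])
        else:
            return None  # unrecognized token: punt to a human
    return "".join(digits) or None
```

Note that this only normalizes the syntax. Whether the resulting string has the right number of digits, or identifies anything real, is still a separate check.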
So sure, a "contract" specifying the format of data is useful (more useful for design,
negotiation, and debugging purposes than for run-time validation IMHO). Doing as
much as feasible at the syntactic level with regular expressions / schema / etc. makes
a lot of sense in *many* circumstances, so sure, people should be encouraged to
use these features WHEN they solve their problems "out of the box." But in
many (most?) real-world situations there is no XML formalism to define the
contractual constraints appropriately, and the contract must include natural
language descriptions, references to mathematical concepts ("primeness"),
database relationships ("the customer id must exist in the database, the
customer record it identifies must match the information supplied in the order").
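For what it's worth, here is roughly what the procedural side of such a contract might look like. This is only a sketch under stated assumptions: the names is_prime and check_order are hypothetical, and a plain dict stands in for the customer database, where a real system would run a query. The point is simply that neither check is something a schema validator can express.

```python
def is_prime(n):
    """Trial-division primality test; fine for small illustrative values."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def check_order(order, customers):
    """Return a list of contract violations for one order.

    `order` is a dict with "customer_id" and "name" keys; `customers`
    maps customer ids to records (a stand-in for a real database).
    Both shapes are assumptions made for this sketch.
    """
    problems = []
    record = customers.get(order["customer_id"])
    if record is None:
        problems.append("customer id does not exist in the database")
    elif record["name"] != order["name"]:
        problems.append("customer record does not match the order")
    return problems
```

A call such as check_order({"customer_id": "c1", "name": "Alice"}, {"c1": {"name": "Alice"}}) returns an empty list, meaning no violations were found.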
The complaint, basically, is that a vastly disproportionate amount of the W3C's effort
has been spent moving from what would be an "80%" solution (roughly what one can
do with RELAX NG, perhaps) to a "90%" solution (maybe 95% ... let's not quibble ...
it's very significantly under 100%). This relatively small increase in the actual
practical effectiveness of the strongly-typed approach over a more weakly-typed
approach does not justify, in the opinion of many who post here, the immense
amount of complexity it has added to WXS and XQuery, the difficulty this has
caused implementers and end users, not to mention the years added to the time it
takes to get the specs to Recommendation status.
So, few would disagree that "it's in the contract". Lots would disagree that
the amount of effort/complexity added to XML++ to validate the "contract" with
schema-based mechanisms is worth the cost.