At 04:20 PM 7/3/2002 -0700, Tim Bray wrote:
>Jonathan Robie wrote:
>
>>Let me try to explain why I think named typing is good. Here's a function:
>>define function get-total( element invoice $i )
>> returns xs:decimal
>>{
>> sum( $i//item/price )
>>}
>>This function assumes that the invoices it takes have been validated as
>>invoice elements according to some schema
>
>Wrong. This function assumes that <price> has some numeric type and can
>meaningfully be summed.
And it can only assume this if the data model instance contains information
that tells me the type of price - this is why type annotations are part of
the XQuery data model. They tell the query processor which schema
definitions were used for validation.
Without this information, I can't safely apply the above function to XML
instances. For instance, the above function would not work correctly if
someone had a variant on the invoice schema that used the name 'sale-price'
instead of 'price', because it would not add such elements to the total. It
also relies on the data type, which must be xs:decimal.
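As a rough illustration of that failure mode, here is a minimal sketch in Python rather than XQuery. The Element class, its type_name field, and both function names are hypothetical stand-ins for a data model instance with type annotations; they are not part of any XQuery implementation.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a node in a typed data model instance."""
    name: str
    type_name: Optional[str] = None  # assumed type annotation, e.g. "invoice"
    text: str = ""
    children: List["Element"] = field(default_factory=list)

    def descendants(self):
        # Walk all descendant elements, depth first.
        for child in self.children:
            yield child
            yield from child.descendants()


def get_total_untyped(invoice):
    """Roughly sum($i//item/price), with no check on the annotation."""
    total = Decimal(0)
    for item in invoice.descendants():
        if item.name == "item":
            for price in item.children:
                if price.name == "price":
                    total += Decimal(price.text)
    return total


def get_total(invoice):
    """Same sum, but only for elements annotated with the named type.

    Assumption: the schema behind the name "invoice" is what guarantees
    that every item has a price of type xs:decimal.
    """
    if invoice.type_name != "invoice":
        raise TypeError("expected an element annotated as 'invoice'")
    return get_total_untyped(invoice)


good = Element("invoice", "invoice", children=[
    Element("item", children=[Element("price", "xs:decimal", "10.50")]),
    Element("item", children=[Element("price", "xs:decimal", "4.25")]),
])

# A variant schema that uses 'sale-price' instead of 'price'.
variant = Element("invoice", "variant-invoice", children=[
    Element("item", children=[Element("sale-price", "xs:decimal", "10.50")]),
])
```

Without the annotation check, the variant invoice is silently totalled as zero; with it, the mismatch is reported before any arithmetic happens.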
This goes back to the basic notion of validation as a contract between the
producer and consumer of data, extending the concept with datatypes, and
with the notion of type annotations.
>The type has presumably been identified to the xquery engine using XSchema
>vocabulary, presumably as xs:decimal. The presumption that a schema
>validation operation has actually taken place is without evidence - in a
>large proportion of cases the data has probably been generated
>programmatically and flowed straight into a database, no angle-brackets in
>evidence anywhere.
You don't need to do schema validation; all you need to do is create an
instance of the XML Query data model - and this need not be done
physically. For instance, many people are working on XQuery mappings to
relational data, where the only physical realization of the type
information is in the relational data dictionary, and where the actual
processing is often done in SQL, without anything remotely resembling XML
Schema processing ever occurring in any physical sense.
>I'm really feeling uneasy - a lot of people whom I consider to be smart
>seem to be participating consensually in the belief that data types are
>organically tied to the validation process, which to me seems empirically
>just nutty.
What we provide is a typed data model. We define how to map from the PSVI
to instances of the data model, because we have to do this for XML. We
don't define how to do this for relational data, but the ISO SQL/XML
committee is defining the XML Schema equivalents for relational data. Other
mappings will be done for various data sources by various parties.
>>At run-time, you don't want to have to test every function parameter to
>>see if it corresponds to a schema, you simply want to ensure that the
>>validator has said this corresponds to the appropriate definition.
>
>It depends; if the data being queried is actually XML, when you encounter
>the string of characters that ostensibly represent <price> you're going to
>have to convert them to a number to do arithmetic, and if you don't have
>exception handling logic surrounding this process you're just being lazy
>and stupid - so it's not clear that you ever escape the process of
>"validation".
But the information need not be validated during query processing if it has
already been validated and stored in some database - this is important to
get any kind of efficiency. And it need not be validated if it is known to
correspond to some data dictionary or set of class definitions. Named
typing is basically about the notion that some information is already known
by the system to conform to a particular definition.
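A minimal sketch of that idea, again in Python and with hypothetical names throughout: the expensive check runs once when the data is stored, and later queries only compare the stored type name. The validation_runs counter exists purely to make that visible.

```python
from decimal import Decimal

validation_runs = 0  # counts how often the expensive step is performed


def validate_price(text):
    """Done once at load time: parse the value and stamp it with a type name."""
    global validation_runs
    validation_runs += 1
    return ("xs:decimal", Decimal(text))


def total(prices):
    """At query time, trust the stored type name instead of re-validating."""
    result = Decimal(0)
    for type_name, value in prices:
        if type_name != "xs:decimal":
            raise TypeError("expected a value annotated as xs:decimal")
        result += value
    return result


# Validate and store once, then query as often as needed.
stored = [validate_price(t) for t in ["10.50", "4.25"]]
first = total(stored)
second = total(stored)
```

Here the two queries reuse the annotations written at load time, so validate_price runs once per stored value no matter how many queries follow.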
> If on the other hand this is actually something that is known to be an
> integer and thus stored in a C or Java "int", you couldn't test it
> against a schema anyhow because it's no longer XML. So arguments
> claiming that static typing is good because it bypasses runtime
> validation are basically without merit.
Ah, but one of the big reasons for named typing is the notion of creating
views, particularly of relational data, but also for objects and other
typed data sources.
>And named types do seem awfully convenient, so I'm really not disagreeing
>with Jonathan's main point at all. -Tim
Yes, they really are convenient...
Jonathan