OASIS Mailing List Archives

   Re: [xml-dev] limits of the generic


Hi Jonathan,

> I think that the way we handle integers is consistent with W3C XML
> Schema, which was not designed to define *operations* on types. We
> don't change the definition of integers, we simply define operations
> on them.

Hmm... but W3C XML Schema *does* define *some* operations on types: is
equal and is less/greater than (normatively) and how to add durations
to date/times (in a non-normative appendix). It has to in order to
specify the behaviour of the enumeration and min/maxIn/Exclusive
facets. It also defines how values are matched by regular expressions,
which I guess is an operation.

On the integer front, I guess that I'm confused by things like (in the
XQuery/XPath 2.0 WD):

  It is also possible to construct values of various types by using a
  cast expression. For example:

  * cast as hatsize(9) returns an item whose primitive value is the
    integer 9 and whose type is the user-defined type hatsize, derived
    from xs:integer.

since when I see the term "primitive" I think it means the same as it
does in W3C XML Schema.
   
>>In W3C XML Schema, durations are covered by xs:duration; in the
>>current WDs for XQuery and XPath 2.0 you have to use either
>>xf:yearMonthDuration or xf:dayTimeDuration to get anything useful
>>done, even when a general duration would be completely unambiguous.
>
> Here I think we are dealing with a bug in W3C XML Schema - and
> perhaps one could argue that durations are less universal than
> integers, especially with respect to time zones. The most basic data
> types may also be the most useful.

I don't understand the reference to time zones with regard to
durations -- as I understand them, durations don't have anything to do
with time zones; the only things that (may) have time zones are the
date/time types.

But I think you're right about how the most basic data types are the
most useful.

>>> If I take an integer out of a relational database and give it to a
>>> Java program, I would often like the Java program to know that it
>>> is an integer. Not just for one XML vocabulary, because I want to
>>> write tools that can handle more than one XML vocabulary. What's
>>> wrong with that?
>>
>>What's wrong with that is that you are tightly coupling your
>>database with your Java program. You are not only transferring the
>>data, but also dictating how that data should be interpreted. This
>>means that you tie your XML document into a particular use -- you're
>>basically using XML for *procedural* markup. Of course procedural
>>markup can give you many benefits, but it is not the only way of
>>working.
>
> Hmmm....here's an area where we disagree. I think that a key benefit
> of data types is precisely that they are not procedural. They
> capture a small, abstract set of semantics which give the
> information needed for sensible general processing, like comparing
> two items or building an index.

Ahh... thank you for this. It flipped a switch somewhere in my head
and changed how I view data typing in XML to such an extent that I
can't now reconstruct how I was picturing things, or thought you or
anyone else were picturing things, before.

What you're saying is that data typing in XML is a matter of labelling
a value as "xs:integer", not actually dictating how the recipient of
that value must treat the value. The recipient may or may not use the
label in order to work out how to process the value. So, for example,
a Java application could choose to treat a value labelled with
"xs:integer" as a floating point number.
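A minimal sketch of the "label, not instruction" idea, in Python rather than Java for brevity. The class Labelled and the functions as_int and as_float are hypothetical names made up for this illustration; nothing here comes from W3C XML Schema or the XQuery/XPath drafts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labelled:
    """A value as it might arrive from a validated document (hypothetical)."""
    lexical: str  # the text exactly as it appears in the document
    label: str    # the data type annotation, e.g. "xs:integer"


def as_int(item):
    # One recipient honours the label and insists on an integer.
    if item.label != "xs:integer":
        raise TypeError(f"expected xs:integer, got {item.label}")
    return int(item.lexical)


def as_float(item):
    # Another recipient ignores the label and treats the same text
    # as a floating point number.
    return float(item.lexical)


hatsize = Labelled(lexical="9", label="xs:integer")
```

Both recipients read the same labelled value; the label informs their choice without forcing it.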

So we can think of the labelling of values with data types as being
very similar to labelling content with elements. A data type
definition describes how to validate a value in the same way as an
element declaration describes how to validate an element's content. In
a physical XML document, content is labelled by elements explicitly;
values are labelled with data types through an annotation process
which coincides with validation. The two sets of labels kind of exist
side-by-side, giving you two different "views" on the data in the
document.

Sometimes a data type library might define how a processor should
perform particular operations over the values that are labelled by the
data types that it defines. Similarly, the definition of a markup
language (e.g. XSL-FO, XSLT) might include a specification of how
content in that markup language should be presented or otherwise
processed. I'd call these procedural or prescriptive data types/markup
languages. Other data type libraries and markup languages might be
much more declarative -- be much more purely "labels" that can be
interpreted in different ways by different processors at different
times.

With elements, the kind of processing that a procedural markup
language might specify is usually to do with display or presentation.
With data types, the kind of processing is more to do with how to
compare values of the data type or perform common calculations with
the data type.

A data type called "colour", for example, could be described simply as
"a colour represented through an RGB string in the format #RRGGBB",
without saying anything about how two colours could be compared, or
whether you can add or subtract other colours from it. That would be a
declarative data type. On the other hand, a specification of the
"colour" type could additionally say "a colour is less than another
colour if the sum of its red, green and blue components is less than
the sum of the other colour's red, green and blue components". This
would be procedural because the data type's specification is telling
an application how to perform a particular operation with a value of
that type.
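To make the distinction concrete, here is a rough Python sketch. The names parse_colour and colour_less_than are invented for this illustration. The first function corresponds to the declarative description (what a valid colour looks like); the second adds the procedural ordering rule described above.

```python
import re

# Declarative part: only states what a valid value looks like (#RRGGBB).
_COLOUR = re.compile(r"#([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})")


def parse_colour(value):
    """Validate a '#RRGGBB' string and return (red, green, blue) as integers."""
    match = _COLOUR.fullmatch(value)
    if match is None:
        raise ValueError(f"not a colour: {value!r}")
    return tuple(int(part, 16) for part in match.groups())


# Procedural part: the ordering rule from the text, where a colour is less
# than another if the sum of its components is smaller.
def colour_less_than(a, b):
    return sum(parse_colour(a)) < sum(parse_colour(b))
```

A processor that only knows the declarative part can validate and label colours; it needs the second function, or some other rule of its own choosing, before it can sort them.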

Does that make sense as a way of looking at data typing in XML?

I'd view the W3C XML Schema Datatype library as being a
prescriptive/procedural set of data types, because they do explicitly
specify how they should be processed. You can treat them just as
labels if you want, of course, and use some generic data type
processing to manipulate them. That would be like treating XSL-FO as
just another XML markup language -- displaying a document written in
XSL-FO as a tree rather than in the layout that the XSL-FO specifies.

Extending the analogy, it seems to me that XQuery/XPath 2.0 is like an
XSL-FO processor, in that it's specifically designed to be able to
operate over a particular set of data types.

I'm wondering how far you could get with a processor that took a more
generic view of data types. Perhaps one where the way in which
operations can be performed over values of particular data types was
described in an external specification, a bit like the way you can
define how to display an XML document using CSS. It would need to have
some basic building-blocks that it knows how to manipulate, such as
"number", "string" and "boolean", in the same way that CSS has some
basic building-blocks that it knows how to display, such as "block",
"inline" and "list-item".

I'm thinking of something where a data type library could provide a
formal specification of how to perform operations on a type, which the
processor would then read and use when performing those operations.
For example:

<dt:datatype name="my:UKDate">
<!-- a date in the format DD/MM/YYYY -->

<dt:components>
  <dt:component name="day" select="substring(., 1, 2)" />
  <dt:component name="month" select="substring(., 4, 2)" />
  <dt:component name="year" select="substring(., 7, 4)" />
</dt:components>

<dt:compare to="d"
  select="if (#year > $d#year) then 1
          else if (#year = $d#year) then
            if (#month > $d#month) then 1
            else if (#month = $d#month) then
              if (#day > $d#day) then 1
              else if (#day = $d#day) then 0
              else -1
            else -1
          else -1" />

...
          
</dt:datatype>

[I've used the notation #component to refer to a component of a data
type. For example #year refers to the 'year' component of the current
value. In a processor that supported generic types, you'd have to have
some generic way of defining structured data types like this. Another
option would be to use '.' as an operator. I don't think that would be
too bad, but it would mean that you'd have to do something like: ". .
year" to refer to the year component of the context item, which is
probably more confusing than that other wonderful
operator-which-is-also-a-location-path, *.]
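As a rough sketch of how a generic processor might consume a description like my:UKDate, here is a simplified Python stand-in. It is an assumption-laden illustration, not an implementation of the dt: vocabulary: the UK_DATE dictionary replaces the XML description, the nested if/else comparison is reduced to "compare components in a stated order", and the names components and compare are made up.

```python
# Hypothetical, simplified stand-in for the <dt:datatype> description:
# each component is (name, start, length), with a 1-based start position
# as in XPath's substring().
UK_DATE = {
    "components": [("day", 1, 2), ("month", 4, 2), ("year", 7, 4)],
    # Order in which components are compared, most significant first.
    "compare_order": ["year", "month", "day"],
}


def components(datatype, value):
    """Split a lexical value into named numeric components."""
    return {
        name: int(value[start - 1:start - 1 + length])
        for name, start, length in datatype["components"]
    }


def compare(datatype, a, b):
    """Return 1, 0 or -1, using only what the description provides."""
    ca, cb = components(datatype, a), components(datatype, b)
    for name in datatype["compare_order"]:
        if ca[name] != cb[name]:
            return 1 if ca[name] > cb[name] else -1
    return 0
```

The processor itself knows nothing about dates; it only knows how to slice strings and compare numbers, and everything specific to UK dates lives in the description it is handed.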

Of course there's always a trade-off to be made between generic
processors and specific processors (gosh, I've even managed to make
this email relevant to the subject line again!) but if data types are
nothing more than labels, are really declarative, then I think that
generic processing is a real option. Certainly one that would be
interesting to explore.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/





 


Copyright 2001 XML.org. This site is hosted by OASIS