"Simon St.Laurent" wrote:
> > I disagree. I think they have thought through what they want to do.
>
> I think you must hang out with a wiser crowd than I do.
Hmmm. Let's see. There's me, and my 1-year-old daughter, and a whole lot
of stuffed animals in varying states of chewed-on-ness, ... ;)
> > They
> > want, for example, to be able to validate that a quantity is a (string
> > representing an) integer and not "abc". This strikes me as perfectly
> > reasonable.
>
> For validation only, fine. I don't mind checking that something's an
> integer. Where I think they lack imagination is their insistence that
> because it _is_ an integer in XYZ language, it _is_ an integer in
> markup. Then the markup technologies have to deal with such notions,
> and suddenly things look very different.
But I'm not looking at something as an integer in XYZ language. That's
what leads to data types like int, short, and long. I'm looking at
something as an integer in the real world. When I count my fingers, I
get an integer. When I measure the length of my toes, I get a float.
When I shuffle through the coins in my pocket, I get a two-place
decimal. This has nothing to do with programming languages -- this is
what the data actually is and it seems entirely appropriate to markup.
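To make that concrete, here is a rough sketch of checking those three kinds of value purely at the lexical level, without committing to any programming-language type. The pattern names and the is_valid helper are my own illustration, not taken from XML Schema or any other spec.

```python
import re

# Hypothetical lexical patterns for the three "real world" kinds of number
# described above. They are illustrative only, not from any schema language.
PATTERNS = {
    # counting fingers: optional sign, then digits
    "integer": re.compile(r"[+-]?[0-9]+"),
    # measuring toes: digits with an optional fraction and exponent
    "float": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?"),
    # coins in a pocket: exactly two decimal places
    "money": re.compile(r"[+-]?[0-9]+\.[0-9]{2}"),
}


def is_valid(kind: str, text: str) -> bool:
    """Return True if text (ignoring surrounding whitespace) matches the pattern for kind."""
    return PATTERNS[kind].fullmatch(text.strip()) is not None
```

Note that nothing here says int, short, or long. The check only answers "is this string an integer?", which is the validation case.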
One of the ugly realities that has come out of data integration research
is that there are a whole lot of assumptions about data that can't be
determined by a machine looking at an element type or attribute name.
For example, if I send you an invoice that includes the markup:
<Total>1234</Total>
how much money in real (US) dollars are you going to send me? How to
interpret the value 1234 needs to be somewhere. You can put it
explicitly in markup, you can put it in a schema, or you can put it in a
specification that may or may not be visible to the recipient of the
document. The first two are machine-processable; the third is not. Data
types are useful.
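As a sketch of the first option (putting the interpretation explicitly in the markup), suppose the sender added currency and unit attributes to Total. Those attribute names, their values, and the total_in_dollars function are assumptions for illustration only; nothing in the invoice above defines them.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal


def total_in_dollars(xml_text: str) -> Decimal:
    """Interpret a <Total> element as US dollars, using hypothetical
    'currency' and 'unit' attributes to say what the number means."""
    elem = ET.fromstring(xml_text)
    if elem.get("currency") != "USD":
        # Without an explicit currency the recipient can only guess.
        raise ValueError("currency not stated as USD; cannot interpret the amount")
    amount = Decimal((elem.text or "").strip())
    unit = elem.get("unit")
    if unit == "cents":
        return amount / 100
    if unit == "dollars":
        return amount
    raise ValueError("unit not stated; cannot interpret the amount")


# With the interpretation in the markup, the same digits mean different sums:
cents = total_in_dollars('<Total currency="USD" unit="cents">1234</Total>')
dollars = total_in_dollars('<Total currency="USD" unit="dollars">1234</Total>')
```

The bare <Total>1234</Total> raises an error here, which is the point: a machine has nothing to go on unless the markup or a schema tells it.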
> Think about retrofitting the DOM or especially SAX to represent the
> PSVI, and that difference is pretty plain.
I'm pretty convinced that the current PSVI is not the way to go. It
confuses defaults with types with validation, depends on XML Schemas for
its typing model, and, once you force people to support the XML Schema
data model, is too onerous on implementers.
I'm less convinced that some sort of typing information -- especially
simple types -- isn't warranted. One thing I'm not clear on is whether
that typing information belongs in a schema or in an augmented instance
document. I like the idea of metadata-driven applications, but all the
ones I can think of run off schemas rather than type information in
instance documents.
(For example, I've never written a query processor, but it seems
reasonable to keep type information separate from instance data when
compiling and optimizing a query. Or is the problem that, in a
semi-structured world, this isn't possible?)
That's one reason that I asked how people intended to use the PSVI. I
find it interesting that the answer has, so far, been silence.
> > I think where the XML Schemas data types have missed the boat is in
> > their degree of complexity, as Amy Lewis points out. If you want
> > interoperability, you can't go much beyond saying that a number is an
> > integer. It is then up to the recipient of the document to decide (a)
> > whether they can represent that number in whatever language / system
> > they are using and (b) what data type they choose to represent it in.
>
> "It is up to the recipient" is fine by me - but that's not what I hear
> discussed in ~90% of discussions of XML typing and schemas outside of
> this list.
Agreed, although I'm not convinced that means we should throw out types
altogether.
-- Ron