OASIS Mailing List Archives

Thinking About Data (was RE: Enlightenment via avoiding the T-word)

> -----Original Message-----
> From: Tim Bray [mailto:tbray@textuality.com]
> Sent: Wednesday, August 29, 2001 2:19 PM
> To: xml-dev@lists.xml.org
> Subject: Re: Enlightenment via avoiding the T-word

> There's scope for a nice general essay here about the 
> differences between ways of thinking about data; basic WF XML, 
> OOP, and RDBMS represent instructively different thought 
> patterns.  PSVI and DTDs and SOAP and so on fit into this 
> pattern in interesting ways.  -Tim

That sounds like an interesting topic ... I think it's important to sort out
the various "theories of data" because a) we collectively want to steal the
best ideas of the RDBMS and OO people, but haven't managed to do this in a
coherent way yet; and b) in the real world, we have to make RDBMS, OO, and
XML applications and databases interoperate, and we don't have really great
ways to do this yet.

I don't have well-formed opinions here (much less valid ones!). C.J. Date
has written on the RM and OO aspects of this subject in his "Third
Manifesto" book. I haven't read it, just the article at
http://www.dbpd.com/vault/9808date.html The article is called "Back to the
Relational Future", so I guess you know where he comes out, but it does
provide us a nice starting point.

Here's a strawman idea on how XML, OOP, and RDBMS "think about data" ... and
some thoughts about how the PSVI may fit in.

Relational Model - Quoting from the Date (obviously an authoritative
source!) article above (but I know he expounds on this elsewhere): "So a
database is, abstractly, just a collection of true propositions. And
relational theory supports this view of databases very directly, because
tuples in relations (rows in tables, if you prefer) are directly
interpretable as such true propositions."
So, the relational model uses relations and tuples as the operands, the
operations defined in the relational algebra as the operators, and
propositions about the values of tuples in relations as the fundamental unit
of analysis. Critical to the RM is the notion of "integrity", i.e., ensuring
that the discrete propositions in a database provide an internally
consistent set of "theorems" about the world the database describes. RM
purists don't agonize over types, type inheritance, type operators ...
that's all abstracted away in the concept of the "domains" from which the
tuples take their values.
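A minimal Python sketch of the "database as a collection of true propositions" reading, built on my own assumptions rather than Date's formal definitions: a relation is a set of tuples (no ordering, no duplicates), restriction and projection stand in for the relational algebra, and a hypothetical key constraint plus a referential constraint stand in for "integrity". The `Employee` and `Dept` relations are made up for illustration.

```python
from typing import NamedTuple

# Each tuple reads as a proposition, e.g. Employee("E1", "Smith", "D1") means
# "employee E1 is named Smith and works in department D1".
class Employee(NamedTuple):
    emp_id: str
    name: str
    dept_id: str

class Dept(NamedTuple):
    dept_id: str
    title: str

# Relations are modelled as sets of tuples: unordered, duplicate-free.
dept = {Dept("D1", "Research"), Dept("D2", "Sales")}
employee = {Employee("E1", "Smith", "D1"), Employee("E2", "Jones", "D2")}

def restrict(relation, predicate):
    """Relational restriction (selection): the tuples that satisfy a predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, *attrs):
    """Relational projection: keep the named attributes; duplicates collapse."""
    return {tuple(getattr(t, a) for a in attrs) for t in relation}

def violations(employees, depts):
    """List integrity problems under two hypothetical constraints."""
    problems = []
    # Key constraint: emp_id must identify exactly one proposition.
    ids = [e.emp_id for e in employees]
    if len(ids) != len(set(ids)):
        problems.append("duplicate emp_id")
    # Referential constraint: every dept_id must name a known department.
    known = {d.dept_id for d in depts}
    for e in sorted(employees):
        if e.dept_id not in known:
            problems.append(f"{e.emp_id} refers to unknown dept {e.dept_id}")
    return problems
```

Here `violations(employee, dept)` returns an empty list, while adding `Employee("E3", "Brown", "D9")` produces a referential-integrity complaint: a proposition that is inconsistent with the rest of the database.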

OO paradigm -- I'm not sure who, if anyone, is the authoritative source, or
if there *is* anything that could be called an "OO model" at the same level
of specificity as the RM. (Date loves to rant about this, pick nits with
OO gurus such as Stroustrup and the inventors of UML, and generally promote
the pure RM as cleanly doing everything that the OO paradigm tries to do in
a muddled way ...).  My best guess is that the abstractions "class" and
"object" are the operands, and there would be a few generic "operators"
(such as construction, inheritance, and property accessors) and a whole lot
of class-specific operators if someone did formalize an OO model.  I've been
wondering if OO purists (if there is such a thing!) *do* think about data;
one might think that the whole point of the OO paradigm is to present the
object as an abstract operand with well-defined operators, and whatever data
the instantiation of the operators operate on is encapsulated away from
view.  Or, perhaps the OO paradigm encourages us to think about data ONLY in
terms of "types" (types == classes???), class hierarchies, and the
operations on classes.  
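To illustrate the encapsulation reading, here is a Python sketch (not anyone's formal OO model). The hypothetical `Account` class exposes only the generic operators guessed at above (construction, inheritance, property access) plus class-specific ones. Its stored representation, a list of postings, stays hidden behind them.

```python
class Account:
    """Hypothetical class: clients see operations, not the stored data."""

    def __init__(self, owner: str, opening_balance: int = 0):
        # Generic operator: construction.
        self._owner = owner
        # Hidden representation: a list of postings, not a stored balance.
        self._postings = [opening_balance]

    @property
    def balance(self) -> int:
        # Generic operator: property accessor; the value is derived on demand.
        return sum(self._postings)

    def deposit(self, amount: int) -> None:
        # Class-specific operator with its own rule.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._postings.append(amount)


class SavingsAccount(Account):
    """Generic operator: inheritance. Adds behaviour on top of the hidden state."""

    def __init__(self, owner: str, rate_percent: int, opening_balance: int = 0):
        super().__init__(owner, opening_balance)
        self._rate = rate_percent

    def add_interest(self) -> None:
        # Integer interest on the current balance, recorded as another posting.
        self._postings.append(self.balance * self._rate // 100)
```

Calling `SavingsAccount("Smith", 5, 100)`, then `deposit(100)` and `add_interest()`, reports a `balance` of 210. A client never sees the postings list, which is the sense in which the data is "encapsulated away from view".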

Well-Formed XML -- Obviously this is "just" a syntax and nobody has defined
anything like an authoritative theory of how XML "thinks about data" ... but
there are a lot of free-floating ideas out there.  Perhaps it is nothing
more than a neutral serialization format for RM tuples, Objects, and
unstructured text, and it has no way of "thinking about data" except as
syntax.  Those of us who work with XML databases, however, have to come up
with *some* conception of how XML relates to the RM... and since my "day
job" involves a lot of explaining when we think RDBMSs are most
appropriate and when XML databases are most appropriate for different system
requirements, I've given this a fair amount of thought. The way I see it, WF
XML "thinks of data" not as discrete propositions about some world that must
be kept internally consistent, but as "bundles" of inter-related
propositions that describe a snapshot of the world. Thus, the XML data model
is inherently at a higher level of granularity than the RM.  The
inter-relationships are hierarchical (remember, this is WF XML, no ID/IDREF
relationships defined), meaning that they are much less flexible than those
allowed by the RM, but since they're "hard-coded" in element/attribute
hierarchies, we don't have to worry about referential integrity constraints
-- anything well-formed is internally consistent even if it's inconsistent
at some higher semantic level. Thus, the basic operands of some
formalization of WF XML would be trees of some sort, not tuples.  (Of course
one *could* look at the individual components of an XML document as discrete
propositions that can flexibly inter-relate rather than being fixed in a
hierarchy ... but the RM already defines how to do that, so there would be
no value in formalizing an XML flavored version).  XML "thinks about data"
differently from the RM in other ways, notably by specifying that the
sequence and embedding of components matters.  It thinks about data almost
completely differently than the OO paradigm, because there is no conception
of type nor inheritance, nor any operators other than maybe graph-theoretic
structure navigation and manipulation operations.  (Interestingly, the
relational model "made its bones" off the CODASYL model 25 years ago by
showing that these structure navigation operations were unnecessary. Date's
THE DATABASE RELATIONAL MODEL has a very clear discussion of this historical
episode that we XML weenies really need to come to grips with somehow).
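To see the granularity difference, the sketch below uses Python's standard `xml.etree.ElementTree` to parse a small well-formed document as a tree, then "shreds" it into flat (node_id, parent_id, position, tag, text) tuples. The hierarchy and sibling order that the XML carries implicitly become explicit columns, and the tree can only be recovered by selecting on parent_id and ordering by position. The document and the column layout are illustrative assumptions, not a standard mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed document: one "bundle" of related propositions.
DOC = """<order id="42">
  <item sku="A1">widget</item>
  <item sku="B2">gadget</item>
</order>"""

def shred(root):
    """Flatten an element tree into (node_id, parent_id, position, tag, text) tuples.

    Attributes are omitted for brevity; a fuller mapping would need another relation.
    """
    rows = []

    def walk(elem, parent_id, position):
        node_id = len(rows)
        text = (elem.text or "").strip()
        rows.append((node_id, parent_id, position, elem.tag, text))
        for pos, child in enumerate(elem):
            walk(child, node_id, pos)

    walk(root, None, 0)
    return rows

def children(rows, parent_id):
    """Relational-style reassembly: order by position, then select by parent_id."""
    return [r for r in sorted(rows, key=lambda r: r[2]) if r[1] == parent_id]
```

Navigating the tree is just `[e.text for e in ET.fromstring(DOC).iter("item")]`; the shredded form needs a selection on `parent_id` plus an ordering on `position` to recover the same answer, which is the kind of structure navigation the relational model set out to replace.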

PSVI XML -- I think it's clear that this is WF XML on OOP steroids (or
hallucinogens, if you prefer). I'd guess that C. J. Date and the other
relational purists would (if it ever gets on their radar) think of it as the
worst of the fuzzy OO world plus the worst of the hierarchical XML world.
Perhaps the OO people will think more kindly of it as OO that pays some
attention to data serialization and interchange up-front rather than
relegating it to CORBA to worry about.  Insisting on schemas and types
certainly makes XML more OOP-friendly than WF XML is -- e.g., we can use
databinding tools to generate classes for handling data and we can access XML
elements as instances of a Java/C++ class rather than as a mess of character
data.  On the other hand, it appears from what we've seen on this list that
you have to buy into the Schema/PSVI/OO types paradigm big time or not at
all.  Trying to mix the WF XML view and the PSVI (as Sean McGrath noted a
few posts back) creates "brain puree" and you start to question your own
sanity :~)   I'd been resisting Simon's idea that maybe it's time for the WF
people and the PSVI people to go their separate ways, but writing up this
e-mail has gotten me thinking that while we can share foundations and tools
across the WF/PSVI divide, the two camps seem to have a fundamentally
different way of thinking about data.
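A rough Python analogue of the WF/PSVI contrast: the well-formed view hands back strings, while a data-binding layer driven by type information produces typed objects. Here a hand-written dataclass stands in for the schema; real databinding tools generate classes from W3C XML Schema, and the `Item` class and its field types are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical instance document.
DOC = '<item sku="A1"><qty>3</qty><price>9.95</price></item>'

@dataclass
class Item:
    # These field types play the role schema types (xs:string, xs:int,
    # xs:decimal) would play in a schema-driven binding.
    sku: str
    qty: int
    price: Decimal

def wf_view(xml_text):
    """Well-formed view: every value is just character data."""
    elem = ET.fromstring(xml_text)
    return {"sku": elem.get("sku"),
            "qty": elem.findtext("qty"),
            "price": elem.findtext("price")}

def bound_view(xml_text):
    """Typed view: convert each value to its declared type and bind to a class."""
    raw = wf_view(xml_text)
    return Item(sku=raw["sku"], qty=int(raw["qty"]), price=Decimal(raw["price"]))
```

With the WF view, `qty` is the string `"3"` (so doubling it gives `"33"`); the bound view gives the integer 3 and a `Decimal` price. That is the "instances of a Java/C++ class rather than a mess of character data" benefit in miniature, and also why the two views are hard to mix in one program.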

Anyway, again all this gibberish is just a strawman proposal to try to get
more people to think about the underlying "theories of data" and share their
brainstorms, headaches, delusions, and hallucinations. Tim Bray, you brought
this up ... what might YOU say in a "nice general essay here about the
differences between ways of thinking about data"? ... feel free to smack
down my strawmen; that's what I set them up for!