Michael Champion wrote:
>
>
> I just wonder whether RDF advocates are aware of these other
> triples-like approaches, what they think they've learned from them, or
> whether they think the analogy is misleading?
>
>
I can't speak for all RDF "advocates", but some of us who have been
working with RDF are aware of the various binary-relational models (I'm
basically a database guy, and these models have a pretty substantial
database literature). NB: Strictly speaking, we're talking about
binary-relational models with surrogates as keys/identifiers (URIs in
the case of RDF). I think Chris Date's response in the reference you
cite describes the gist of the situation, although his response restricts
itself to various alternative *relational* approaches. Somewhat
generalized, his points might be described as:

1. You need to have data structures that are convenient to manipulate
(for the manipulation tasks you are doing) and present to users. These
need not be restricted to n-ary relations; they also include XML
documents that reflect the real structures of data that people are used
to dealing with, programming language objects, and so on.

2. You want to be able to boil those structures down into highly
normalized "information atoms", e.g., to eliminate redundancies of
various kinds, or to try to capture the essential information that may
be held in common by two data structures that differ only in the fact
that they've chosen different "higher level" structuring techniques for
the purposes of #1 (e.g., two different n-ary relational schemas, or an
n-ary relational schema vs. a highly nested XML structure).
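As a minimal sketch of point #2 (the data and names here are invented for
illustration): decomposing the rows of an n-ary relation into binary
"information atoms" means giving each row a surrogate identifier (a URI,
in RDF's case) and turning each column value into one
(subject, predicate, object) triple.

```python
# Hypothetical rows of an n-ary "employee" relation.
rows = [
    {"name": "Alice", "dept": "Sales", "city": "Boston"},
    {"name": "Bob", "dept": "R&D", "city": "Austin"},
]

def to_triples(rows, prefix="emp"):
    """Break n-ary rows into binary triples keyed by a surrogate."""
    triples = []
    for i, row in enumerate(rows):
        subject = f"{prefix}:{i}"  # surrogate key, standing in for a URI
        for predicate, obj in row.items():
            triples.append((subject, predicate, obj))
    return triples

print(to_triples(rows))
```

The same decomposition works in reverse, which is why the "ability to go
between one form and the other" matters: nothing is lost, only the
higher-level packaging.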

In Date's version of point #1, he notes the need for n-ary relations to
represent the results of join operations. This can be generalized as
the need to support aggregating data from the various structures you
start off with. In the case of RDF, this means that you have to be
prepared to deal not just with triples as individuals, but with whole
graphs (e.g., to deal with all the information you have about some
person or other entity).
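A sketch of that "whole graph" view, continuing the invented example
above: given a flat set of triples, you rebuild a per-entity record by
grouping on the subject identifier.

```python
from collections import defaultdict

# Hypothetical flat triples, as produced by a decomposition step.
triples = [
    ("emp:0", "name", "Alice"),
    ("emp:0", "dept", "Sales"),
    ("emp:1", "name", "Bob"),
]

def group_by_subject(triples):
    """Aggregate triples into one property map per entity."""
    graph = defaultdict(dict)
    for s, p, o in triples:
        graph[s][p] = o
    return dict(graph)

print(group_by_subject(triples))
# e.g. {'emp:0': {'name': 'Alice', 'dept': 'Sales'}, 'emp:1': {'name': 'Bob'}}
```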

The "irreducible" relations Chris mentions in his point #2 would, in RDF
(and in the earlier binary-relational models), need to be broken down
into binaries by introducing additional URIs (or blank nodes) to serve
as identifiers for the additional entities you'd need to talk about, and
by introducing additional predicates (relation names). Some of these
can seem pretty artificial from an intuitive point of view when you look
at the data, but no data model seems totally "natural" in every
application. And this, I think, is a key point: what is the
application of the data model? Date, for example, notes the distinction
between the use of n-ary relations in general, and the desirability of
more highly-normalized relations (often binary, but at least
irreducible) as base relations (together, of course, with the ability to
go between one form and the other). Similar considerations, it seems to
me, apply to RDF. In cases, say, where a number of apps need to
communicate, and they "understand" a mutually agreed-on XML vocabulary
and collection of XML structures, there's no obvious need for RDF or
anything like it. But what if we need to aggregate those communications
with those of several other communities with their own separate and
distinct XML vocabularies and structures? In that case, it seems to me
I'd like to be able to boil down those communications into things like
how many entities are being referred to, what is being said about each
one, and so on. I think RDF helps with that as a common way of modeling
(thinking about) those kinds of "information atoms". This doesn't
require that the original data be in RDF, but does require that it be
convertible to an RDF-like form (making sure that conversion can be done
may require that some additional information be provided in the original
XML about which things identify entities, and which things are
properties; this has been touched on already in earlier discussion of
"striping").
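To make the striping idea concrete, here is a hedged sketch (the XML
vocabulary and the converter are invented for the example, not an actual
RDF/XML parser): elements alternate between entity "stripes" carrying
identifiers and property "stripes" naming what is being said, so a
converter can tell which nested elements identify entities and which
state properties about them.

```python
import xml.etree.ElementTree as ET

# Invented example document using entity/property striping.
doc = """<Person id="p1">
  <name>Alice</name>
  <worksFor><Company id="c1"><name>Acme</name></Company></worksFor>
</Person>"""

def strip_to_triples(elem):
    """Walk alternating entity/property stripes, emitting triples."""
    subject = elem.get("id")
    triples = []
    for prop in elem:                      # property stripe
        nested = list(prop)
        if nested:                         # nested entity stripe
            triples.append((subject, prop.tag, nested[0].get("id")))
            triples.extend(strip_to_triples(nested[0]))
        else:                              # literal value
            triples.append((subject, prop.tag, prop.text))
    return triples

print(strip_to_triples(ET.fromstring(doc)))
```

The extra information the original XML must supply is exactly the `id`
attributes: without them, the converter cannot know which elements name
entities rather than literal values.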

The usefulness of being able to boil data from heterogeneous data
structures / schemas down into a more normalized binary-like form in
order to do aggregation has been explored fairly well in the literature
on heterogeneous database integration (using "heterogeneous" in two
senses: integrating data from different schemas in the *same* data
model, and integrating data from different schemas in *different* data
models, such as relational and hierarchical databases). There's also
currently a lot of research going on in the database community under the
name "model management" to this end. All of the model management work
I'm aware of uses some form of binary (graph) model.
--Frank