ok, can't stay out of this any longer....
relational database refers to the storage of relations - n-tuples
(that's what a relation is). there is nothing inherently fast or slow
about a relational database.
what is fast or slow is the management systems around them. and sql is a
classic example of something that is slow because at its heart it is a
procedural language - the verbs, like join, often imply large amounts of
work before any optimisation. reason - lack of semantics. object and
other so called database designs are really management systems that try
to use semantics to match how we think of the data and/or improve
triples are interesting because they imply some form of ultimate 5th
normal form. each datum stored separately. some sort of semantics is
implied by the structure of rdf.
the big difference between triples and 5th normal form is the regularity
of a relational database. alternatively you can think of triples as 5th
normal form with missing columns as implied null values (something i'm
looking into at the moment).
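
a minimal python sketch of that idea, under assumptions of my own: the sample rows, the column names and both helper functions are hypothetical, not an existing tool. each non-null datum becomes one (subject, attribute, value) triple, and a missing triple is read back as an implied null.

```python
def rows_to_triples(rows):
    """Decompose each row into (subject, attribute, value) triples.

    Null values produce no triple at all: they are implied by absence.
    """
    triples = []
    for row in rows:
        subject = row["id"]
        for attribute, value in row.items():
            if attribute != "id" and value is not None:
                triples.append((subject, attribute, value))
    return triples


def triples_to_rows(triples, columns):
    """Rebuild regular rows; any column without a triple becomes None."""
    rows = {}
    for subject, attribute, value in triples:
        blank = {"id": subject, **{column: None for column in columns}}
        row = rows.setdefault(subject, blank)
        row[attribute] = value
    return [rows[subject] for subject in sorted(rows)]


# Hypothetical sample data: the second employee has no branch recorded.
employees = [
    {"id": 1, "name": "amira", "branch": "cairo", "age": 17},
    {"id": 2, "name": "ben", "branch": None, "age": 34},
]

triples = rows_to_triples(employees)
rebuilt = triples_to_rows(triples, ["name", "branch", "age"])
```

one limit of the sketch: a subject whose columns are all null leaves no triples behind, so it cannot be rebuilt without a separate list of subjects.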
i think we could move forward a lot faster by recognising that a) the
storage and maths of relational databases is one thing b) the semantics
using this model, sql is a semantic layer, so is the network database,
so is object oriented, and so is rdf.
we get very high performance by making this distinction with all data
stored in easy to access relations and semantic tools to do all the
things we talk about - retrieve, store, validate, format, publish etc.
then one of the things you can do is make validation constraints that
are temporal - apply only as required, and apply across the entire
database, not just the table, relation, document, etc that is being
eg the underage egyptian employee could be solved by a table of branch
offices with minimum employment age as an attribute and reference to
that table when deciding on the validity of a candidate. or it could be
used for post employment checking that company policy is being followed.
or it might be applied to data entry, but because circumstances change
you don't want the constraint applied to existing employees or when
moving records between tables, or when rebuilding a table.
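
a rough sketch of the branch office example, using python's sqlite3 module. the table names, column names, sample rows and minimum ages are all made up for illustration, and this is not how our own technology does it. the point is only that the rule lives in data and is applied when asked for, not enforced by the table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (
        name TEXT PRIMARY KEY,
        min_employment_age INTEGER NOT NULL
    );
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        age INTEGER,
        branch TEXT REFERENCES branch(name)
    );
    -- Hypothetical minimum ages, chosen only for the example.
    INSERT INTO branch VALUES ('cairo', 18), ('london', 16);
    -- No constraint on the table, so existing rows load unchecked.
    INSERT INTO employee VALUES (1, 'amira', 17, 'cairo'),
                                (2, 'ben', 17, 'london');
""")


def candidate_is_valid(conn, age, branch):
    """Check a candidate against the branch's minimum age (data entry time)."""
    row = conn.execute(
        "SELECT min_employment_age FROM branch WHERE name = ?", (branch,)
    ).fetchone()
    return row is not None and age >= row[0]


def policy_violations(conn):
    """Post-employment check across the whole database, run only on request."""
    return conn.execute("""
        SELECT e.id, e.name
        FROM employee e JOIN branch b ON b.name = e.branch
        WHERE e.age < b.min_employment_age
        ORDER BY e.id
    """).fetchall()
```

because neither check is attached to the employee table, moving or rebuilding records never trips it. you call candidate_is_valid at data entry and policy_violations when auditing.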
so after many months now watching the discussions on this list closely
i've concluded, for myself at least, that xml wrt data is a semantic
layer. i've also realised through my brief study of rdf that we can
design a new (non-xml) storage mechanism that supports triples as easily
as it does relations and that seen in this light there is a unifying
theory of data storage.
putting this together will i guess be the last big project of my career,
and it is exciting looking forward to the new applications i can now tackle.
ps thanks for the inspiration.
pps for those who asked, we are still debating internally about
releasing our data technology as open source.
Hunsberger, Peter wrote:
>Bullard, Claude L (Len) <email@example.com> asks:
>>Off topic, but since data warehousing comes up from
>>time to time: what is the advantage of using
>>an OLAP design vs a relational design? Is this
>>advantage better or worse than a triple design?
>Now you've done it, you've gone and imported a perm thread from the
>database world into xml-dev...
>With the exception of the specialized spatial, null compressed, database
>designs, for the most part, OLAP designs are relational designs just
>highly denormalized. I can't really see a significant relationship to
>triple stores. Your prototypical warehouse "star" schema puts a single
>large table at the center of a bunch of smaller tables (snowflake
>schemas normalize a bit). Most of the many to many relationships are
>denormalized. Relationships are hard coded in the center tables and your
>standard relationship traversal goes away (that's the whole point, avoid
>join processing costs at the cost of higher storage utilization).
>Now you could just plop an entire triple store into a single table but I
>can't see how that approach would work at all, all relationships would
>be via procedural value look up and comparison. To put it another way,
>triples are all about relationship management as opposed to value
>management which is what a data warehouse schema is for.
>Having said that I'll note that if you go to 5th normal form you end up
>with a sort of inverted star; tiny little tables connected to a bunch of
>larger tables. This is because you've used a single table (with
>possibly a single column) to normalize out a bunch of relationships.
>This pattern does have something to do with triple stores (since that's
>what we're using it for). Given my statements above I'd guess it has
>something to do with ending up with a single key for relationship
>traversal across multiple dimensions/perspectives and thus being able to
>annotate the relationships. I'd postulate that there are some formal
>properties shared between graphs and 5th normal form databases.
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>The list archives are at http://lists.xml.org/archives/xml-dev/