
   Re: [xml-dev] are native XML databases needed?


ok, can't stay out of this any longer....

relational database refers to the storage of relations - n-tuples 
(that's what a relation is). there is nothing inherently fast or slow 
about a relational database.

what is fast or slow is the management systems around them. and sql is a 
classic example of something that is slow because at its heart it is a 
procedural language - the verbs, like join, often imply large amounts of 
work before any optimisation. reason - lack of semantics. object and 
other so-called database designs are really management systems that try 
to use semantics to match how we think of the data and/or improve 

triples are interesting because they imply some form of ultimate 5th 
normal form. each datum stored separately. some sort of semantics is 
implied by the structure of rdf.

the big difference between triples and 5th normal form is the regularity 
of a relational database. alternatively you can think of triples as 5th 
normal form with missing columns as implied null values (something i'm 
looking into at the moment).
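
to make the "missing columns as implied null values" idea concrete, here is a minimal python sketch. it is not the storage design discussed in this post; the sample data, the function names (`triples_to_rows`, `rows_to_triples`) and the assumption of at most one value per subject/predicate pair are all made up for illustration.

```python
from collections import defaultdict

# hypothetical sample data: (subject, predicate, object) triples
triples = [
    ("emp1", "name", "alice"),
    ("emp1", "branch", "cairo"),
    ("emp2", "name", "bob"),  # emp2 has no "branch" triple at all
]

def triples_to_rows(triples):
    """pivot triples into one row per subject.

    a predicate with no triple for a subject becomes None, i.e. the
    missing triple is read as an implied null column value.
    assumes at most one object per (subject, predicate) pair.
    """
    columns = sorted({p for _, p, _ in triples})
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o
    rows = {s: tuple(attrs.get(c) for c in columns)
            for s, attrs in by_subject.items()}
    return columns, rows

def rows_to_triples(columns, rows):
    """inverse direction: emit one triple per non-null cell."""
    return [(s, c, v)
            for s, row in rows.items()
            for c, v in zip(columns, row)
            if v is not None]

columns, rows = triples_to_rows(triples)
```

under these assumptions the two forms carry the same information: going from triples to rows and back returns the original set of triples, and the only difference is whether the absence of a value is stored (a null cell) or simply not stored (no triple).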

i think we could move forward a lot faster by recognising that a) the 
storage and maths of relational databases is one thing b) the semantics 
is another.

using this model, sql is a semantic layer, so is the network database, 
so is object oriented, and so is rdf.

we get very high performance by making this distinction with all data 
stored in easy to access relations and semantic tools to do all the 
things we talk about - retrieve, store, validate, format, publish etc.

then one of the things you can do is make validation constraints that 
are temporal - apply only as required, and apply across the entire 
database, not just the table, relation, document, etc that is being 
looked at.

eg the underage egyptian employee could be solved by a table of branch 
offices with minimum employment age as an attribute and reference to 
that table when deciding on the validity of a candidate. or it could be 
used for post-employment checking that company policy is being followed. 
or it might be applied to data entry, but because circumstances change 
you don't want the constraint applied to existing employees or when 
moving records between tables, or when rebuilding a table.
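
a rough python sketch of that idea follows, assuming a hypothetical `branches` table and made-up function names (`hire`, `bulk_load`, `audit`); the ages are illustrative only. the point is that the rule lives in one place and is applied only by the operations that ask for it.

```python
from datetime import date

# hypothetical branch office table with minimum employment age as an attribute
branches = {"cairo": {"min_age": 18}, "sydney": {"min_age": 15}}

def age_on(born, when):
    """whole years between a birth date and a given date."""
    years = when.year - born.year
    if (when.month, when.day) < (born.month, born.day):
        years -= 1
    return years

def violates_min_age(employee, when):
    """the constraint itself: looks across to the branch table."""
    min_age = branches[employee["branch"]]["min_age"]
    return age_on(employee["born"], when) < min_age

def hire(employees, candidate, today):
    """data entry: the constraint is enforced here."""
    if violates_min_age(candidate, today):
        raise ValueError(f"{candidate['name']} is under the minimum age")
    employees.append(candidate)

def bulk_load(employees, records):
    """moving or rebuilding records: the constraint is deliberately skipped."""
    employees.extend(records)

def audit(employees, today):
    """post-employment check: report violations without rejecting anything."""
    return [e["name"] for e in employees if violates_min_age(e, today)]
```

so the same rule rejects a new candidate in `hire`, is ignored by `bulk_load`, and is only reported by `audit`, which is one way to read "apply only as required".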

so after many months now watching the discussions on this list closely 
i've concluded, for myself at least, that xml wrt data is a semantic 
layer. i've also realised through my brief study of rdf that we can 
design a new (non-xml) storage mechanism that supports triples as easily 
as it does relations and that, seen in this light, there is a unifying 
theory of data storage.

putting this together will i guess be the last big project of my career, 
and it is exciting looking forward to the new applications i can now tackle.


ps thanks for the inspiration.
pps for those who asked, we are still debating internally about 
releasing our data technology as open source.

Hunsberger, Peter wrote:

>Bullard, Claude L (Len) <len.bullard@intergraph.com> asks:
>>Off topic, but since data warehousing comes up from 
>>time to time:  what is the advantage of using 
>>an OLAP design vs a relational design?  Is this 
>>advantage better or worse than a triple design?
>Now you've done it, you've gone and imported a perm thread from the
>database world into xml-dev...
>With the exception of the specialized spatial, null compressed, database
>designs, for the most part, OLAP designs are relational designs just
>highly denormalized. I can't really see a significant relationship to
>triple stores.  Your prototypical warehouse "star" schema puts a single
>large table at the center of a bunch of smaller tables (snowflake
>schemas normalize a bit).  Most of the many to many relationships are
>denormalized. Relationships are hard coded in the center tables and your
>standard relationship traversal goes away (that's the whole point, avoid
>join processing costs at the cost of higher storage utilization).  
>Now you could just plop an entire triple store into a single table but I
>can't see how that approach would work at all, all relationships would
>be via procedural value look up and comparison.  To put it another way,
>triples are all about relationship management as opposed to value
>management which is what a data warehouse schema is for.
>Having said that I'll note that if you go to 5th normal form you end up
>with a sort of inverted star; tiny little tables connected to a bunch of
>larger tables.  This is because you've used a single table (with
>possibly a single column) to normalize out a bunch of relationships.
>This pattern does have something to do with triple stores (since that's
>what we're using it for). Given my statements above I'd guess it has
>something to do with ending up with a single key for relationship
>traversal across multiple dimensions/perspectives and thus being able to
>annotate the relationships. I'd postulate that there are some formal
>properties shared between graphs and 5th normal form databases.

Rick Marshall


