xml-dev - Re: [xml-dev] are native XML databases needed?

To: Rick Marshall <rjm@zenucom.com>
Subject: Re: [xml-dev] are native XML databases needed?
From: w3c@drrw.info
Date: Fri, 27 Aug 2004 08:32:53 -0400
Cc: "Hunsberger, Peter" <Peter.Hunsberger@STJUDE.ORG>, "Bullard, Claude L (Len)" <len.bullard@intergraph.com>, Ken North <kennorth@sbcglobal.net>, xml-dev@lists.xml.org
In-reply-to: <412E62CA.7080804@zenucom.com>
References: <1E0CC447E59C974CA5C7160D2A2854EC097E8C@SJMEMXMB04.stjude.sjcrh.local> <412E62CA.7080804@zenucom.com>
User-agent: Internet Messaging Program (IMP) 3.2.2 / FreeBSD-4.8

Rick,

WRT triples, RDF, and queries - I recommend you look at SWI-Prolog and the
predicate libraries available for it - all open source.

Enjoy, DW

Quoting Rick Marshall <rjm@zenucom.com>:

> ok, can't stay out of this any longer....
> 
> relational database refers to the storage of relations - n-tuples 
> (that's what a relation is). there is nothing inherently fast or slow 
> about a relational database.
> 
> what is fast or slow is the management systems around them. and sql is a
> classic example of something that is slow because at its heart it is a
> procedural language - the verbs, like join, often imply large amounts of
> work before any optimisation. reason - lack of semantics. object and
> other so-called database designs are really management systems that try
> to use semantics to match how we think of the data and/or improve
> performance.
> 
> triples are interesting because they imply some form of ultimate 5th 
> normal form. each datum stored separately. some sort of semantics is 
> implied by the structure of rdf.
> 
> the big difference between triples and 5th normal form is the regularity 
> of a relational database. alternatively you can think of triples as 5th 
> normal form with missing columns as implied null values (something i'm 
> looking into at the moment).
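
A minimal Python sketch of the "missing columns as implied null values"
reading above. It is not taken from the thread: the sample triples, the
function names triples_to_relation and relation_to_triples, and the
restriction to one value per (subject, predicate) pair are all assumptions
made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("emp1", "name", "Ann"),
    ("emp1", "branch", "Cairo"),
    ("emp2", "name", "Bob"),
    ("emp2", "age", 17),
]

def triples_to_relation(triples):
    """Pivot triples into rows over one fixed column set.

    A predicate that a subject lacks becomes None (the implied null).
    Assumes at most one object per (subject, predicate) pair.
    """
    columns = sorted({p for _, p, _ in triples})
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o
    rows = [
        (s,) + tuple(by_subject[s].get(c) for c in columns)
        for s in sorted(by_subject)
    ]
    return ["subject"] + columns, rows

def relation_to_triples(header, rows):
    """Inverse mapping: emit one triple per non-null cell."""
    return [
        (row[0], col, val)
        for row in rows
        for col, val in zip(header[1:], row[1:])
        if val is not None
    ]

header, rows = triples_to_relation(triples)
```

Under these assumptions the round trip loses nothing: every non-null cell
maps back to exactly one triple, and every null marks a triple that was
never asserted.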
> 
> i think we could move forward a lot faster by recognising that a) the 
> storage and maths of relational databases is one thing b) the semantics 
> is another.
> 
> using this model, sql is a semantic layer, so is the network database, 
> so is object oriented, and so is rdf.
> 
> we get very high performance by making this distinction with all data 
> stored in easy to access relations and semantic tools to do all the 
> things we talk about - retrieve, store, validate, format, publish etc.
> 
> then one of the things you can do is make validation constraints that 
> are temporal - apply only as required, and apply across the entire 
> database, not just the table, relation, document, etc that is being 
> looked at.
> 
> eg the underage egyptian employee could be solved by a table of branch
> offices with minimum employment age as an attribute and reference to
> that table when deciding on the validity of a candidate. or it could be
> used for post employment checking that company policy is being followed.
> or it might be applied to data entry, but because circumstances change
> you don't want the constraint applied to existing employees or when
> moving records between tables, or when rebuilding a table.
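
One way to picture a constraint that is applied only as required is to keep
it as a separate check over the stored relations, called from the paths
that need it and skipped elsewhere. The sketch below is an illustration
under assumptions, not the system described above: the branches and
employees relations, their sample rows, and the functions underage, hire
and rebuild are all hypothetical.

```python
# Hypothetical relations held as plain tuples.
branches = [("Cairo", 18), ("Sydney", 15)]                 # (branch, minimum employment age)
employees = [("Ann", "Cairo", 17), ("Bob", "Sydney", 16)]  # (name, branch, age)

def underage(employees, branches):
    """Return the employee tuples below their branch's minimum employment age."""
    min_age = dict(branches)
    return [e for e in employees if e[2] < min_age[e[1]]]

def hire(employees, branches, candidate):
    """Data-entry path: the constraint is checked for the new tuple only."""
    if underage([candidate], branches):
        raise ValueError(
            f"{candidate[0]} is below the minimum age for {candidate[1]}"
        )
    return employees + [candidate]

def rebuild(employees):
    """Maintenance path: copies tuples without applying the constraint."""
    return list(employees)

# Post-employment policy check over the existing data.
audit = underage(employees, branches)
```

Here the same rule serves both the candidate check and the later audit,
while rebuild leaves existing tuples alone even if they would fail it.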
> 
> so after many months now watching the discussions on this list closely
> i've concluded, for myself at least, that xml wrt data is a semantic
> layer. i've also realised through my brief study of rdf that we can
> design a new (non-xml) storage mechanism that supports triples as easily
> as it does relations and that, seen in this light, there is a unifying
> theory of data storage.
> 
> putting this together will i guess be the last big project of my career, 
> and it is exciting looking forward to the new applications i can now tackle.
> 
> rick
> 
> ps thanks for the inspiration.
> pps for those who asked, we are still debating internally about 
> releasing our data technology as open source.
> 
> Hunsberger, Peter wrote:
> 
> >Bullard, Claude L (Len) <len.bullard@intergraph.com> asks:
> >
> >>Off topic, but since data warehousing comes up from 
> >>time to time:  what is the advantage of using 
> >>an OLAP design vs a relational design?  Is this 
> >>advantage better or worse than a triple design?
> >
> >Now you've done it, you've gone and imported a perm thread from the
> >database world into xml-dev...
> >
> >With the exception of the specialized spatial, null compressed, database
> >designs, for the most part, OLAP designs are relational designs just
> >highly denormalized. I can't really see a significant relationship to
> >triple stores.  Your prototypical warehouse "star" schema puts a single
> >large table at the center of a bunch of smaller tables (snowflake
> >schemas normalize a bit).  Most of the many to many relationships are
> >denormalized. Relationships are hard coded in the center tables and your
> >standard relationship traversal goes away (that's the whole point, avoid
> >join processing costs at the cost of higher storage utilization).  
> >
> >Now you could just plop an entire triple store into a single table but I
> >can't see how that approach would work at all, all relationships would
> >be via procedural value look up and comparison.  To put it another way,
> >triples are all about relationship management as opposed to value
> >management which is what a data warehouse schema is for.
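
To make the single-table case concrete, here is a small sketch using
Python's standard sqlite3 module. The table name triples, its columns
s/p/o, the predicates name, worksAt and city, and the sample rows are all
assumptions for illustration. It shows that each relationship hop becomes
another self-join on the same table, matched purely by comparing values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: every fact is one (subject, predicate, object) row.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("emp1", "name", "Ann"),
        ("emp1", "worksAt", "branch1"),
        ("branch1", "city", "Cairo"),
        ("emp2", "name", "Bob"),
        ("emp2", "worksAt", "branch2"),
        ("branch2", "city", "Sydney"),
    ],
)

# Names of employees working at a branch in a given city:
# two hops (employee -> branch -> city) cost two self-joins.
query = """
SELECT n.o
FROM triples AS n
JOIN triples AS w ON w.s = n.s AND w.p = 'worksAt'
JOIN triples AS c ON c.s = w.o AND c.p = 'city'
WHERE n.p = 'name' AND c.o = ?
"""
names = [row[0] for row in conn.execute(query, ("Cairo",))]
```

A star schema would answer the same question from one wide row per
employee; the triple table trades that for a uniform shape in which any
new relationship is just more rows.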
> >
> >Having said that I'll note that if you go to 5th normal form you end up
> >with a sort of inverted star; tiny little tables connected to a bunch of
> >larger tables.  This is because you've used a single table (with
> >possibly a single column) to normalize out a bunch of relationships.
> >This pattern does have something to do with triple stores (since that's
> >what we're using it for). Given my statements above I'd guess it has
> >something to do with ending up with a single key for relationship
> >traversal across multiple dimensions/perspectives and thus being able to
> >annotate the relationships. I'd postulate that there are some formal
> >properties shared between graphs and 5th normal form databases.
> >
