   RE: [xml-dev] are native XML databases needed?

Rick Marshall <rjm@zenucom.com> writes:
> 
> ok, can't stay out of this any longer....
> 
> relational database refers to the storage of relations - n-tuples 
> (that's what a relation is). there is nothing inherently fast or slow 
> about a relational database.
> 
> what is fast or slow is the management systems around them. and sql is a
> classic example of something that is slow because at its heart it is a
> procedural language - the verbs, like join, often imply large amounts of
> work before any optimisation. reason - lack of semantics. object and
> other so called database designs are really management systems that try
> to use semantics to match how we think of the data and/or improve
> performance.

Makes sense.  Would you also allow that a given relational
implementation can add semantics via the relationships?  I.e., a
many-to-many relationship which is then augmented with type information
on the normalizing table.  For example, people, phones, types of phones.
The type table ends up being relatively static, and it both constrains
and augments the many-to-many relationship.  It seems to me that if ER
modeling tools recognized this formally you'd essentially have the same
kind of augmented semantics as the object, etc. databases?
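
To make the people/phones/types pattern concrete, here is a minimal
sketch using Python's standard sqlite3 module.  The table and column
names (person, phone, phone_type, person_phone) and the sample rows are
made up for illustration; nothing here comes from an actual schema
discussed on the list.

```python
import sqlite3


def build_phone_db():
    """Create an in-memory database for the people/phones/types example."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE phone (id INTEGER PRIMARY KEY, number TEXT NOT NULL);
        -- The small, relatively static type table.
        CREATE TABLE phone_type (
            id INTEGER PRIMARY KEY,
            label TEXT NOT NULL UNIQUE
        );
        -- The normalizing table: one row per person/phone link, and each
        -- link must carry a type that exists in phone_type.
        CREATE TABLE person_phone (
            person_id INTEGER NOT NULL REFERENCES person(id),
            phone_id INTEGER NOT NULL REFERENCES phone(id),
            type_id INTEGER NOT NULL REFERENCES phone_type(id),
            PRIMARY KEY (person_id, phone_id)
        );
        INSERT INTO phone_type (id, label) VALUES (1, 'home'), (2, 'work');
        INSERT INTO person (id, name) VALUES (1, 'Rick'), (2, 'Peter');
        INSERT INTO phone (id, number) VALUES (1, '555-0100'), (2, '555-0101');
        INSERT INTO person_phone VALUES (1, 1, 1), (2, 1, 2), (2, 2, 1);
    """)
    return conn


def phones_of(conn, name):
    """Return (number, type label) pairs for one person, ordered by number."""
    return conn.execute(
        """
        SELECT phone.number, phone_type.label
        FROM person
        JOIN person_phone ON person_phone.person_id = person.id
        JOIN phone ON phone.id = person_phone.phone_id
        JOIN phone_type ON phone_type.id = person_phone.type_id
        WHERE person.name = ?
        ORDER BY phone.number
        """,
        (name,),
    ).fetchall()
```

The same phone number can be 'home' for one person and 'work' for
another, so the type really is a property of the relationship and not
of either end.  A link that names an unknown type is rejected by the
foreign key, which is the "constrains" half of the argument.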

> triples are interesting because they imply some form of ultimate 5th 
> normal form. each datum stored separately. some sort of semantics is 
> implied by the structure of rdf.

I'm not sure they need to be stored separately, but you certainly don't
end up with a traditional relational schema.  One thing I think can
happen is that you can normalize on type.  You then augment the
relationships with the same pattern as above, but this time it's
metadata. (The bootstrap issue becomes enormous, one wrong decision
about the metatypes and things will end up organized very strangely, but
it might take a long time for the problems to show up...)

> the big difference between triples and 5th normal form is the regularity
> of a relational database. alternatively you can think of triples as 5th
> normal form with missing columns as implied null values (something i'm
> looking into at the moment).

Not sure I follow you here, can you expand?  With 5th normal form the
missing columns always have to show up as implied nulls on the joins, so
I get you there, but I don't see how that applies to the triples mapping?
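
One possible reading of the "implied nulls" idea, sketched in Python.
This is only a guess at the mapping, not something either post spells
out: each subject becomes a row, each distinct predicate becomes a
column, and any predicate a subject never mentions shows up as None.
The function name and the assumption that every predicate is
single-valued are both made up for the sketch.

```python
def triples_to_rows(triples):
    """Pivot (subject, predicate, object) triples into one row per subject.

    Assumes each predicate holds at most one value per subject; a later
    triple for the same subject and predicate overwrites an earlier one.
    """
    # The column set is the union of every predicate seen anywhere.
    columns = sorted({predicate for _, predicate, _ in triples})
    rows = {}
    for subject, predicate, obj in triples:
        # A new subject starts with every column set to None, which is
        # the "implied null" for predicates it never mentions.
        row = rows.setdefault(subject, dict.fromkeys(columns))
        row[predicate] = obj
    return columns, rows
```

Given triples where only one subject has a phone, the other subject's
row still has a phone column, holding None.  Going the other way
(dropping every None cell and emitting one triple per remaining cell)
recovers the original triples under the same single-value assumption.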
 
> i think we could move forward a lot faster by recognising that a) the
> storage and maths of relational databases is one thing b) the semantics
> is another.
> 
> using this model, sql is a semantic layer, so is the network database,
> so is object oriented, and so is rdf.
> 
> we get very high performance by making this distinction with all data 
> stored in easy to access relations and semantic tools to do all the 
> things we talk about - retrieve, store, validate, format, publish etc.
 
Makes sense.

> then one of the things you can do is make validation constraints that
> are temporal - apply only as required, and apply across the entire
> database, not just the table, relation, document, etc that is being
> looked at.
> 
> eg the underage egyptian employee could be solved by a table of branch
> offices with minimum employment age as an attribute and reference to
> that table when deciding on the validity of a candidate. or it could be
> used for post employment checking that company policy is being followed.
> or it might be applied to data entry, but because circumstances change
> you don't want the constraint applied to existing employees or when
> moving records between tables, or when rebuilding a table.
 
Yes, exactly.  This is what I was talking about earlier when I gave the
example of context traversal; as you walk the hierarchy to match up
contexts you map the appropriate validation templates into the context
(along with other fragments such as presentation or styling as needed by
the application).  Thus, if you have a hierarchy that describes "Super
Mega Corp", "Marketing Dept." and "Egyptian Branch office", you may have
cross references (in effect) each with the relevant validation,
presentation, styling (and whatever else) template fragments.
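
A rough Python sketch of that traversal, combined with the
minimum-employment-age example.  Everything specific here is
hypothetical: the template contents, the age values, and the function
names are invented for illustration, and "more specific context wins"
is just one plausible merge rule.

```python
# Template fragments attached to each context node (made-up values).
CONTEXT_TEMPLATES = {
    "Super Mega Corp": {"min_age": 16, "style": "corporate"},
    "Marketing Dept.": {"style": "glossy"},
    "Egyptian Branch office": {"min_age": 18},
}


def resolve_context(path, templates=CONTEXT_TEMPLATES):
    """Walk the hierarchy from general to specific, merging fragments.

    Later (more specific) nodes override earlier ones.
    """
    merged = {}
    for node in path:
        merged.update(templates.get(node, {}))
    return merged


def check_candidate(age, path):
    """Validation applied at hiring time only."""
    return age >= resolve_context(path)["min_age"]


def audit(staff, path):
    """Post-employment policy check over (name, age) records.

    Violations are reported, not rejected, so existing records can still
    be moved or rebuilt without tripping the constraint.
    """
    min_age = resolve_context(path)["min_age"]
    return [name for name, age in staff if age < min_age]
```

The constraint lives with the context and not with any one table, and
each caller decides when to apply it: check_candidate at data entry,
audit after the fact, and nothing at all when records are simply
copied around.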

> so after many months now watching the discussions on this list closely
> i've concluded, for myself at least, that xml wrt data is a semantic
> layer. i've also realised through my brief study of rdf that we can
> design a new (non-xml) storage mechanism that supports triples as easily
> as it does relations and that seen in this light there is a unifying
> theory of data storage.

Ok....???   Guess I'm still confused about how you see the mappings
working...

> putting this together will i guess be the last big project of my career,
> and it is exciting looking forward to the new applications i can now
> tackle.

Sounds like fun, awaiting results eagerly.

> 
> rick
> 
> ps thanks for the inspiration.
> pps for those who asked, we are still debating internally about 
> releasing our data technology as open source.
> 
> Hunsberger, Peter wrote:
> 
> >Bullard, Claude L (Len) <len.bullard@intergraph.com> asks:
> >
> >  
> >
> >>Off topic, but since data warehousing comes up from
> >>time to time:  what is the advantage of using 
> >>an OLAP design vs a relational design?  Is this 
> >>advantage better or worse than a triple design?
> >>
> >>    
> >>
> >
> >Now you've done it, you've gone and imported a perm thread from the 
> >database world into xml-dev...
> >
> >With the exception of the specialized spatial, null compressed, 
> >database designs, for the most part, OLAP designs are relational 
> >designs just highly denormalized. I can't really see a significant 
> >relationship to triple stores.  Your prototypical warehouse "star" 
> >schema puts a single large table at the center of a bunch of smaller 
> >tables (snowflake schemas normalize a bit).  Most of the many to many 
> >relationships are denormalized. Relationships are hard coded in the 
> >center tables and your standard relationship traversal goes away 
> >(that's the whole point, avoid join processing costs at the cost of 
> >higher storage utilization).
> >
> >Now you could just plop an entire triple store into a single table but 
> >I can't see how that approach would work at all, all relationships 
> >would be via procedural value look up and comparison.  To put it 
> >another way, triples are all about relationship management as opposed 
> >to value management which is what a data warehouse schema is for.
> >
> >Having said that I'll note that if you go to 5th normal form you end up 
> >with a sort of inverted star; tiny little tables connected to a bunch 
> >of larger tables.  This is because you've used a single table (with 
> >possibly a single column) to normalize out a bunch of relationships. 
> >This pattern does have something to do with triple stores (since that's 
> >what we're using it for). Given my statements above I'd guess it has 
> >something to do with ending up with a single key for relationship 
> >traversal across multiple dimensions/perspectives and thus being able 
> >to annotate the relationships. I'd postulate that there are some formal 
> >properties shared between graphs and 5th normal form databases.
> >
> >
> >
> >-----------------------------------------------------------------
> >The xml-dev list is sponsored by XML.org <http://www.xml.org>, an 
> >initiative of OASIS <http://www.oasis-open.org>
> >
> >The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> >To subscribe or unsubscribe from this list use the subscription
> >manager: <http://www.oasis-open.org/mlmanage/index.php>
> >
> >
> >  
> >
> 
> 





 

Copyright 2001 XML.org. This site is hosted by OASIS