Re: [xml-dev] ID/IDREF is evil

Interesting thoughts, Kurt. I seem to follow you and Dan McCreary around the countryside (http://semanticweb.com/native-xml-databases-and-rdf_b20575): "Have you seen them?" "Yes, they were here about two years ago."

Recently I have been looking at older work on 'Semantic' or 'Conceptual' Data Modeling. It seems to have been an evolution of the relational approach, essentially an attempt to put more smarts into the database. It actually seems to have started with Codd, who in 1979 proposed an extended model, RM/T (Relational Model/Tasmania), which was subsequently taken up mostly by others.

I would also mention the Xplain data language and DBMS of Johan ter Bekke, which seems largely forgotten now (I'm trying to understand why).

More recently, Terry Halpin's Object Role Modeling (ORM, a somewhat unfortunate acronym) seems very much the modern counterpart of this conceptual approach, at least in the database context.

Halpin now certainly seems to be heading in the hybrid-database direction you suggest, with a new database system called LogicBlox. I am just starting to read his Information Modeling and Relational Databases, Second Edition (which does seem to be largely about ORM and its merits), though that book is pre-LogicBlox.

I don't really know much about the RDF world, but I am starting to learn. At this point it strikes me as very powerful for taking data from anywhere and making use of it (Tim Berners-Lee's vision), but from an 'ordinary' user's perspective the tools are not that friendly. That is not so surprising, since it is essentially trying to bring AI to the masses (this paper discusses the problem: http://swui.semanticweb.org/swui06/papers/Karger/Pathetic_Fallacy.html).

My key point is this: from a cost-benefit perspective, XML databases seem to have a lot of merit. If the data starts out in a hierarchical format and can be managed and used for its purposes (by creating information with XQuery), then there is little merit in normalising it. In fact, for a data user who is not that familiar with the eventual relational design but who understands how the data was collected (that is, who essentially has a conceptual model), I think normalising adds significantly to the cost.
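To make the cost argument a little more concrete, here is a minimal Python sketch. It assumes a hypothetical field-survey record; the element names, the `site`/`observation` tables and the function names are all invented for illustration. ElementTree path lookups stand in for XQuery, and an in-memory sqlite3 database stands in for a relational store.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical survey record, kept in the hierarchical shape it was collected in.
SURVEY_XML = """<survey id="s1">
  <site name="North Creek">
    <observation species="platypus" count="2"/>
    <observation species="wombat" count="1"/>
  </site>
  <site name="South Ridge">
    <observation species="wombat" count="3"/>
  </site>
</survey>"""


def counts_from_document(xml_text):
    """Read (site, species, count) directly off the containment hierarchy."""
    root = ET.fromstring(xml_text)
    rows = []
    for site in root.findall("site"):
        for obs in site.findall("observation"):
            rows.append((site.get("name"), obs.get("species"), int(obs.get("count"))))
    return rows


def counts_from_tables(xml_text):
    """Normalise the same data into two tables, then rebuild it with a join."""
    root = ET.fromstring(xml_text)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE site (site_id INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE observation "
        "(site_id INTEGER REFERENCES site, species TEXT, obs_count INTEGER)"
    )
    # Surrogate keys have to be invented to carry the parent/child relationship.
    for site_id, site in enumerate(root.findall("site"), start=1):
        db.execute("INSERT INTO site VALUES (?, ?)", (site_id, site.get("name")))
        for obs in site.findall("observation"):
            db.execute(
                "INSERT INTO observation VALUES (?, ?, ?)",
                (site_id, obs.get("species"), int(obs.get("count"))),
            )
    # The join (plus an explicit ORDER BY) recovers what the hierarchy held implicitly.
    rows = db.execute(
        "SELECT s.name, o.species, o.obs_count "
        "FROM observation o JOIN site s ON o.site_id = s.site_id "
        "ORDER BY o.rowid"
    ).fetchall()
    db.close()
    return rows
```

Both functions return the same rows; the difference is that the normalised route needs a schema, surrogate keys and a join before it can answer the question the hierarchy answered directly.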

It seems that, since the relational approach and SQL were invented, Moore's Law and the web have flipped the economic equation entirely. We can now use computing power to make sense of messy, denormalised data quite well. To me, using conceptual models as the interface to such data now seems like a good thing. So maybe it's not hybrid databases but smarter clients that are needed? I have liked the 'info-space' ideas that Hans-Juergen Rennau has put forward in this regard; in my view they amount to a navigational approach to data.

I hope some of this is of interest.

Thanks
Steve Cameron

On Fri, Feb 21, 2014 at 3:22 AM, Kurt Cagle <kurt.cagle@gmail.com> wrote:

Michael,

> I wouldn't normally recommend using an XML database unless it is natural to think about the data as being a set of documents.

I'm increasingly coming to that position. It's part of the reason I've been quietly slipping into the RDF world over the last few years: a significant portion of the data I work with tends to be heavily linked resource descriptions, though often with a documentish context to them. These structures tend to be partially normalized, which makes them awkward to work with in a standard SQL database, and because they are enterprise-centric, managing a global identity space is fairly critical, so RDF seems a natural fit. Having said that, RDF does not handle narrative structures well, because order is no longer an implicit concept.
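A small Python sketch of the ordering point, using plain tuples rather than any RDF library. The `ex:` names and both functions are hypothetical; the `rdf:_1`, `rdf:_2`, ... predicates are RDF's container membership properties (the ones `rdf:Seq` uses), applied here simply to record position.

```python
# A graph is a *set* of (subject, predicate, object) triples, so it has no inherent order.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def paragraphs_to_triples(doc_id, paragraphs):
    """Hypothetical encoding of a narrative: membership triples plus explicit positions."""
    triples = set()
    for i, text in enumerate(paragraphs, start=1):
        para_id = f"{doc_id}#p{i}"
        triples.add((doc_id, "ex:hasParagraph", para_id))
        triples.add((para_id, "ex:text", text))
        # rdf:_n records the position explicitly, because nothing else in the graph does.
        triples.add((doc_id, f"{RDF_NS}_{i}", para_id))
    return triples


def rebuild_narrative(doc_id, triples):
    """Recover reading order only from the explicit rdf:_n position triples."""
    text_of = {s: o for s, p, o in triples if p == "ex:text"}
    prefix = f"{RDF_NS}_"
    positioned = [
        (int(p[len(prefix):]), o)
        for s, p, o in triples
        if s == doc_id and p.startswith(prefix)
    ]
    return [text_of[para] for _, para in sorted(positioned)]
```

Drop the `rdf:_n` triples and the graph still says which paragraphs belong to the document, but no longer in what order; in an XML document that order comes for free.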

I think the next stage is the emergence of hybrid data systems, ones where you have some degree of control over the normalization process but can nonetheless make use of n-tuples while, when necessary, managing narrative integrity. Daniela Florescu's vision with JSONiq goes in the right direction: abstract the portions of XQuery that are bound to the processing model away from XML specifically, and treat the structures as, well, abstract data structures. The XDM and JSDM are both reasonably well understood at a data-algebra level; RDF is a generalized graph that can present itself in any modality but sits on an n-tuple index, as opposed to the 2-tuple indices typical of most NoSQL architectures.

Of course, I could be wrong. It seems like a logical progression, so no doubt some large company with a vested interest will do its utmost to make sure it doesn't happen. Still, I think the approach is sound.

Kurt Cagle
Invited Expert, XForms Working Group, W3C
Managing Editor, XMLToday.org
kurt.cagle@gmail.com

443-837-8725

On Thu, Feb 20, 2014 at 2:30 AM, Michael Kay <mike@saxonica.com> wrote:

On 19 Feb 2014, at 22:45, Kurt Cagle <kurt.cagle@gmail.com> wrote:

> Actually, this brings up something I've been thinking about for a while. It is typical to think of an XML document as being a self-contained entity

I think this is one of the big problems with the use of XML as a database model.

Sometimes the concept of a "document" makes sense because it relates to something in the real world, like an insurance claim. Sometimes it makes no sense at all, e.g. when you're modelling the human genome. Ideally, the choice of document boundaries shouldn't make much difference; queries should work the same way regardless of where the document boundaries are. In practice, that's very hard to achieve, and one of the reasons is that intra-document linking is so very different from cross-document linking. There are actually three basic ways of modelling relationships in XML: the XML containment hierarchy, intra-document links, and cross-document links; and the way you write queries is totally dependent on which representation has been chosen. That violates the basic principle of data independence (which was the topic of my PhD thesis in 1975...)
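To illustrate the data-independence point, here is a minimal Python sketch of one relationship (a claim and the holder of its policy) modelled the three ways described above. Everything in it is hypothetical: the element and attribute names, the `policies.xml#p1` link syntax, and the dict standing in for a document store. ElementTree is used instead of XQuery; in XQuery the second and third cases would typically go through `fn:id()` and `fn:doc()`. No DTD is declared, so the `id`/`policy` attributes act as ID/IDREF only by convention.

```python
import xml.etree.ElementTree as ET

# (1) Containment: the policy is nested inside the claim.
NESTED = """<claims>
  <claim id="c1"><policy number="P-100" holder="Ada"/></claim>
</claims>"""

# (2) Intra-document link: the claim points at a policy elsewhere in the same document.
SAME_DOC = """<data>
  <policy id="p1" number="P-100" holder="Ada"/>
  <claim id="c1" policy="p1"/>
</data>"""

# (3) Cross-document link: the claim names a policy held in a separate document.
CLAIMS_DOC = """<claims><claim id="c1" policy="policies.xml#p1"/></claims>"""
POLICIES_DOC = """<policies><policy id="p1" number="P-100" holder="Ada"/></policies>"""


def holder_nested(xml_text, claim_id):
    """Follow the containment hierarchy: the policy is a child of the claim."""
    claim = ET.fromstring(xml_text).find(f"claim[@id='{claim_id}']")
    return claim.find("policy").get("holder")


def holder_same_doc(xml_text, claim_id):
    """Resolve an IDREF-style attribute by searching the same document."""
    root = ET.fromstring(xml_text)
    claim = root.find(f"claim[@id='{claim_id}']")
    policy = root.find(f"policy[@id='{claim.get('policy')}']")
    return policy.get("holder")


def holder_cross_doc(store, doc_uri, claim_id):
    """Split the link into document URI and fragment, load that document, then search it."""
    claim = ET.fromstring(store[doc_uri]).find(f"claim[@id='{claim_id}']")
    target_uri, fragment = claim.get("policy").split("#", 1)
    policies = ET.fromstring(store[target_uri])
    return policies.find(f"policy[@id='{fragment}']").get("holder")
```

All three functions answer the same question, yet each has a different shape and signature; move the document boundary and the query has to be rewritten.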

I wouldn't normally recommend using an XML database unless it is natural to think about the data as being a set of documents.

Michael Kay
Saxonica