Re: [xml-dev] ID/IDREF is evil

Michael,

> I wouldn't normally recommend using an XML database unless it is natural to think about the data as being a set of documents.

I'm increasingly coming to that position. It's part of the reason I've been quietly slipping into the RDF world over the last few years: a significant portion of the data that I work with tends to be heavily linked resource descriptions, though often with a documentish context to them. These structures tend to be only partially normalized, making them awkward to work with in a standard SQL database, and because they are enterprise-centric, managing a global identity space is fairly critical, so RDF seems a natural fit. Having said that, RDF does not handle narrative structures well, because order is no longer an implicit concept.

I think the next stage is the emergence of hybrid data systems, ones where you have some degree of control over the normalization process, but can nonetheless make use of n-tuples while, when necessary, managing narrative integrity. Daniela Florescu's vision with JSONiq goes in the right direction - abstract those portions of XQuery that are bound to the processing model away from XML exclusively, and treat the structures as, well, abstract data structures. The XDM and JSDM are both reasonably well understood at a data algebra level; RDF is a generalized graph that can present itself in any modality but sits on an n-tuple index as opposed to the 2-tuple indices typical of most NoSQL architectures.
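
To make the n-tuple versus 2-tuple distinction concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: the names (kv, kv_find_by_claimant, TripleIndex) are invented for the example and do not correspond to any particular product's design.

```python
from collections import defaultdict

# 2-tuple (key/value) store: a single index, keyed on the key only.
kv = {"claim:c1": {"claimant": "person:p1", "status": "open"}}

def kv_find_by_claimant(store, person):
    # There is no index on values, so a reverse lookup scans every entry.
    return [k for k, v in store.items() if v.get("claimant") == person]

class TripleIndex:
    """Hypothetical triple store that keeps two permutations of each triple."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # subject -> predicate -> objects
        self.pos = defaultdict(lambda: defaultdict(set))  # predicate -> object -> subjects

    def add(self, s, p, o):
        # Each triple is written once per permutation.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)

    def objects(self, s, p):
        # Forward lookup: what does subject s point at via predicate p?
        return set(self.spo.get(s, {}).get(p, set()))

    def subjects(self, p, o):
        # Reverse lookup: who points at object o via predicate p? No scan needed.
        return set(self.pos.get(p, {}).get(o, set()))

triples = TripleIndex()
triples.add("claim:c1", "claimant", "person:p1")
triples.add("claim:c2", "claimant", "person:p1")
```

The point of the sketch is only that the triple index answers the question from either end of the relationship, whereas the key/value store answers it cheaply from one end.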

Of course, I could be wrong - it seems like a logical progression, so no doubt some large company with a vested interest will do its utmost to make sure it doesn't happen. Still, I think the approach is sound.

Kurt Cagle
Invited Expert, XForms Working Group, W3C

Managing Editor, XMLToday.org

kurt.cagle@gmail.com

443-837-8725

On Thu, Feb 20, 2014 at 2:30 AM, Michael Kay <mike@saxonica.com> wrote:

On 19 Feb 2014, at 22:45, Kurt Cagle <kurt.cagle@gmail.com> wrote:

> Actually, this brings up something I've been thinking about for a while. It is typical to think of an XML document as being a self-contained entity

I think this is one of the big problems with the use of XML as a database model.

Sometimes the concept of a "document" makes sense: it relates to something in the real world, like an insurance claim. Sometimes it makes no sense at all, e.g. when you're modelling the human genome. Ideally, the choice of document boundaries shouldn't make much difference; queries should work the same way regardless of where the document boundaries are. In practice, that's very hard to achieve, and one of the reasons is that intra-document linking is so very different from cross-document linking. There are actually three basic ways of modelling relationships in XML: use of the XML containment hierarchy, use of intra-document links, and use of cross-document links; and the way you write queries is totally dependent on which representation has been chosen. That violates the basic principles of data independence (which was the topic of my PhD thesis in 1975...)
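
The three representations can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. This is a minimal illustration, not a recommendation: the element names, the claimant attribute, the "people.xml#p1" reference convention and the in-memory documents dictionary are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# 1. Containment: the claimant is nested inside the claim.
nested = ET.fromstring(
    "<claims><claim id='c1'><claimant><name>Ann</name></claimant></claim></claims>"
)

# 2. Intra-document link: the claim refers to a person elsewhere in the same document.
linked = ET.fromstring(
    "<data>"
    "<claim id='c1' claimant='p1'/>"
    "<person id='p1'><name>Ann</name></person>"
    "</data>"
)

# 3. Cross-document link: the person lives in a separate document.
claims_doc = ET.fromstring(
    "<claims><claim id='c1' claimant='people.xml#p1'/></claims>"
)
documents = {  # hypothetical stand-in for a document store
    "people.xml": ET.fromstring(
        "<people><person id='p1'><name>Ann</name></person></people>"
    )
}

def claimant_nested(root, claim_id):
    # Follow the containment hierarchy directly.
    return root.find(f"./claim[@id='{claim_id}']/claimant/name").text

def claimant_linked(root, claim_id):
    # Read the reference, then look it up within the same document.
    ref = root.find(f"./claim[@id='{claim_id}']").get("claimant")
    return root.find(f"./person[@id='{ref}']/name").text

def claimant_cross(root, claim_id, docs):
    # Read the reference, resolve the target document, then look up the id there.
    ref = root.find(f"./claim[@id='{claim_id}']").get("claimant")
    doc_name, _, person_id = ref.partition("#")
    return docs[doc_name].find(f"./person[@id='{person_id}']/name").text
```

All three functions answer the same question ("who is the claimant on claim c1?"), yet each has to be written differently because the relationship is represented differently.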

I wouldn't normally recommend using an XML database unless it is natural to think about the data as being a set of documents.

Michael Kay
Saxonica