OASIS Mailing List Archives

Re: [xml-dev] XML Schema as a data modeling tool

I've sort of implied a difference between "Enterprise" models and what I'll call instance models.  I'd say a model is Enterprise-ready if it applies to all instances of a given set of data domains at all the points in time they are used within a set of applications (enterprise-wide would be better, but is not always achievable).  I'll qualify this by saying that even across applications such a thing rarely exists; the drive towards having a Master Data Model (MDM) can help, but doesn't necessarily result in such a beast.

However, that distinction aside, there is also a degree of mathematical and analytical rigour that can be applied to some models and not to others.  As I said previously, graphs have an entire field devoted to their study, and there are off-the-shelf theorems that can be used to prove the applicability of the models and algorithms used with a given graph.  Hierarchies are a limited subset of graphs and provide much less coverage across these capabilities.
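To make the "limited subset" point concrete, here is a minimal sketch (my own illustration, not from any particular tool): a hierarchy is just a directed graph constrained so that every node has at most one parent and no cycles exist, which is why graph-theoretic machinery applies to hierarchies while hierarchies can express only a sliver of what general graphs can.

```python
def is_hierarchy(edges):
    """Return True if the (parent, child) edge list forms a forest,
    i.e. a hierarchy: at most one parent per node, no cycles."""
    parents = {}
    for parent, child in edges:
        if child in parents:      # a node with two parents: a graph, not a tree
            return False
        parents[child] = parent
    # walk upward from every node; revisiting a node means a cycle
    for node in list(parents):
        seen = set()
        while node in parents:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node]
    return True

tree  = [("org", "dept"), ("dept", "team")]
graph = [("org", "dept"), ("partner", "dept")]   # "dept" has two parents

print(is_hierarchy(tree))   # True
print(is_hierarchy(graph))  # False
```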

So this might lead to the question: can one get an Enterprise-ready graph model?  I'd say yes; in the last couple of years it has become possible to create collections of data and metadata that really will meet such requirements, and I will once again point to graph databases such as Neo4J and Titan as the entry points for the underlying implementations.  However, the traditional data modeller doesn't have to make the leap into this strange new tool set to continue to provide value.  In particular, a good ER diagram is in fact a graph.  Moreover, good ER modelling tools let you abstract details of this graph for presentation purposes, so that you can build high-level conceptual models that even the C-level suite can understand in the 10 minutes you have for their presentation.  The models can subsequently drill down to multiple logical and multiple physical models that can be handed to the implementation team, and the physical models can map directly to physical schema, whether that is XSD or DDL.
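As a hedged sketch of that last step (all entity, column, and relationship names below are hypothetical, and real ER tools do far more), a physical model can be held as plain data describing the nodes (entities) and edges (relationships) of the ER graph, from which DDL strings are generated mechanically:

```python
# Hypothetical physical model: entities are the vertices of the ER graph.
entities = {
    "person": {"person_id": "INTEGER PRIMARY KEY", "name": "VARCHAR(100)"},
    "device": {"device_id": "INTEGER PRIMARY KEY", "model": "VARCHAR(50)"},
}

# Relationships are the edges: (from_entity, to_entity, foreign-key column).
relationships = [("device", "person", "owner_id")]

def to_ddl(entities, relationships):
    """Walk the ER graph and emit one DDL statement per node and edge."""
    stmts = []
    for name, cols in entities.items():
        col_sql = ", ".join(f"{c} {t}" for c, t in cols.items())
        stmts.append(f"CREATE TABLE {name} ({col_sql});")
    for src, dst, fk in relationships:
        stmts.append(f"ALTER TABLE {src} ADD COLUMN {fk} INTEGER REFERENCES {dst};")
    return stmts

for stmt in to_ddl(entities, relationships):
    print(stmt)
```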

One final qualification: third normal form and even higher levels of normalization are still good practice when doing this modelling.  A lot of the really interesting entities don't emerge until you've tried to do things like resolve redundant relationships.  It is these intersections of the more concrete entities that will often be business-domain specific and, in fact, by materializing them you may be casting portions of your business rules or practices into metadata.

The interesting thing these days is that you can leave these as pure metadata.  With a graph database you don't need physical intersection tables for your normalized data, and you get to skip the resultant join overhead when you start to utilize the data.  To make this explicit: what were once normalized entities are now the edges of a graph.  Even better, in some implementations these edges can carry properties with them, so a relationship like [Person] --- has --> [Device] might carry extra properties that tell us whether that "has" relationship is exclusive or shared, or the date that the relationship came into being. (Just to be clear, the vertices also have properties.)  Once you've got all this assembled, queries like "find all the iPhones issued to employees this year for their personal use that were reported lost or stolen" require the traversal of no more entities than those given here.

One final qualification to my final qualification: the graph databases can handle "big data" volumes, but that doesn't necessarily mean they are your sole data storage.  In case I haven't made it clear, for the things I like to do I still see their sweet spot as metadata management (so my example should be metadata about People and Devices, with the edges carrying metadata about this metadata), but the metadata we end up with can have scope that encompasses petabytes of other data...
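To make the edge-with-properties idea concrete, here is a minimal property-graph sketch in plain Python (hypothetical data and names, not any particular graph database's API): both vertices and edges carry properties, and the example query resolves by filtering on both, without any intersection tables or joins.

```python
from datetime import date

# Vertices, each with properties.
people  = {"p1": {"name": "Alice", "employee": True},
           "p2": {"name": "Bob",   "employee": True}}
devices = {"d1": {"model": "iPhone", "status": "lost"},
           "d2": {"model": "iPhone", "status": "active"},
           "d3": {"model": "Pixel",  "status": "stolen"}}

# Edges: (person, "has", device, edge-properties). The relationship itself
# records the kind of use and when the relationship came into being.
edges = [("p1", "has", "d1", {"use": "personal", "since": date(2013, 3, 1)}),
         ("p1", "has", "d2", {"use": "personal", "since": date(2013, 5, 2)}),
         ("p2", "has", "d3", {"use": "shared",   "since": date(2013, 6, 9)})]

def lost_personal_iphones(year):
    """Traverse Person -has-> Device once, filtering on edge AND vertex
    properties: iPhones issued to employees in `year` for personal use
    that were reported lost or stolen."""
    hits = []
    for pid, _, did, props in edges:
        person, device = people[pid], devices[did]
        if (person["employee"]
                and props["use"] == "personal"
                and props["since"].year == year
                and device["model"] == "iPhone"
                and device["status"] in ("lost", "stolen")):
            hits.append((person["name"], did))
    return hits

print(lost_personal_iphones(2013))  # [('Alice', 'd1')]
```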

Peter Hunsberger

On Wed, Oct 2, 2013 at 12:54 AM, Hans-Juergen Rennau <hrennau@yahoo.de> wrote:
Beginning to review this thread, it dawns on me how important it is to conceptualize the kind of models we are thinking of - there are important differences and a good part of the controversy may be caused by those differences.

Also for this reason, I would like to better understand the following remark made by Michael Kay:

"Are we talking about a shopping cart with wheels, or one that exists only in an online shopping application? If we're talking about the latter, then we're not talking about modelling the real world, we are talking about designing an electronic virtual world. The two tasks have similarities, but they are not at all the same."

Michael, could you clarify? I am not sure about that difference. At least as far as the finished model is concerned, I cannot see any principal difference between the structures describing, say, a shopping cart, a license, an accommodation, a program, a company, a protein. It is (usually) named items representing concepts, which may in either case have existed before model construction, or be defined in the course of model construction. So perhaps you refer to the design process, rather than to the result - to the effort of "capturing something out there"? A hint would be appreciated.

