Re: [xml-dev] XML Schema as a data modeling tool

John,

If I guess correctly, you are talking about large-grained artifacts as your models: things like ER diagrams and schemas that go on for a couple of pages? I'm talking about fine-grained artifacts defined completely in metadata. If you're lucky, the former might live in a repository that can be queried. The latter has to live in a database: it is completely dynamic (though it can be versioned), and an ER diagram produced from it documents only a subset at a particular instant in time. When the model I describe is complete, the ontology is part of the same model and in fact describes the metadata for the metadata (among other more conventional artifacts). I am talking about a pretty radical change in the way businesses do modeling. I don't think it would have been possible for a large enterprise to do this as recently as two years ago; the technology was there, but it would have been too much of a risk to take on at the scale I describe. However, some of the startup world has been doing this for years now, and my own version of this first took form 10 years ago to handle clinical research data.

I've talked about that particular metadata-driven system a bit here on the list in the past. That system dynamically assembles the metadata (using XSLT in its case, but good graph manipulation is still hard to come by) to determine how to manage the research data on the fly. If you need an XML schema, you can export it from the metadata. The system we built used relational technologies to store both the metadata and the data, and doesn't scale well as a result. Nowadays graph databases can be used to manage the relationships of data stored in things like Cassandra, HBase, Lucene and Elasticsearch completely on the fly, with horizontal replication in the cloud. It's a different universe than it was 10 years ago; the "insane" has been done and it works. However, I have worked at a phone company with 50k employees and 2k devs in the distant past. Even then they were using a database for the model repository, but yes, it would be a slow and difficult process to drive this into such an organization, and there are parts of that universe where you would likely not want such a system, at least not yet.
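
As a rough illustration of what "export it from the metadata" can mean (this is not the actual system, which used XSLT over a relational store), here is a minimal Python sketch. The field records, the `Visit` entity and the `export_schema` helper are all made up for the example; a real metadata store would carry far more than name, type and a required flag.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical metadata rows, shaped as they might come back from a
# metadata store. Entity and field names are invented for illustration.
FIELDS = [
    {"entity": "Visit", "name": "patientId", "type": "xs:string", "required": True},
    {"entity": "Visit", "name": "visitDate", "type": "xs:date", "required": True},
    {"entity": "Visit", "name": "weightKg", "type": "xs:decimal", "required": False},
]


def export_schema(entity, fields):
    """Build an XSD string for one entity from its field metadata."""
    schema = ET.Element(f"{{{XS}}}schema")
    element = ET.SubElement(schema, f"{{{XS}}}element", name=entity)
    complex_type = ET.SubElement(element, f"{{{XS}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for field in fields:
        if field["entity"] != entity:
            continue
        # Optional fields become minOccurs="0"; required ones minOccurs="1".
        ET.SubElement(
            sequence,
            f"{{{XS}}}element",
            name=field["name"],
            type=field["type"],
            minOccurs="1" if field["required"] else "0",
        )
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    print(export_schema("Visit", FIELDS))
```

The point of the sketch is only that the schema is a derived view: change a metadata row and the next export reflects it, with no hand-edited XSD to keep in sync.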

I guess I should mention to this list that I am looking for new employment; my current employer has closed their US development shop (and did not even come close to grasping what I describe, although they desperately needed it). So if anyone would like to talk to me about doing any of this as a full-time gig, or have me talk to them as a consultant, please feel free to contact me off list!

Peter Hunsberger

On Thu, Oct 3, 2013 at 12:21 PM, John Cowan <johnwcowan@gmail.com> wrote:

On Thu, Oct 3, 2013 at 12:57 PM, Peter Hunsberger <peter.hunsberger@gmail.com> wrote:

Well yes, that was hyperbole, in the e-mail prior to this I pointed out you'll never get to that point. However, I think you miss the point; this can be domain specific. If you're a phone company then it is quite likely that your Enterprise data model pretty much encapsulates everything you need to know about phones in order to do business with the phone company and build applications dealing with phone related data....

I work on data models and document models for $EMPLOYER. There are now about 300 data models and about 150-200 schemas (highly modularized; I'm not sure how many modules there are, but more than 500) that are "enterprise" in the sense that they are not specific to individual business units. I figure the job is about half done, though it may never be finished completely (eventually the cost of modeling exceeds the benefits: we don't and never will have a model of every asteroid in the solar system, even though the underlying physics is fairly simple). Though they have common factors and inter-model links aplenty, merging them into a single data model would be insane. (We do have a single ontology with about 1000 classes and maybe 300 properties, but it does not go down to the same detail as the schemas do by any means.)

But no, the point of a good model is not to simplify things so that a single brain can grasp it. A good model will be able to provide both summaries simple enough for a single brain to cope with and simultaneously hold the detail that drives a development department of 1000+ people (not uncommon at a phone company).

Yes, maybe a single model for a department is plausible, depending on what the thousand people do and how many of them are doing unique work (having 10,000 ditchdiggers is not equivalent to having 10,000 C*Os). But not an enterprise of typical size and complexity. ($EMPLOYER and its corporate parent together have about 30,000 employees, almost all knowledge workers.)

--
GMail doesn't have rotating .sigs, but you can see mine at http://www.ccil.org/~cowan/signatures