On Tue, 14 Dec 2004 16:16:56 -0500, Roger L. Costello wrote:
> Note: This is a continuation of the thread:
> XML Vocabularies for Large Systems - 3 Philosophically Different
> I have changed the title to reflect the narrowed focus.
> My goal is for us (the xml-dev group) to collectively define a systematic
> approach to using simple XML vocabularies to implement large (complex)
> systems.
> Yesterday Len and Peter outlined two strategies. I would like to flesh out
> their ideas.
<snip>some discussion on what Len's approach might look like</snip>
> THE PETER HUNSBERGER APPROACH
> In the above Invoice example, tags specific to postal addresses and
> books were used. A disadvantage is that many domain-specific simple
> vocabularies must be created.
> Peter's approach is to provide a "generic set of tags", coupled with a rich
> set of ways to relate the generic tags.
> Below I have attempted to define the Invoice using generic tags coupled with
> "relationship tags". Obviously I don't know what I am doing. Peter, would
> you fix this please?
> <Collection id="RLC">
> <Object>Roger L. Costello</Object>
> <Object>38 Boylston St.</Object>
> <Collection id="Bach">
> <Object>Richard Bach</Object>
> <Object>Dell Publishing Co.</Object>
> <Object href="RLC"/>
> <Relation>Purchased By</Relation>
> <Object href="Bach"/>
> </Collection>
> </Collection>
We actually have a base model that looks somewhat like this (with many
more attributes), but you get the basic idea. However, before commenting
on this, let me make a couple of observations:
1) If you're building a large, complex system, it has to be layered in
order to achieve scalability in design, performance, maintenance, etc.
Each layer has to be as loosely coupled to the others as possible, but
each has to trust the others to do their jobs properly. There can't be a
single repository of all business knowledge if you have a layered system,
even if all layers use the same metadata. (E.g., the front end has one
understanding of what "address" means and the back end another.)
2) For such a system you have to be careful to separate the concerns
of external data exchange from internal data exchange, even if you are
using the same technologies for both. XML provides a wonderful way to
isolate the layers of a system, but if, for example, you attempt Schema
validation at each layer, you're going to have a non-performant system.
Again, you have to trust each layer to do its job.
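To make the "validate only at the boundary" idea concrete, here is a
minimal Python sketch. It is purely illustrative: the function names and
the toy address document are my own invention, and validate_against_schema
stands in for a real XML Schema validator (which Python's standard library
does not provide; something like lxml's XMLSchema would be used in
practice). The point is only that validation happens once, where data
enters the system, and internal layers trust that it happened:

```python
import xml.etree.ElementTree as ET

def validate_against_schema(doc: ET.Element) -> bool:
    # Stand-in for a real XML Schema validator; here we just check a
    # couple of structural expectations on the toy document.
    return doc.tag == "address" and doc.find("city") is not None

def receive_external(xml_text: str) -> ET.Element:
    """Boundary layer: the one place where validation happens."""
    doc = ET.fromstring(xml_text)
    if not validate_against_schema(doc):
        raise ValueError("invalid document at system boundary")
    return doc

def internal_layer(doc: ET.Element) -> str:
    """Internal layers trust the boundary: no re-validation, just work."""
    return doc.find("city").text

doc = receive_external(
    '<address><addr1>38 Boylston St.</addr1><city>Boston</city></address>'
)
print(internal_layer(doc))
```

The design choice being sketched: parsing and validation cost is paid
once per document, not once per layer.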
Given this, the model:
<collection title="Address" type="address">
<object title="Line 1" type="addr"/>
<object title="Line 2" type="addr"/>
<object title="City" type="city"/>
</collection>
may have uses as an internal abstraction, but be less useful for
external exchange. We have a master model somewhat like this, which
everyone understands and which forms a basis for modelling and for
generic user interfaces.
Actual instance XML looks more like:
<address title="Address" type="collection">
<addr1 title="Line 1" type="String"/>
<addr2 title="Line 2" type="String"/>
<city title="City" type="String"/>
</address>
At this point there's a mapping between the two versions but the
second version carries some extra metadata that might otherwise be
coded in a Schema. For internal exchange purposes this means we don't
need to transport the Schema (and the instance) between layers. We
don't generate a Schema unless the instance data is travelling outside
the system.
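The mapping between the two versions might be sketched as follows. This
is a guess at the mechanics, not the actual system: the SLOT_TO_ELEMENT
table and the fixed "String" datatype are illustrative stand-ins for the
richer metadata the real master model would carry:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from generic-model slot titles to instance
# element names (the real mapping would live in the master model).
SLOT_TO_ELEMENT = {"Line 1": "addr1", "Line 2": "addr2", "City": "city"}

generic = ET.fromstring(
    '<collection title="Address" type="address">'
    '<object title="Line 1" type="addr"/>'
    '<object title="Line 2" type="addr"/>'
    '<object title="City" type="city"/>'
    '</collection>'
)

# Build the instance form: element names come from the mapping, and the
# metadata that might otherwise be coded in a Schema (title, a simple
# datatype) rides along as attributes.
instance = ET.Element("address", title=generic.get("title"), type="collection")
for obj in generic.findall("object"):
    name = SLOT_TO_ELEMENT[obj.get("title")]
    ET.SubElement(instance, name, title=obj.get("title"), type="String")

print(ET.tostring(instance, encoding="unicode"))
```

Because the instance carries its own metadata, layers exchanging it need
neither the generic model nor a Schema in transit.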
Thus, if I were to summarize our approach to this point in the thread:
1) Give the users a simple model so that they can easily experiment
with understanding the semantics of the business domain:
Use a very simple model as the basis for modelling more complex
instances. Collection/object/relationship is about as generic as it
gets. However, don't expect people to use RDF, don't force complex
semantics (or syntax!) on them, and enable generic modelling tools at
this level.
2) Use the emergent semantics as internal metadata:
Extend the generic model with specific instances for internal exchange
purposes, but don't make things heavyweight. Trust the layers of
the system to throw the DateFormat exceptions etc. and to transport
them up the chain (if the validation rules were improperly specified
by the business analysts at modelling time).
3) Commit to a specific version of semantics only as needed:
Use the heavyweight technologies when you need to travel outside the
boundaries of the system.
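Point 3 could look like deriving a Schema from the instance metadata on
demand, only when data leaves the system. A rough sketch under my own
assumptions (the element names and the type="String" convention come from
the instance example above; the XSD produced is deliberately minimal):

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

def schema_for(instance: ET.Element) -> ET.Element:
    """Derive a minimal XSD from instance metadata. Generated only when
    the data travels outside the system; internal layers never see it."""
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    root_el = ET.SubElement(schema, f"{{{XS}}}element", name=instance.tag)
    ctype = ET.SubElement(root_el, f"{{{XS}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for child in instance:
        # the instance's own type="String" metadata maps to xs:string
        ET.SubElement(seq, f"{{{XS}}}element", name=child.tag,
                      type="xs:" + child.get("type", "string").lower())
    return schema

instance = ET.fromstring(
    '<address type="collection">'
    '<addr1 type="String"/><city type="String"/></address>'
)
print(ET.tostring(schema_for(instance), encoding="unicode"))
```

The heavyweight artifact is thus a boundary-time product of the
lightweight internal metadata, not something every layer must carry.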
Remember, we're talking large complex systems here, not general
purpose generic systems. There is a contextually relevant knowledge
domain for such systems and this knowledge is intentionally coded into
the system for performance purposes (or because that's the only way
your developers know how to do it, but that's a different issue). As
a result, there is no need to replicate this knowledge in the XML
instances used internally within the system.
Following the above guidelines, the interesting question becomes the
xml-dev perma-thread: how generic can such complex processing systems
become? Where are the inflection points when usage of XML Schema
(pick your favorite XML technology) tips the usability or
maintainability or performance of the system over the edge? (More
like the mother of all Computer Science Master's theses...)
Our system is a very generic system that models very domain specific
metadata. It can be generically extended to any domain but for many
domains the generic modelling capabilities are not worth the
processing power required. This is largely because these domains have
long been the focus of automation, and performant, domain-specific
complex systems are approaching commodity availability, if they are not
already there.
I'll try to find time for other comments, but they may be brief...