On Sun, 12 Dec 2004 14:53:49 -0500, Roger L. Costello wrote:
> ISSUE - NATURE OF XML VOCABULARIES FOR LARGE SYSTEMS
> I identify three philosophically different approaches to the creation of
> an XML vocabulary for a large system:
> a. Create multiple, simple XML vocabularies.
> b. Create a single, simple XML vocabulary that is used in multiple ways.
> c. Create a single, large, complex XML vocabulary.
> Have you implemented a large system? Have you created an XML vocabulary for
> a large system? Which of the above three approaches did you take? I am
> particularly interested in hearing from people who have used simple XML
> vocabularies [approach (a) or (b)] to achieve all the data complexities
> in a large system.
I guess it depends on how you define "large". We are building out a
single metadata-driven system to collect data across multiple clinical
trials and protocols and related areas of interest (e.g., tissue sample
tracking, international tumor registries). So far we've built out
about 1600 screens across about 30 different service areas (protocols,
whatever), tracking over 7500 data points across over 450 collections
of data points. (A collection roughly corresponds to a table in a
traditional relational database, and there may be several screens for
the same collection of data points. No service area uses all the
collections defined within the system.)
Our approach comes down to a cross between a), b) and c). We have
about 6 relatively simple vocabularies that are used in multiple ways.
However, our metadata vocabulary is an oddball. If you count elements,
it is large (on the order of 7500 + 450 definitions). However, all of
these elements are built by extending a master vocabulary that is
essentially 6 or so main definitions and another (relatively static)
metadata table that defines perhaps another 200 primary relationships
within the system (which everything else must extend). For example,
at the simplest level we have definitions of "collection" and
"object". An instance of a collection might be a "therapy" which has
well defined relationships to diagnosis and protocol (among other
things). An instance of a therapy might be "surgery" or "radiation",
containing 15 to 35 objects. The business analysts deal only with the
high level abstract concepts and the system hides the details except
on the rare occasions we have to generate a schema for external
exchange purposes (in which case the schema is generated from the
metadata specific to the needs at the time).
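
To make the "small master vocabulary, many extensions" idea concrete,
here is a minimal Python sketch. It is not our actual implementation:
the class names (ObjectDef, CollectionDef), the field names
(diagnosis_ref, protocol_ref, procedure_date) and the to_xsd() helper
are all hypothetical, and modelling "surgery is an instance of therapy"
as a parent link is an assumption made for illustration. It only shows
how a schema for external exchange could be generated on demand from
such metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


@dataclass
class ObjectDef:
    """One data point (an "object" in the master vocabulary)."""
    name: str
    xsd_type: str = "xs:string"


@dataclass
class CollectionDef:
    """A group of data points that may extend a parent collection."""
    name: str
    parent: Optional["CollectionDef"] = None
    objects: List[ObjectDef] = field(default_factory=list)

    def all_objects(self) -> List[ObjectDef]:
        # Inherited data points come first, then the local ones.
        inherited = self.parent.all_objects() if self.parent else []
        return inherited + self.objects


def to_xsd(collection: CollectionDef) -> str:
    """Generate a flat XML Schema element for one collection."""
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    element = ET.SubElement(schema, f"{{{XS}}}element", name=collection.name)
    ctype = ET.SubElement(element, f"{{{XS}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for obj in collection.all_objects():
        ET.SubElement(seq, f"{{{XS}}}element", name=obj.name, type=obj.xsd_type)
    return ET.tostring(schema, encoding="unicode")


# Hypothetical metadata: "therapy" carries the shared relationships,
# "surgery" extends it with its own data points.
therapy = CollectionDef(
    "therapy",
    objects=[ObjectDef("diagnosis_ref"), ObjectDef("protocol_ref")],
)
surgery = CollectionDef(
    "surgery",
    parent=therapy,
    objects=[ObjectDef("procedure_date", "xs:date")],
)

if __name__ == "__main__":
    print(to_xsd(surgery))
```

The point of the sketch is that the analysts would only ever touch the
CollectionDef/ObjectDef level; the XSD is a throwaway artifact derived
from it when an exchange partner needs one.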
We also have our own superset of Schematron for validation that
enables a validation editor GUI. I don't know how Schematron fits into
your large/small vocabulary spectrum.
Overall, now that I know what I'm doing, for our system, I lean
towards a) where "multiple" is a number less than 10. Use
abstraction, but keep the abstractions to terms that are familiar
within the broad domain (e.g., screen layout, validation, business
areas). Approaches like RDF also make sense for gluing relationships
together, and a very simple model for relationship description is
something I'm trying to come to grips with: how far do you want to go
in reducing everything to just another relationship before the
metadata loses its applicability as a domain-specific model, and
thus becomes difficult to model with?
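
As a rough illustration of that trade-off, here is a small Python
sketch in which everything is reduced to (subject, predicate, object)
triples. It does not use a real RDF library, and the predicate names
("extends", "relates_to") and the helper functions are hypothetical.
The store is completely uniform, but all the domain meaning now lives
in predicate strings that the code has to know how to interpret.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical relationship data, loosely based on the therapy example.
TRIPLES: List[Triple] = [
    ("surgery", "extends", "therapy"),
    ("radiation", "extends", "therapy"),
    ("therapy", "relates_to", "diagnosis"),
    ("therapy", "relates_to", "protocol"),
]


def objects_of(subject: str, predicate: str, triples: List[Triple]) -> Set[str]:
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def related(subject: str, triples: List[Triple]) -> Set[str]:
    """Collect 'relates_to' targets of `subject` and everything it extends."""
    found: Set[str] = set()
    seen: Set[str] = set()
    pending = [subject]
    while pending:
        node = pending.pop()
        if node in seen:
            continue  # guard against cycles in the 'extends' links
        seen.add(node)
        found |= objects_of(node, "relates_to", triples)
        pending.extend(objects_of(node, "extends", triples))
    return found


if __name__ == "__main__":
    print(sorted(related("surgery", TRIPLES)))
```

Nothing in the triple list itself says that "extends" implies
inheriting relationships; that rule exists only in related(). That is
the kind of domain knowledge that gets harder to see as more of the
model is flattened into generic relationships.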