xml-dev - Re: [xml-dev] dynamically generated XML Schema?! Re: [xml-dev] R: [xml-d

To: Burak Emir <burak.emir@epfl.ch>
Subject: Re: [xml-dev] dynamically generated XML Schema?! Re: [xml-dev] R: [xml-dev] Number of active public XML schemas
From: Peter Hunsberger <peter.hunsberger@gmail.com>
Date: Thu, 4 Nov 2004 09:19:22 -0600
Cc: XML Developers List <xml-dev@lists.xml.org>
In-reply-to: <4189EB28.3060508@epfl.ch>
References: <F4AA32DE20F3D211BCB90050044B6DD0B7F761@CSB> <4188B213.50903@epfl.ch> <cc159a4a04110306405b43280c@mail.gmail.com> <4189EB28.3060508@epfl.ch>
Reply-to: Peter Hunsberger <peter.hunsberger@gmail.com>

On Thu, 04 Nov 2004 09:41:12 +0100, Burak Emir <burak.emir@epfl.ch> wrote:
> 
> Peter Hunsberger wrote:
> 
> >Burak Emir <burak.emir@epfl.ch> asks:
> >
<snip>XML syntax discussion and related</snip>
> >>>
> >>One can of course discuss syntax endlessly, but I have never
> >>understood the obsession with marking up descriptions of XML data in XML.
> >>
> >>Who needs to dynamically generate schemas?
> >
> >Umm, we do.
> >
> Are you sure? :-)

Let me put it this way: if someone needs an XML schema, we can
generate one.  In this particular application, for 99% of the current
needs, we really don't need an XML schema at all.  That will change as
things open up across more organizational boundaries.

> 
> >>The whole point of schemas is
> >>to be a widespread, well understood description of instances.
> >>
> >
> >In our case we have a lot of metadata described in a relational
> >database.  There are customizations of that metadata that select
> >specific pieces based on the authorizations of the user and the usage
> >context of the metadata.  The only time we need a schema is for the
> >description of a piece of instance data that is travelling beyond the
> >boundaries of the system, so we generate the schema as we need it.
> >
> >This may sound like a problem of not having a powerful enough schema
> >language and in a way it is.  However, my general philosophy is that I
> >will generate no schema before its time...
> >
> >
> Ok, using schemas to describe the format of the data that is going out,
> from descriptions in a relational database.
> 
> To put this a bit more concretely: I have a bug-tracking system, and a
> bug report has a field "product" which is an enumeration.
> 
> Now, when there is a new product, the enumeration changes (somebody
> updates the database). One generates a new schema.
> 
> But this is a bit one-sided: whoever generates the schema changes his
> data at will (maybe the product field even disappears)?
> 
> Where does that leave the receiver of your data? Two options:
> 
> 1) Either he cannot rely on any schema, because it may be subject to
> complete change.
> 2) Or the schema changes are actually restricted to a few
> backwards-compatible details.
> 
> Assuming the latter, I start to see things more clearly now: if
> you add a new complex type by derivation, you are effectively building a
> new schema, hence there is indeed a need to build new schemas if it is
> possible to "continuously specialize".
> 
> Does this cover your requirement? If no, can you give a concrete example
> like the one above?

Not really; the dynamic generation occurs at well-defined points: the
introduction of a new clinical trial or the revision of a medical
protocol.

Say a researcher wishes to revise a protocol to capture some new
information.  We may already have metadata descriptions existing
elsewhere that describe this particular information (if not, they are
created by business analysts who know nothing about XML).  Based on
this metadata, the researcher's current authorizations, and the context
in which the researcher wishes to use the new information, we can
generate a new schema.  This schema will be consistent across all
matching instance data until the protocol is revised again.  The old
version of the schema will be retained at that point and can be used
to audit and validate previous versions of the data.
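
As a rough illustration of the flow just described, here is a minimal
Python sketch.  Everything in it is an assumption made for the example:
the metadata rows, the role names, the element names and the
generate_schema helper are hypothetical, and the real system combines
several metadata sources rather than reading one list.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical metadata rows, standing in for what would come from the
# relational database: (element name, XSD type, role allowed to see it).
METADATA = [
    ("patientId", "xs:string", "researcher"),
    ("dosage", "xs:decimal", "researcher"),
    ("adverseEvent", "xs:string", "safety_officer"),
]


def generate_schema(root_name, metadata, roles):
    """Build an XSD string holding only the elements the roles may see."""
    schema = ET.Element(f"{{{XS}}}schema")
    root = ET.SubElement(schema, f"{{{XS}}}element", name=root_name)
    ctype = ET.SubElement(root, f"{{{XS}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for name, xsd_type, role in metadata:
        if role in roles:
            ET.SubElement(seq, f"{{{XS}}}element", name=name, type=xsd_type)
    return ET.tostring(schema, encoding="unicode")


# Keep every generated schema, keyed by protocol revision, so older
# instance data can still be audited against the schema it was written for.
schema_versions = {}
schema_versions[1] = generate_schema("protocol", METADATA, {"researcher"})

# The protocol is revised to capture a new piece of information.
METADATA.append(("tumorSize", "xs:decimal", "researcher"))
schema_versions[2] = generate_schema("protocol", METADATA, {"researcher"})
```

Revision 1 stays in schema_versions after revision 2 is generated,
which is the point: the old schema is retained, not overwritten.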

A schema revision may not be completely backwards compatible;
sometimes information is no longer wanted.  On some occasions data
changes format or type (the latter can be problematic and require
manual changes).  Data elements can go from not present in the model
at all to being required.  Making elements optional wouldn't work;
until they become an official part of the protocol they can't exist at
all.  The schema documents the current state of the protocol exactly.
(Or rather it will in the future; at the moment the schemas aren't as
precise as they need to be.)
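
To make the kinds of change listed above concrete, here is a small
Python sketch that compares two revisions.  It assumes a deliberately
simplified view of a schema as a flat mapping from element name to XSD
type, with every element required; the element names and both helper
functions are hypothetical.

```python
def diff_schemas(old, new):
    """Compare two schema revisions given as {element name: XSD type}."""
    shared = set(old) & set(new)
    return {
        "removed": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "retyped": sorted(name for name in shared if old[name] != new[name]),
    }


def is_backwards_compatible(changes):
    """True only if data written for the old revision fits the new one.

    Every element is treated as required, so any removal, addition or
    type change breaks compatibility.
    """
    return not any(changes.values())


# Hypothetical revisions of one protocol's schema.
rev1 = {"patientId": "xs:string", "dosage": "xs:integer", "notes": "xs:string"}
rev2 = {"patientId": "xs:string", "dosage": "xs:decimal", "tumorSize": "xs:decimal"}

changes = diff_schemas(rev1, rev2)
print(changes)
print(is_backwards_compatible(changes))
```

Under these assumptions almost any revision is incompatible, which is
why old instance data is validated against the schema revision it was
written for rather than against the latest one.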

In theory, the dynamically produced schema could serve as a basis for
negotiation.  In practice this negotiation is done using higher-level
modelling and face-to-face meetings with the concerned parties,
facilitated by business analysts.  The schema is in some ways an
after-the-fact documentation artifact.  The fact is, XML schemas and
the tools for handling them don't work at the level of modelling that
is required for our application. (Thus, as I say, I will generate no
XML schema before its time.)

> 
> I am aware of XML Schema pitfalls that prevent typed programming
> languages (e.g. XSLT, XQuery) from using the specialized data, yet it's
> hard to really grasp the need for "continuous
> specialization/extension/adaptation".

I think in some ways it's part of the problem domain: we're doing
research, so by definition we don't have well-defined business rules
that can be evenly applied across all of the researchers.  Nonetheless,
the researchers will wish to exchange data with each other in some
well-defined way.

Instead of proceeding top down from business rules to schema, we have
to build many possible solutions and dynamically search the solution
space to see what fits at any given moment.  In a way it's a recursive
data mining project to find what schema works to describe the data.
Alternatively, perhaps it's a genetic algorithm for determining the
fitness of the schema to the data. (Both of those characterizations
are unfair; we actually have a better understanding of the data than
they imply.)

> ...
> <snip/>
> 
> >>Now one can dwell in discussion of hypothetical families of schemas, but
> >>for all my experience tells me about modelling, if you manage to
> >>understand what the common things are that make a bunch of schemas a
> >>family, then you can anticipate the extensibility you need, which
> >>removes completely the need for dynamic generation.
> >>
> >
> >Yes and no. We have a meta-schema.  It's so abstract and so
> >generalized that it's difficult to use for specific instance data.
> >The problem is, understanding of the schema is often local to the
> >schema writer.  Not everyone "gets" 5th normal form, and 5th normal
> >form doesn't work when the data hits the data warehouse.
> >
> Does it happen that you need to change that one as well?
> 
> Or is it a "parameterized" schema (like the Java generics)?

It is largely a parameterized schema, though it is still being revised
as we figure out what works best.  The biggest change is a constant
evolution toward finer granularity.  It's becoming less and less like
a conventional relational database schema (not that it ever was) and
more and more like a graph management system.

> >>What is a use case for dynamically generated schemas?
> >
> >For one, you need different schemas for different stages in the life of
> >the data. I know of no technology that lets you adequately describe
> >all possible transformations of the schema over time from within the
> >schema itself.  As a specific example (discussed previously on the
> >list), you need a way to match versions of the schema to workflow.
> >
> In my understanding of the problem, this drifts away from "dynamic
> generation". Schema evolution (or just backwards-incompatible change)
> makes configuration management, versioning, and many other things necessary.
> 
> But having a meta schema and generating schemas is of no use for the
> problem at hand, because the receiver of your data cannot write software
> that deals with the meta schema, and hence with all versions of the schema.

I guess this depends on your perspective: is the schema the starting
point or the end point?  Do you negotiate from the schema or do you
document the negotiations with the schema?  If it's the latter, how do
you model and document before you have the schema?

If you have a good system for capturing the modelling and
documentation when you are working at the business knowledge capture
level, then the schema can become an after-the-fact documentation
artifact.  Yes, you still need version management, but the audit trail
that documents the negotiations isn't based on the schema, it's
external to it (and yes, maybe you have a schema for exchanging that
data also).

At the end of the day we end up with a Gene Therapy version of the
drugs schema and a Solid Tumor version of the drugs schema and they
may in turn have their own revision levels.  They have metadata in
common, but they also have completely different sets of elements
within them.
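
A small Python sketch of that arrangement, assuming a simplified view
in which each variant is the common metadata plus its own elements at a
given revision level.  All element names, variant keys and helper
functions here are hypothetical.

```python
# Hypothetical metadata shared by every version of the drugs schema.
COMMON = {"drugName": "xs:string", "dose": "xs:decimal"}

# Hypothetical variant-specific elements, keyed by (variant, revision).
VARIANTS = {
    ("gene_therapy", 1): {"vector": "xs:string"},
    ("gene_therapy", 2): {"vector": "xs:string", "transgene": "xs:string"},
    ("solid_tumor", 1): {"tumorSite": "xs:string"},
}


def drugs_schema(variant, revision):
    """Return the full element set for one variant at one revision."""
    elements = dict(COMMON)
    elements.update(VARIANTS[(variant, revision)])
    return elements


def shared_elements(a, b):
    """Element names that two schema versions have in common."""
    return sorted(set(a) & set(b))


gene = drugs_schema("gene_therapy", 2)
tumor = drugs_schema("solid_tumor", 1)
print(shared_elements(gene, tumor))
```

Only the common metadata shows up in both variants; everything else is
specific to one of them and to its own revision history.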

> >>Why does one need to use XSL for it ?
> >
> >You don't, but in our case, we've got about 8 different pieces of
> >source metadata that have to be combined and transformed in order to
> >derive a specific schema.  XSL is the best match to the problem I know
> >of.
> >
> Unless I have misunderstood, I think your problem seems rather
> different, because you could also get away with not generating any
> schema at all, if it can change in unanticipated ways. Your problem and
> its solution (which may be elegant) do not take receivers into account
> - they may have to hand-patch their code to deal with the new data.

The changes are anticipated; they occur at well-documented points in
the life cycle of the protocol.  When the changes happen, the receivers
do indeed have to change the systems that accept the data.  We're
working on ways to automate the process. The solutions are, in part,
based on the exchange of schemas that document the changes... :-)

-- 
Peter Hunsberger
