On Fri, 05 Nov 2004 16:06:10 +0100, Burak Emir <burak.emir@epfl.ch> wrote:
>
> Peter Hunsberger wrote:
>
> >
> >Let me put it this way: if someone needs an XML schema, we can generate
> >one. In this particular application, for 99% of the current needs we
> >really don't need an XML schema at all. That will change as things
> >open up across more organizational boundaries.
> >
> As I said before, half of your application sits across organizational
> boundaries...
>
> Although it is a use case for the one who publishes the data, I am not
> sure whether there is a way to write a program that reacts to such a
> schema change and adapts its behavior automatically.
>
I don't know. It's relatively easy to do a delta between two schemas.
Whether one can automate any understanding of the delta is another
question. At least in our case we have a sort of controlled way of
creating those deltas. A simple example: within a major version we know
no new types will be introduced. (Across major version boundaries just
about anything goes and the problem is much harder.)
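
To make "relatively easy" concrete, here's a rough Java sketch (the
file arguments and the no-new-types check are illustrative, not our
actual code) that lists the top-level declarations added or removed
between two versions of a schema:

import java.util.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class SchemaDelta {

    // Collect the names of top-level element and type declarations
    // from an XSD file.
    static Set<String> declarations(String path) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(path);
        Set<String> names = new TreeSet<String>();
        NodeList kids = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            String local = n.getLocalName();
            if ("element".equals(local) || "complexType".equals(local)
                    || "simpleType".equals(local)) {
                names.add(local + ":" + ((Element) n).getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Set<String> oldDecls = declarations(args[0]); // old version
        Set<String> newDecls = declarations(args[1]); // new version
        Set<String> added = new TreeSet<String>(newDecls);
        added.removeAll(oldDecls);
        Set<String> removed = new TreeSet<String>(oldDecls);
        removed.removeAll(newDecls);
        System.out.println("added:   " + added);
        System.out.println("removed: " + removed);
        // Under a "no new types within a major version" rule, any
        // "complexType:" or "simpleType:" entry in 'added' flags an
        // illegal delta.
    }
}

Understanding what an added type *means* is of course the part the
machine can't do for you.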
>
> >>Where does that leave the receiver of your data? Two options:
> >>
> >>1) Either he cannot rely on any schema, because it may be subject to
> >>complete change.
> >>2) Or the schema changes are actually very tightly restricted to a few
> >>backwards-compatible details.
> >>
> >>Assuming the latter, I start to see things more clearly now, namely
> >>that if you add a new complex type by derivation, you are effectively
> >>building a new schema; hence there is indeed a need to build new
> >>schemas if it is possible to "continuously specialize".
> >>
> >>Does this cover your requirement? If not, can you give a concrete
> >>example like the one above?
> >>
> >
> >Not really, the dynamic generation occurs at well-defined points: the
> >introduction of a new clinical trial or the revision of a medical
> >protocol.
> >
> This discussion is on the process level (interacting with humans),
> whereas my initial question was on the level of interacting software.
I'm not sure I can find the boundaries yet. We're still trying to get
a handle on the human process so that we can start to automate the
software interaction.
>
> It seems to be a case of schema evolution.
>
> <snip/>
Evolution might be a little generous, since people tend to assume
evolution moves things forward. In our case a lot of the individual
schema variations add little to our overall understanding of the
problem domain. I'd liken it more to a rules-based schema assembly
line: pour in metadata at the top and stamp out schemas at the
bottom...
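
As a sketch of the assembly-line idea (the stylesheet and file names
here are invented for illustration), the "stamp" at the bottom can be
as small as one transform per metadata document:

import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class SchemaStamper {
    public static void main(String[] args) throws Exception {
        // One rules stylesheet; many metadata documents in, one
        // generated schema out per metadata document.
        Transformer stamp = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("schema-rules.xsl"));
        for (String meta : args) {
            stamp.transform(new StreamSource(meta),
                            new StreamResult(meta.replace(".xml", ".xsd")));
        }
    }
}

The real work lives in the rules stylesheet, of course, not in the loop.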
> >>I am aware of XML Schema pitfalls that prevent typed programming
> >>languages (e.g. XSLT, XQuery) from using the specialized data, yet it's
> >>hard to really grasp the need for "continuous
> >>specialization/extension/adaptation".
> >>
> >
> >I think in some ways it's part of the problem domain: we're doing
> >research, and by definition we don't have well-defined business rules
> >that can be evenly applied across all of the researchers. Nonetheless
> >the researchers will wish to exchange data with each other in some
> >well-defined way.
> >
> The only constant is change; business rules are not cast in
> stone either.
Sure, it's just a matter of scale: if a business changes its rules
too rapidly it will flounder. If research changes its rules too
slowly it will fail.
>
> >Instead of proceeding top down with business rules to schema we have
> >to build many possible solutions and dynamically search the solution
> >space to see what fits at any given moment. In a way it's a recursive
> >data mining project to find what schema works to describe the data.
> >Alternatively, perhaps it's a genetic algorithm for determining the
> >fitness of the schema to the data. (Both of those characterizations
> >are unfair, we actually have a better understanding of the data than
> >they imply.)
> >
> Both data mining and genetic algorithms work purely on machines; you
> add humans to the equation.
We're trying to figure out how to get the humans out of as many parts
of the process as we can. There's a long-term vision that one should
be able to automate the capture of a protocol description and spit out
the metadata to create all the necessary data management components,
from data entry screens to the data marts and the ad-hoc query
interfaces. It's still a research project.... :-)
> The point I tried to make was more or less that if you generate a schema
> dynamically, then humans have to rewrite software, meaning that the old
> software will not work. Your problem seems to go way beyond that; you
> never claimed that old software would keep working.
I'm not sure all old software will fail, but yes, I agree; I never
claimed it would work either...
> >>>>
> >>>Yes and no. We have a meta-schema. It's so abstract and so
> >>>generalized that it's difficult to use for specific instance data.
> >>>The problem is, understanding of the schema is often local to the
> >>>schema writer. Not everyone "gets" 5th normal form, and 5th normal
> >>>form doesn't work when the data hits the data warehouse.
> >>>
> >>Does it happen that you need to change that one as well?
> >>
> >>Or is it a "parameterized" schema (like the Java generics)?
> >
> >It is largely a parameterized schema, though it is still being revised
> >as we figure out what works best. The biggest changes are a constant
> >evolution to make it more granular. It's becoming less and less like
> >a conventional relational database schema (not that it ever was) and
> >more and more like a graph management system.
> >
> [OT] sounds like tricky stuff. Reminds me of a "professor for software
> engineering" whose only fascinations were Ada, Mercedes-Benz (as an
> ever-repeating example of plain old industry in need of new software)
> and general graph replacement systems. I would never spend my time on a
> general approach to graph systems. For special purposes they can make a
> lot of sense.
It's fascinating stuff. When we started to build this system many
"experts" told us we could not succeed. Nonetheless we've met our
original goals and are now expanding the scope of the system. If I
had time I could write several books on what we've learned. In the
meantime, interacting here on xml-dev is at least giving me some
clues on how to talk about what we've done and on where we need to
go....
> >>>>What is a use case for dynamically generated schemas?
> >>>>
> >>>For one, you need different schemas for different stages in the life
> >>>of the data. I know of no technology that lets you adequately
> >>>describe all possible transformations of the schema over time from
> >>>within the schema itself. As a specific example (discussed previously
> >>>on the list), you need a way to match versions of the schema to work flow.
> >>>
> >>In my understanding of the problem, this drifts away from "dynamic
> >>generation". Schema evolution (or just backwards-incompatible change)
> >>makes configuration management, versioning, and many other things
> >>necessary.
> >>
> >>But having a meta-schema and generating schemas is of no use for the
> >>problem at hand, because the receiver of your data cannot write
> >>software that deals with the meta-schema, and hence with all versions
> >>of the schema.
> >
> >I guess this depends on your perspective: are the schemas the starting
> >point or the end point? Do you negotiate from the schema or do you
> >document the negotiations with the schema? If it's the latter, how do
> >you model and document before you have the schema?
> >
> Negotiations would mean "reconfiguration", and I precisely doubt that
> such a thing is possible (in the absence of meta-XSLT :-)
Re; "meta-XSLT", that's why having XSLT and schema as XML comes in handy... ;-)
>
> >If you have a good system for capturing the modelling and
> >documentation when you are working at the business knowledge capture
> >level, then the schema can become an after-the-fact documentation
> >artifact. Yes, you still need version management, but the audit trail
> >that documents the negotiations isn't based on the schema; it's
> >external to it (and yes, maybe you have a schema for exchanging that
> >data also).
> >
> Surely, schemas do evolve, and having a documentation artefact is better
> than having none.
>
> What I get out of the description is that probably no schema language
> and no fixed program would help here.
I'd love to have a better way to model everything. If you can take in
the abstract 5th-normal-form relational database, the graph
descriptions of the structures it stores, the UML OO models for the
Java that manipulates this data and the XML schema that captures any
given data instance, you can get the complete picture of the system.
When we can find new developers who are comfortable with the whole
mess it usually takes at least 3 months before they are at all
productive.
> >>>>Why does one need to use XSL for it?
> >>>>
> >>>You don't, but in our case, we've got about 8 different pieces of
> >>>source metadata that have to be combined and transformed in order to
> >>>derive a specific schema. XSL is the best match to the problem I know
> >>>of.
> >>>
> >>Unless I have misunderstood, I think your problem seems rather
> >>different, because you could also get away with not generating any
> >>schema at all, if it can change in unanticipated ways. Your problem and
> >>its solution (which may be elegant) do not take receivers into account
> >>- they may have to hand-patch their code to deal with the new data.
> >
> >The changes are anticipated; they occur at well-documented points in
> >the life cycle of the protocol. When the changes happen the receivers
> >do indeed have to change the systems that accept the data. We're
> >working on ways to automate the process. The solutions are, in part,
> >based on the exchange of schemas that document the changes... :-)
> >
> To automate "fixing a program for the new schema", this is precisely
> what I think does not exist anywhere.
>
> More specifically, even a statically typed bunch of XSLT stylesheet and
> XQuery programs cannot deal with a dynamical schema change.
I'm not sure how far we'll get here or even how far it is necessary to
go. For a major portion of the code that handles the deltas between
schema changes we've moved from XSLT to Java. That's more because
we're moving to a real-time data warehouse and want event-driven
transactional handling than because of any language issue.
If we had the equivalent of a continuous XML message stream and the
tools to handle it, a sort of XSLT would work. Basically a document
without start or end, just more SAX events...
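
Something like this Java sketch is what I have in mind (the "record"
element name is invented): each record is processed as its SAX events
arrive, and nothing ever waits for endDocument():

import java.io.File;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class StreamHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        if ("record".equals(local)) text.setLength(0); // new transaction
    }

    public void characters(char[] ch, int start, int len) {
        text.append(ch, start, len);
    }

    public void endElement(String uri, String local, String qName) {
        if ("record".equals(local)) {
            // Hand the completed record to the event-driven,
            // transactional warehouse load (println stands in here).
            System.out.println("commit: " + text.toString().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true);
        f.newSAXParser().parse(new File(args[0]), new StreamHandler());
    }
}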
--
Peter Hunsberger