
   Cross document validation with Schematron - XML syntax for Xpath?

  • To: <xml-dev@lists.xml.org>
  • Subject: Cross document validation with Schematron - XML syntax for Xpath?
  • From: "Hunsberger, Peter" <Peter.Hunsberger@stjude.org>
  • Date: Fri, 12 Mar 2004 11:20:53 -0600
  • Thread-index: AcQIVl/Vt1yM3YONTlGOjvwGFie1Lw==
  • Thread-topic: Cross document validation with Schematron - XML syntax for Xpath?

First some background: we have a large complex Web app that builds
thousands of different input forms from metadata descriptions in the form
of XML.  This XML comes from many different spots and describes global
metadata, user view specific metadata, authorizations and the current
data for a given screen.  XSLT transforms take this input XML, smash it
together into an abstract object model, and forward that on
to another XSLT that does presentation-specific transformations (for the
Web app that means turning it into XHTML).

The user does a standard HTTP POST back to us, and we get the request
parameters back as XML and run another XSLT transform and then a
Schematron transform to validate the input from the user.  If Schematron
throws an assert, we detect that and recycle the input back through the
original loop with the appropriate error message; otherwise we continue
on to the next screen.

The original metadata and the instance specific screens that are built
around them are built by business analysts using other screens (that are
in turn built by the same system).  In particular, we have a validation
editor where they describe the validation rules for the input to any
given screen.  These rules are one step removed from Schematron
statements; a simple transform turns them into Schematron. The main
reason for not specifying Schematron directly is so that the "validation
editor" can pick the rules apart into component pieces when a business
analyst wants to go back and edit an existing validation rule; we use
XML elements and attributes to build the Xpath, so that we don't have
to parse the Xpath (though we're probably going to go to XSLT for regex
support so I suppose parsing the Xpath with regex would be just about
the same work in the long term).
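
To make the "XML elements and attributes to build the Xpath" idea concrete, here is a minimal sketch of such a transform. The rule vocabulary (`and`, `compare`, `field`, `number`, `string`) is hypothetical, since the actual format used by the validation editor is not shown in this post; only the general technique of serialising a rule tree to an XPath string is illustrated.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule vocabulary: the real element and attribute names
# used by the validation editor are not given in this post.
RULE_XML = """
<and>
  <compare op="gt"><field name="result_val"/><number value="1000"/></compare>
  <compare op="lt"><field name="result_val"/><number value="10000"/></compare>
</and>
"""

OPERATORS = {"eq": "=", "ne": "!=", "gt": ">", "lt": "<", "ge": ">=", "le": "<="}


def to_xpath(node):
    """Serialise one node of the rule tree as an XPath 1.0 expression string."""
    if node.tag in ("and", "or"):
        parts = [to_xpath(child) for child in node]
        return "(" + f" {node.tag} ".join(parts) + ")"
    if node.tag == "not":
        return "not(" + to_xpath(node[0]) + ")"
    if node.tag == "compare":
        left, right = node  # a comparison has exactly two operands
        return f"{to_xpath(left)} {OPERATORS[node.get('op')]} {to_xpath(right)}"
    if node.tag == "field":
        return node.get("name")
    if node.tag == "number":
        return node.get("value")
    if node.tag == "string":
        return "'" + node.get("value") + "'"
    raise ValueError(f"unknown rule element: {node.tag}")


xpath = to_xpath(ET.fromstring(RULE_XML))
print(xpath)  # (result_val > 1000 and result_val < 10000)
```

Because the rule stays a tree, an editor can present each `compare` as a separate row for the analyst, and no XPath parser is needed to get back from the stored rule to its component pieces.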

All this works pretty well, but for one issue which I will describe
shortly.  However, we now have a new requirement which is to be able to
validate across multiple documents.  We manage clinical research data,
so an example would be for someone to be able to specify that a surgery
date was after any protocol on-study date, or that a surgery date is
after a particular instance of a protocol on-study date.  In this case,
the data being validated is in the surgery document and the data it is
being validated against is in the protocol document.  (In reality all
this is pulled out of a database on the fly, but the mechanics of how
these documents are actually created should be more or less irrelevant
to the problem at hand?)

First issue:

Writing Schematron asserts can be non-intuitive for a business analyst.
Consider, for example, a document that reports many lab results.  We may
want to say that the ANC value is between 1000 and 10000. As a
Schematron assert it is essentially:

	not(*[local-name() = 'ANC']) or (result_val > 1000 and result_val < 10000)

I.e., for things that aren't ANCs we are OK; otherwise we check the result
value.  The problem is that a business analyst just doesn't get the
"not(x) or" pattern. It might make sense to someone well versed in
Boolean logic and xpath, but even some of our more experienced
developers get confused by these rules.
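
The "not(x) or y" pattern is just "if x then y" written without an if/then operator. The following small sketch (plain Python, with hypothetical function names, standing in for the XPath) shows why the guard is needed: an assert that reads the rule literally as "is an ANC and is in range" would fail every result that is not an ANC.

```python
def guarded_rule(is_anc, result_val):
    # "not(x) or y": the range check only matters when the row is an ANC
    return (not is_anc) or (1000 < result_val < 10000)


def literal_rule(is_anc, result_val):
    # the way an analyst tends to say it: "it is an ANC and it is in range"
    return is_anc and (1000 < result_val < 10000)


cases = [
    (False, 50),    # not an ANC: the rule should not apply
    (True, 5000),   # ANC in range
    (True, 50),     # ANC out of range
]
for is_anc, value in cases:
    print(is_anc, value, guarded_rule(is_anc, value), literal_rule(is_anc, value))
```

The two functions agree on every ANC row and differ only on the non-ANC row, where the guarded form passes and the literal form reports a spurious failure.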

Given this, and the requirement for cross document validation we'd like
to move the input to our validation process one more step away from
Schematron and find or create a language that can be used by the
business analysts to specify the validation rules in a manner that is a
little more natural to them.  For example:

	element = 'ANC' and result_val > 1000 and result_val < 10000
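
One way such a rule could be turned into a Schematron assert is sketched below. It assumes a convention that is not stated in this post: the first clause, `element = '...'`, names the element the rule applies to, and every remaining clause is a check. The function names and the unprefixed `assert` element are illustrative only.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def analyst_rule_to_test(rule):
    """Turn "element = 'X' and <checks>" into a guarded XPath test.

    Assumption: the first clause names the element the rule applies to and
    every remaining clause is a check on it. Splitting on " and " is naive
    and would break on string literals that contain that word.
    """
    guard, *checks = [part.strip() for part in rule.split(" and ")]
    match = re.fullmatch(r"element\s*=\s*'([^']+)'", guard)
    if match is None or not checks:
        raise ValueError(f"unsupported rule: {rule!r}")
    name = match.group(1)
    return f"not(*[local-name() = '{name}']) or ({' and '.join(checks)})"


def to_schematron_assert(rule, message):
    # quoteattr escapes <, > and & so the XPath is legal inside an XML attribute
    test = analyst_rule_to_test(rule)
    return f"<assert test={quoteattr(test)}>{escape(message)}</assert>"


rule = "element = 'ANC' and result_val > 1000 and result_val < 10000"
print(analyst_rule_to_test(rule))
print(to_schematron_assert(rule, "ANC must be between 1000 and 10000"))
```

The first line printed is the same guarded expression given earlier; the second wraps it in an assert with the comparison operators escaped for use in an attribute value.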

For Schematron generation that's pretty straightforward; however, more
importantly, we also need to be able to use this rule specification to
tell us how to generate the other document.  Considering my other
example, we want something like:

	*[local-name() = 'surgery.date'] > *[local-name() = 'protocol.on_study_date']

Or 

	*[local-name() = 'surgery.date'] > *[local-name() = 'protocol.on_study_date' and protocol.mnemonic = 'TOTXV']

We want to be able to parse this rule specification to discover that
we have to retrieve all the protocol data that is in
context for this particular patient (or the protocol data in context
that has a mnemonic='TOTXV').  Essentially, I think what we need is an
XML syntax for xpath that we can turn back into real xpath, or that can
be easily parsed so that things other than xpath-savvy processors can
generate data sets that match the xpath.
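
As a rough illustration of the "easily parsed" half of that requirement, the sketch below walks an XML-encoded form of the second rule and works out what has to be fetched from documents other than the one being validated. Everything about the encoding is an assumption made for the example: the `compare`, `element` and `where` names are hypothetical, and so is the convention that the prefix before the dot (`surgery`, `protocol`) identifies the source document.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of:
#   *[local-name() = 'surgery.date'] >
#   *[local-name() = 'protocol.on_study_date' and protocol.mnemonic = 'TOTXV']
RULE_XML = """
<compare op="gt">
  <element name="surgery.date"/>
  <element name="protocol.on_study_date">
    <where field="protocol.mnemonic" equals="TOTXV"/>
  </element>
</compare>
"""


def retrieval_plan(rule, current_document):
    """List the fields and filters needed from other documents.

    Assumption: the part of a name before the first dot identifies the
    document it comes from, and a <where> filters that same document.
    """
    plan = {}
    for el in rule.iter("element"):
        doc, _, field = el.get("name").partition(".")
        if doc == current_document:
            continue  # already on hand: it is the document being validated
        spec = plan.setdefault(doc, {"fields": [], "filters": {}})
        if field not in spec["fields"]:
            spec["fields"].append(field)
        for where in el.findall("where"):
            _, _, filter_field = where.get("field").partition(".")
            spec["filters"][filter_field] = where.get("equals")
    return plan


plan = retrieval_plan(ET.fromstring(RULE_XML), current_document="surgery")
print(plan)
# {'protocol': {'fields': ['on_study_date'], 'filters': {'mnemonic': 'TOTXV'}}}
```

A plan like this could drive the database retrieval directly, while the same tree is serialised to XPath for the Schematron side, so neither consumer has to parse XPath text.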

We are running this all on top of Apache Cocoon with Saxon so we more or
less have any piece of XML or XSLT handling machinery we might need
available to us: protocol resolvers, any and all manner of schema, XSLT
in any version, Java classes, and even Java extensions for XSLT if
needed, though I'd rather stay away from those.

Sorry for the windy post, but finally the real questions: does anyone know
of any "obvious" way to do this?  By obvious I mean some existing spec or
best practice.  If not, any thoughts on what would be a good structure for
the artificial language that is going to be fed into Schematron and our
document retrieval process?   Am I missing something with respect to
Schematron?  Could we hook into some underlying part of an xpath parser
and gain our understanding of the xpath there instead of at the higher
level (and thus not need the XML syntax for xpath)? Other thoughts or
comments?

Peter Hunsberger






 
