Yours is a very common position to be in.
There are all sorts of intermediate kinds of partial validation
possible and useful. The choice isn't between all or nothing.
For example, you could make a version of the standard schema that
redefines elements so that you only validate datatypes: complex
content would just have wildcarded, anything-goes content models.
What this would give you is a system that is liberal in what it
accepts. This is certainly better than no validation.
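As a rough illustration of the datatype-only idea, here is a minimal Python sketch rather than a real relaxed schema. It ignores structure entirely and checks only the leaf values it recognises. The element names (amount, startDate, quantity) are hypothetical, the Python parsers only approximate the corresponding XSD datatypes, and it assumes the documents use no namespaces.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical element names and their datatype checks; a real list would be
# taken from the simple types in the standard schema.
DATATYPE_CHECKS = {
    "amount": Decimal,
    "startDate": date.fromisoformat,
    "quantity": int,
}

def datatype_errors(xml_text):
    """Check only leaf values; element order and nesting are ignored."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        check = DATATYPE_CHECKS.get(elem.tag)
        if check is None:
            continue  # unknown element: anything goes
        try:
            check((elem.text or "").strip())
        except (ValueError, InvalidOperation):
            errors.append(f"{elem.tag}: bad value {elem.text!r}")
    return errors
```

A message with unexpected or missing elements passes; only a recognised element holding a badly typed value is reported.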
Another way to look at the problem is from the perspective of
test-driven development. You can validate everything initially,
until your feeds have proven themselves, then reduce to sampling
using standard statistical practice. Or look at it as
an opportunistic thing: even if your servers are too slow to
cope with validation during peak periods, you could enable it
at off-peak times.
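The validate-then-sample idea could be sketched along these lines in Python. The class name, the threshold of 1000 consecutive passes and the 5% sampling rate are all assumptions for illustration, not recommendations; a real rate would come from whatever acceptance-sampling plan you adopt.

```python
import random

class SamplingGate:
    """Validate every message until a feed has proven itself, then sample."""

    def __init__(self, prove_after=1000, sample_rate=0.05, rng=None):
        self.prove_after = prove_after    # consecutive passes required
        self.sample_rate = sample_rate    # fraction checked afterwards
        self.rng = rng or random.Random()
        self.passes = {}                  # feed id -> consecutive passes

    def should_validate(self, feed):
        if self.passes.get(feed, 0) < self.prove_after:
            return True
        return self.rng.random() < self.sample_rate

    def record(self, feed, valid):
        # A failure sends the feed back to full validation.
        self.passes[feed] = self.passes.get(feed, 0) + 1 if valid else 0
```

The same gate could also consult the clock or the current server load, so that validation is switched on fully at off-peak times.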
Another approach entirely is to express your business rules
in Schematron, and validate using that instead of the
standard XML Schema. This allows you to check only the things
you are interested in, to cope with partial and incomplete
documents-in-progress (compared to the standard schemas), to
document what you are interested in,
and also to check for things you positively don't want in your
data: this is a lot more powerful than type derivation in this
respect.
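To give a feel for the rule-based style, here is a small Python stand-in. It is not Schematron itself, only a sketch that borrows its assert/report distinction and uses the limited path support in the standard library. The element names (claim, policyNumber, cardNumber) and the rules are hypothetical.

```python
import xml.etree.ElementTree as ET

# (context path, kind, test, message). "assert" rules must hold for every
# context element; "report" rules flag content we positively do not want.
# All element names here are hypothetical.
RULES = [
    (".//claim", "assert", lambda e: e.find("policyNumber") is not None,
     "a claim must carry a policyNumber"),
    (".//claim", "report", lambda e: e.find("cardNumber") is not None,
     "card numbers must not appear in claim data"),
]

def check_rules(xml_text):
    root = ET.fromstring(xml_text)
    messages = []
    for path, kind, test, message in RULES:
        for elem in root.findall(path):
            result = test(elem)
            if (kind == "assert" and not result) or (kind == "report" and result):
                messages.append(message)
    return messages
```

Anything the rules do not mention is simply not checked, so extra or missing elements elsewhere in a document-in-progress raise no complaint.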
Fraser Goffin said:
> Thanks Greg, some interesting points to consider.
> I am mostly concerned with B2B. One of the issues I'm wrestling with is
> a. the service contract is defined by an external standards body (we are
> one implementer).
> b. the data model that underpins the service operations are defined using
> XML schema and these reflect the broad business semantics for each
> (as agreed by a panel of contributors from our industry sector).
> c. our business rules (in terms of what data content/structural
> that would be acceptable) are less strict than the XML schema specifies (for
> example we may be tolerant of missing data).
> So I guess I was considering whether we should validate according to our
> internal business rules rather than that of the externally defined
> even when this can mean that a message received could be schema invalid
> (according to the industry standard definition) ?