OASIS Mailing List Archives

   Fallacies of Validation, version #2

  • To: <xml-dev@lists.xml.org>
  • Subject: Fallacies of Validation, version #2
  • From: "Roger L. Costello" <costello@mitre.org>
  • Date: Thu, 26 Aug 2004 10:12:12 -0400
  • Thread-index: AcSLdq7Psj6tpBzOR4CHiOrY7Qd44Q==

Hi Folks,

Many thanks for all the outstanding comments!  Below I have updated the list
of fallacies (note the three new ones) and elaborated on the previous
fallacies, using the examples and information that you provided.  As always,
comments are very welcome.

Fallacies of Validation

1. Fallacy of "THE Schema"

2. Fallacy of Schema Locality

3. Fallacy of Requisite Validation

4. Fallacy of Validation as a Pass/Fail Operation

5. Fallacy of a Universal Validation Language

6. Fallacy of Closed System Validation

Let's examine each of these fallacies.

1. Fallacy of "THE Schema"

This fallacy was identified by Michael Kay:

> ... there's no harm in using XML Schema to check data 
> against the business rules, so long as you realize this 
> is *an* XML Schema, not *the* XML Schema. We need to stop 
> thinking that there can only be one schema.

Len Bullard made a similar statement:

> ... most fundamental errors are ... to consider only a single schema.

and at another point Len states:

> ... fall into the trap of thinking of THE schema and not 
> recognizing the system as a declarative ecosystem of schemas 
> and schema components.

Both Michael and Len are saying that a system may need many schemas, not
just one. This is a big mindshift for me. I admit I was trapped into thinking
that there should be a single schema.

Len responded to my request to define "declarative ecosystem".  I think this
term is very important and underlies much of what is presented here.
Here's what "declarative ecosystem" means: 

Every system lives within a world where there is a lot of variety, i.e.,
systems aren't islands.  For example, the Wal-Mart system must coexist with
its supplier systems, its distributor systems, and its retailer systems.
One can think of this system-of-systems as an "ecosystem".  Thus, the
Wal-Mart system resides in an ecosystem.  Each system within the ecosystem
has its own local requirements, which are documented in its own
(declarative) schemas.  Thus, not only are there a bunch of systems
which must coexist, there are a bunch of schemas that must coexist.  This
ecosystem of schemas is a "declarative ecosystem".  [Len, have I accurately
defined the term?]

Oh, one more comment on declarative ecosystems.  Len made this remark which
I think is important:

> ... [if two systems are interoperating in a 
> closed environment then] it doesn't matter how
> singular or multiple they [the schemas] are;
> but when they are in an ecosystem, they typically
> overlap and exchange information, and adapt as a
> result.

[Mind-blowing ideas, Len!  Schemas exchanging information and adapting.  Wow!]

Okay, now back to the fallacy of "THE schema" ...

Many examples were provided to demonstrate the value of multiple schemas:

Len provided an example of a distributed reporting system:

> Look at any large reporting system.  You can build 
> up a large schema but given local variations, 
> do you have sufficient power/force/authority to 
> make them stick or will you be constantly adjusting 
> them, loosening them, strengthening them, and how 
> will you know which is the right thing to do? 

I would like to elaborate further on this.  Suppose that a company has
offices in London, Hong Kong, and Sydney.  They all report to the main office
in New York.  With such a geographically dispersed collection of offices, it
is easy to imagine that there will be local variations.  There will probably
be some data that is common to all the offices (Rick Jelliffe calls the
constraints on this type of data invariant constraints).  Then there will be
locale-specific data (variant constraints).  So, it doesn't seem reasonable
to assume that a single reporting schema would suffice for this
geographically-dispersed organization.  [Len, have I captured your example
accurately?]

Mary Holstege and Michael Kay gave examples of the value of multiple schemas
in a workflow environment:

From Mary Holstege:

> ... suppose all you care about in some phase of
> processing is picking up the IDs in a document. 
> Then you define a minimal schema where everything 
> is open with the appropriate ID attributes. Maybe
> you're going to generate an index. In another 
> phase of processing all you care about is checking 
> that dates are in the right date range. So you have 
> another minimal schema that only pays attention to dates.

From Michael Kay:

> One example I am thinking of is where a document is 
> gradually built up in the course of a workflow. At 
> each stage in the workflow the validation constraints 
> are different. You can think of each schema as a filter 
> that allows the document to proceed to the next stage of 
> processing.
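Mary's and Michael's examples can be sketched the same way. In the hypothetical Python below, each workflow stage has its own minimal "schema" (again a plain function, not a real schema language): one looks only at `id` attributes, the other only at elements whose names end in `Date`, and everything else in the document is left open. The element names, the naming convention, and the date range are assumptions made for illustration; each stage acts as a filter in Michael's sense.

```python
import xml.etree.ElementTree as ET
from datetime import date


def ids_schema(doc: ET.Element) -> list[str]:
    """Minimal 'schema' that only cares about id attributes: every element
    that has one must have a non-empty, unique value. Everything else is open."""
    problems, seen = [], set()
    for el in doc.iter():
        value = el.get("id")
        if value is None:
            continue
        if not value or value in seen:
            problems.append(f"bad or duplicate id {value!r} on <{el.tag}>")
        seen.add(value)
    return problems


def dates_schema(doc: ET.Element, lo=date(2000, 1, 1), hi=date(2010, 12, 31)) -> list[str]:
    """Minimal 'schema' that only checks *Date elements (hypothetical naming
    convention) for ISO format and an assumed allowed range."""
    problems = []
    for el in doc.iter():
        if el.tag.endswith("Date"):
            try:
                d = date.fromisoformat((el.text or "").strip())
            except ValueError:
                problems.append(f"<{el.tag}> is not an ISO date: {el.text!r}")
                continue
            if not lo <= d <= hi:
                problems.append(f"<{el.tag}> out of range: {d}")
    return problems


# Hypothetical workflow: each stage has its own filter.
STAGES = [("indexing", ids_schema), ("scheduling", dates_schema)]


def run_workflow(xml_text: str) -> tuple[str, list[str]]:
    """Pass the document through each stage's schema in turn; stop at the
    first stage whose filter rejects it and report which stage that was."""
    doc = ET.fromstring(xml_text)
    for name, schema in STAGES:
        problems = schema(doc)
        if problems:
            return name, problems
    return "done", []
```

A document whose `<item>` reuses the order's `id` is stopped at the `indexing` stage; one with a `shipDate` of 2099-01-01 gets past indexing but is stopped at `scheduling`. Neither stage needs to know anything about the parts of the document the other one checks.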

Finally, Len made a good statement:

> Sometimes, a single schema suffices for the whole 
> system.  Sometimes, you need lots of little ones.

2. Fallacy of Schema Locality

Len identified this fallacy:

> ... most fundamental errors are to consider schemas only at the external
> system junctions ...

To be honest, I am not clear on this fallacy.  I believe that what is being
said is this: if you build a system with local customs hardcoded into it
and then deploy it into a global environment, you have made a serious
mistake.  Michael Kay gave an example: he interacted with an online U.S.
service that insisted that users provide a state code.  Clearly, the
online service was built with local customs hardcoded, but then deployed in
a global environment.

Here's a comment that Len made on this fallacy:

> The problem of locale is that it is declared 
> locally but might require global management.

Can someone tell me if I have captured this fallacy accurately?

3. Fallacy of Requisite Validation

Yesterday Michael Kay made a very compelling statement with regard to
whether validation should be done at all in certain situations. Michael was
responding to the example of an online service validating a user's address.
Here's what Michael said about the online service's insistence on validating
the user's address:

> The strategy (validating the user's address) assumes that 
> you know better than your customers what constitutes a 
> valid address. Let's face it, you don't, and you never 
> will. A much better strategy is to let them (the user) express 
> their address in their own terms. After all, that's what they 
> do in old-fashioned paper correspondence, and it seems 
> to work quite well.

Michael argues very effectively that in this situation it makes no sense to
do any validation at all!

4. Fallacy of Validation as a Pass/Fail Operation

Mary Holstege identified this fallacy.  Here's what she said:

> [Many people think that validation is a pass/fail operation.]
> Not so, although lots of people are still stuck in that way 
> of thinking, including, alas, a lot of the vendors.
> The schema design goes to great pains to make it possible to 
> do things like this, for example: validate a document against 
> a tight schema, and then ask questions of the result such as 
> "show me all the item counts that failed validation because they
> were too high"

A quick scan of Rick Jelliffe's latest message indicates that he disagrees
with Mary on this fallacy.  Perhaps some more discussion is in order?
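Mary's example can be illustrated in Python, assuming a hypothetical `<order>` document with `<item><count>` children and a tight allowed range of 1 to 100. The sketch hand-codes the checks rather than using an XML Schema processor (in XML Schema itself, per-element outcomes of this kind are reported in the post-schema-validation infoset), and it borrows the facet names `minInclusive` and `maxInclusive` only as labels. Instead of a single yes/no, validation returns a list of findings that can be queried afterwards.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One failed check on one element (hypothetical result format)."""
    path: str        # positional path to the element, e.g. "/order/item[2]/count"
    constraint: str  # which rule failed: "type", "minInclusive" or "maxInclusive"
    value: str       # the offending text


def validate_counts(xml_text: str, min_count: int = 1, max_count: int = 100) -> list[Finding]:
    """Check every item/count against a tight range, recording *why* each
    element failed instead of returning a single pass/fail answer."""
    root = ET.fromstring(xml_text)
    findings: list[Finding] = []
    for i, item in enumerate(root.findall("item"), start=1):
        count = item.find("count")
        if count is None:
            continue  # a missing count is simply not checked in this sketch
        path = f"/{root.tag}/item[{i}]/count"
        text = (count.text or "").strip()
        try:
            n = int(text)
        except ValueError:
            findings.append(Finding(path, "type", text))
            continue
        if n < min_count:
            findings.append(Finding(path, "minInclusive", text))
        elif n > max_count:
            findings.append(Finding(path, "maxInclusive", text))
    return findings


doc = """<order>
  <item><count>5</count></item>
  <item><count>250</count></item>
  <item><count>0</count></item>
</order>"""

# The document as a whole is invalid, but the result can still be queried:
too_high = [f.path for f in validate_counts(doc) if f.constraint == "maxInclusive"]
print(too_high)  # ['/order/item[2]/count']
```

Keeping the reason for each failure, rather than collapsing the run into valid/invalid, is what makes a question like "which counts were too high?" answerable after validation.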

5. Fallacy of a Universal Validation Language

Dave Pawson identified this fallacy.  He noted that Atom documents cannot
be validated using a single technology:

> From [Atom, version] 0.3 onwards it's not been possible 
> to validate an instance against a single schema, not 
> even Relax NG. They need a mix of Schema and 'other' 
> processing before being given a clean bill of health.

6. Fallacy of Closed System Validation

This fallacy was identified by Len a long time ago.  I still remember
something he said one day when discussing closed versus open systems:
"Systems leak.  There's no such thing as a closed system."  This is an
important comment.  Many people imagine that they can create a monolithic,
invariant schema because "there's just me and my well-known trading
partners".  This view fails to recognize that the world changes; more
precisely, that the ecosystem changes.

One last thing - my favorite term of the day (can you guess?), and my
favorite quotes of the day.

Favorite Term: Declarative Ecosystem

Favorite Quotes:

1. [From Len] "I can't separate social rules from engineering fundamentals.
I apply engineering fundamentals to implement social systems."

2. [Also from Len] "Even if one thinks it easier to manage a single schema,
command and control in adaptive systems is distributed.  A schema is
control."  [Wow!  I never thought of schemas in this fashion.  Great stuff,
Len!]

Thanks again everyone!  Please keep the comments coming.

/Roger

Copyright 2001 XML.org. This site is hosted by OASIS