Re: [xml-dev] Victory has been declared in the schema wars ...


From: Rick Jelliffe <rjelliffe@allette.com.au>
To: mc@xegesis.org
Date: Wed, 29 Nov 2006 16:34:47 +1100

Michael Champion wrote:
>  
> Speaking of XSD 1.1 and Schematron, what do others think about their
> approach of defining their own constraint language based on Schematron
> concepts rather than taking an external dependency on Schematron? 
Well, I think the idea of an XPath-based constraint language in general 
is a no-brainer.

People who won't adopt a standard because it comes from ISO have
disappeared up their own arseholes; they are too far from rational
argument to worry about (or W3C specs, for that matter). But do such
toroidal people actually exist? Let's assume they don't :-)

People who want to have an XPath-based constraint language that has
significantly different semantics or operations from Schematron
certainly shouldn't use Schematron.

For example, the W3C XSD WG did the right thing by not adopting
Schematron (indeed, this was a point I made to them), since one of the
essences of Schematron is the natural-language assertion. The
grammar-based schema languages all have the fundamental problem that
they have no mechanism for effectively communicating to humans
diagnostics expressed in terms of the problem domain and data graph:
they can only give generic messages in terms of grammar theory, the XML
tree and the specific element names. One consequence is that as soon as
the XML is hidden by some interface, the canned validation messages
(which are given in terms of the XML and grammar) become
incomprehensible.

"Hiding the XML" often has the unintended consequence of making
validation messages incomprehensible too. People then cope by remapping
diagnostics, or often by re-inventing the validation wheel in the
user-interface code. Extra work, double handling.

Schematron is really the only standard schema language that has actually
made these issues its core: how do you go from analyst-specified
bullet-point specs of the rules to executable code, and how can that
executable code generate information expressed in domain terms rather
than markup terms, with dynamic content, icons, etc. suitable for being
displayed in a user interface, yet user-interface neutral?
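
To make the "domain terms rather than markup terms" point concrete, here
is a minimal sketch in plain Python (not Schematron itself). The invoice
document, the rule and the function name are all invented for
illustration; the point is only that the message is written as the
analyst would say it, with dynamic content pulled from the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice document, used only for illustration.
DOC = """
<invoice>
  <customer>Acme Pty Ltd</customer>
  <line><desc>Widget</desc><qty>0</qty></line>
  <line><desc>Gadget</desc><qty>3</qty></line>
</invoice>
"""


def check_quantities(root):
    """Return domain-level messages for order lines whose quantity is below 1."""
    customer = root.findtext("customer")
    messages = []
    for line in root.findall("line"):
        qty = int(line.findtext("qty"))
        if qty < 1:
            # The message names the customer and the item, not the
            # element names or the content model that was violated.
            messages.append(
                f"The order for {customer} lists '{line.findtext('desc')}' "
                f"with a quantity of {qty}; every item needs a quantity "
                f"of at least 1."
            )
    return messages


MESSAGES = check_quantities(ET.fromstring(DOC))
for message in MESSAGES:
    print(message)
```

A grammar-based validator would typically report something about the
`qty` element and its datatype instead, which means little once the XML
is hidden behind a form.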

I regularly see re-inventions of Schematron. I saw another new one just
yesterday. The only one I have seen that was technically superior in
some aspect was XCSL (XML Constraint Specification Language) from
Portugal, so ISO Schematron adopted their <let> variable (with their
blessing), which is of course trivially implementable in XSLT.

People inevitably miss out on the two-part context/test split that
Schematron has. This split allows grouping of constraints (did anyone
say "type"?), potential implementation improvements and
easier-to-understand XPaths, and it removes the need for a for-each
construct (in XPath 1 at least) for all real cases I have seen.

Another thing is phases. One issue with path-based constraint languages
is that you can easily end up with a storm of information, because in
effect validation happens in parallel: it doesn't stop at the first
error. So Schematron allows grouping of patterns into phases, to allow
progressive validation: let's validate all tables only, or let's
validate that metadata exists as the last step even though it comes
first in the document, or let's check for typos in namespaces first
before we try to validate the elements, or let's now download the XML
data retrieved from a link in the document from a DBMS URL.
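
Phases can be sketched the same way: named groups of patterns, with only
the active phase evaluated. This is an illustration in plain Python under
the same assumptions as before (callables for tests, ElementTree paths for
contexts); the pattern names, phase names and document are made up.

```python
import xml.etree.ElementTree as ET

# Pattern name -> list of (context path, test, message). Hypothetical rules.
PATTERNS = {
    "tables": [
        (".//table", lambda t: t.find("title") is not None,
         "A table should have a title."),
    ],
    "metadata": [
        (".", lambda d: d.find("meta") is not None,
         "The document should carry metadata."),
    ],
}

# Phase name -> the patterns that are active in that phase.
PHASES = {
    "tables-only": ["tables"],
    "final": ["tables", "metadata"],
}


def validate(root, phase):
    """Run only the patterns that belong to the chosen phase."""
    failures = []
    for name in PHASES[phase]:
        for context, test, message in PATTERNS[name]:
            for node in root.findall(context):
                if not test(node):
                    failures.append(message)
    return failures


# Hypothetical document with an untitled table and no metadata.
ROOT = ET.fromstring("<doc><table><row/></table></doc>")

print(validate(ROOT, "tables-only"))
print(validate(ROOT, "final"))
```

Running the "tables-only" phase first keeps the metadata complaint out of
the way until the author is ready for it, which is the progressive
validation described above.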

There is one XPath-based constraint language that goes beyond
Schematron: XLinkIt, a commercial product, did this by using a much more
advanced logic. I worked on expert systems in the early 90s, and I
appreciate that higher-order logics can be used to build really powerful
systems, but I also appreciate that for sheer implementability and
problem-solving, simple if-then style predicate logic covers so many
bases that it is hard to justify not erring on the side of simplicity.

So I am happy to defend the design of Schematron: I think it (largely
under the influence of its users and developers) has matured into a
design and standard that has not been bettered by the dozens of
imitators. And, more than that, I think it addresses incredibly
important issues (diagnostics, phases, etc.) that expose the other
schema languages and designs as being obese or underfed toys designed
without consideration for the central position of humans in the chain.

Here's my quick test. I was one of the ones who pushed for XSD to have
determinate outcomes: for example, the well-enumerated list of errors.
It is a good thing. However, anyone who thinks that these kinds of
messages are remotely suitable for end-users, especially after being
mediated through a user interface, is fooling themselves.

So that is one reason why I think the RELAX NG versus XSD debate is
largely flummery. Of course XSD should be refactored into a RELAX
NG-equivalent core and a type-annotating outside layer; the RELAX NG
people are correct in saying that grammar-based schema languages can be
refactored without removing any capability (or necessarily changing
syntax). But just adding XPath assertions to XSD (or RELAX NG), though
good and better for modelling, misses the fundamental diagnostic
inadequacy of current schema languages.

Adopting RELAX NG will help people with problems relating to XSD's lack
of power in several areas. XSD 1.1 does indeed take a few steps towards
RELAX NG, but not nearly enough, and it takes some steps back. For
example, it is crazy for XSD 1.1 to just add weany, weedy and weakie
(veni, vidi, vici) XPath subset constraints instead of allowing
attributes in content models as RELAX NG does: it is a hack, designed to
be grafted onto existing products with minimum pain.

But adopting RELAX NG won't alter the fundamental diagnostics issue. Nor
may taking on your own whizzbang home-made XPath-based constraint
language, because engineers typically get caught up with the issue of
how to add XPaths to types, rather than the issue of how to make it easy
and direct to express constraints and get diagnostics. The silicon- or
character-focused engineers are the problem, not the solution. It's not
datahead versus dochead that drives Schematron: it's the user
experience, usability, user-friendliness, user-centricism,
interface-ability (is that a word?), non-technical user control,
minimization of concepts, and consolidation of skills (lay people can
more easily learn paths than grammars, let alone UPA).

If you're looking at schema languages, I think user-friendly diagnostics
is the big-picture issue that provides a different way to judge both XSD
and RELAX NG. The other nice thing about Schematron is that it can be
marketed as solving a different set of problems than "schema" languages;
XSD can fall back to being a niche technology for WS-* and data-binding,
hidden from users and integrators.

Cheers
Rick Jelliffe

