Re: [xml-dev] A single, all-encompassing data validation language

Re: [xml-dev] A single, all-encompassing data validation language - good or bad for the marketplace?
From: noah_mendelsohn@us.ibm.com
To: "Costello, Roger L." <costello@mitre.org>
Date: Thu, 2 Aug 2007 17:05:54 -0400
Whoa.  First, a caveat: I'm speaking very much for myself here, not for the 
Schema WG. 

While I think the questions you're asking here are interesting, I think 
you are somewhat mischaracterizing the goals of the assertions work 
proposed for Schema 1.1.   You seem to be implying that the goal is to 
replace rule-based languages like Schematron, or to discourage the use of 
pipelines.  That's not how I see it.  I think everyone involved agrees 
that there are many important and useful "rules" that you either won't be 
able to express in Schema 1.1, or at least that you won't be able to 
express as conveniently and naturally as you can in, say, Schematron.

So what are the new assertions in W3C XML Schema (XSD) version 1.1 trying 
to do?  XSD is a type-based language.  You can declare two elements to be 
integers, and you then know that the constraints on their contents are the 
same.  Similarly you can declare complex types, which allow for element 
and attribute children, and there too, any two elements with the same type 
have the same content constraints.  So:


        <element name="width" type="measurement"/>
        <element name="height" type="measurement"/>

tells you that the content rules for <width> elements and <height> 
elements are the same, regardless of where those elements appear.  This 
gives a very natural binding to program structures and databases;  in a 
language like Java you can map the type "measurement" to a class, and each 
instance of a <width> or <height> element maps to an instance of that 
class.  Those are then likely to be assignment compatible.  If in your 
Java code you want to say "tableWidth = roomWidth" it probably works. 
XQuery leverages the fact that types have this clean, 
context-independent semantic.

In Schema 1.0, most of the constraints you could express for a complexType 
were in the form of a grammar for the child elements, and occurrence 
rules for attributes.  In Schema 1.1, we additionally add the ability to 
apply XPath predicates >to the subtree governed by the type<.  That's 
architecturally appropriate for Schema (IMO), but much more limited than 
what Schematron allows.  The XPath constraints are thus integrated with 
the type system, and they preserve the architectural invariant that two 
instances of the same type are governed by the same constraints.  There's 
(intentionally) no way to say "a measurement must obey these rules, unless 
the element in question has as its 3rd left sibling an element named 
<tangerine>" .  What you can do is require that all measurements be in a 
certain range, that the integer value of a measurement be less than an 
@maxval attribute, etc.  Types remain context independent (I'll ignore XSD 
identity constraints, which I've always thought were a mistake precisely 
because they are not integrated with the type system.  Anyway...) 
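
As a rough sketch of what such a type-scoped assertion might look like, 
here is the "measurement" type written with the xs:assert element from the 
published XSD 1.1 specification.  The modelling is an assumption made for 
illustration only: the quantity is carried in a hypothetical "value" 
attribute alongside the @maxval attribute mentioned above, and the xs 
prefix is used for the schema namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical modelling: the measured quantity and its upper bound
       are both attributes, so the assertion can compare them directly. -->
  <xs:complexType name="measurement">
    <xs:attribute name="value"  type="xs:integer" use="required"/>
    <xs:attribute name="maxval" type="xs:integer" use="required"/>
    <!-- Evaluated against each element of this type; the XPath sees only
         that element's own attributes and subtree, never its siblings. -->
    <xs:assert test="@value lt @maxval"/>
  </xs:complexType>

  <!-- Both elements share the type, so both are bound by the same rule. -->
  <xs:element name="width"  type="measurement"/>
  <xs:element name="height" type="measurement"/>

</xs:schema>
```

Because the assertion belongs to the type rather than to a location in the 
document, every <width> and <height> is checked the same way wherever it 
appears.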

Furthermore, because it's integrated with type-based validity checking, 
assertion failures in Schema 1.1 are integrated with the reporting of 
other validity failures.  I believe this is important and useful, 
particularly in the cases where validity checking is invoked through an API 
by some container application (as opposed to being invoked interactively 
by a user reading error messages on a screen.)

All this represents a tradeoff.  It does less than what Schematron does, 
but what it provides is well integrated with the type and validity reporting 
system of the language.  We believe that this facility will allow users to 
check many of the constraints they have been asking to enforce, but have 
been unable to express in Schema 1.0.  It will be particularly appropriate when 
those constraints seem naturally associated with what we call a type. 

That said, rule-based languages like Schematron have shown value for 
expressing constraints that are much more ad hoc, or at least that involve 
markup that is scattered relatively widely in an instance document.  While 
I'm delighted to acknowledge that Schematron was very much a source of 
inspiration for the Schema 1.1 assertions work, at least in my opinion, 
the goal was not and is not to discourage use of Schematron or similar 
systems for expressing and enforcing the more complex constraints that 
they are good at.  I do think that there will be an interesting subset of 
today's pipelines in which the need for the second step will go away.  I 
think that's a good thing.  Databinding tools, etc. will likely benefit 
from having all the constraints in one place.  In many other cases, 
particularly when business rules are complex, or relate to disparate parts 
of the instance tree, I think it will be appropriate to run a pipeline of, 
say, Schema 1.1 and Schematron.  No doubt there will be some constraints 
for which either step will do.  That's already the case today:  surely you 
face tradeoffs in deciding whether to express certain occurrence 
constraints in Schema 1.0 grammars vs. Schematron constraints.
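
For contrast, here is a minimal sketch of the kind of rule that would 
probably stay in the Schematron step of such a pipeline, because it relates 
parts of the instance that do not naturally belong to one type.  The 
vocabulary (<order>, <summary>, <items>, <item>, @itemCount) is entirely 
hypothetical; only the sch: elements come from ISO Schematron.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- Hypothetical vocabulary: the summary must agree with line items
         that live in a different branch of the order. -->
    <sch:rule context="order/summary">
      <sch:assert test="@itemCount = count(../items/item)">
        The summary's itemCount must match the number of item elements.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The rule is attached to a place in the document rather than to a type, 
which is what lets it look outside the subtree of the element being 
checked.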

I do think you are raising some interesting points.  I too am curious to 
see how people feel about what's proposed for Schema 1.1, and how people 
expect to use XSD and Schematron in combination once Schema 1.1 is 
available.  Still, I really don't think it's appropriate to suggest that 
Schema 1.1 is intended to replace Schematron, or to discourage use of 
Schematron or pipelines for the things they will continue to do better. 
The assertions in Schema 1.1 are designed to add a constraint facility 
that's perceived as meeting many of the needs we've heard expressed by 
users of Schema 1.0, not to completely replace rule-based systems like 
Schematron. 

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








"Costello, Roger L." <costello@mitre.org>
08/02/2007 04:18 PM
 
        To:     <xml-dev@lists.xml.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        [xml-dev] A single, all-encompassing data 
validation language - good or bad for the marketplace?


Hi Folks,

The XML Schema working group is in the process of incorporating rules
(assertions) into the XML Schema language:
 
      "... one of the things we had to decide when putting 
       XPath-based assertions into Schema 1.1" [Noah Mendelsohn]

Thus, the XML Schema language will become both a grammar-based language
as well as a rule-based language.

Up till this date, grammar-based and rule-based languages have been
kept separate:

    Grammar-based Languages: XML Schema, Relax NG, DTD

    Rule-based Languages: Schematron, RuleML

What do you think about XML Schema working group incorporating
rule-based capabilities into the language?

Here are some potential advantages and disadvantages:

ADVANTAGES

1. Need only one language to express all data validation requirements.

2. Possible performance improvement (as compared to separate languages
with separate validations).

DISADVANTAGES

1. XML Schemas is already quite large and complex.  This will make it
larger and more complex.

2. Discourages the use of a pipeline of validations for implementing
data validation requirements.

3. Possible performance degradation since, for example, validation
can't be halted when grammar requirements fail.

4. Replacing one grammar language with another becomes prohibitive
(example: you may want to replace XML Schemas with Relax NG)

5. Discourages competition.  Today there is a competition among the
schema languages.  A single language that does everything may reduce
the competition.

QUESTIONS

1. Can you add to the above list?  What other advantages and
disadvantages are there?

2. Is grammar validation of a fundamentally different nature than rule
validation?

3. If so, is it reasonable to merge two fundamentally different things?

4. Is it in the best interest of the marketplace to have a single,
all-encompassing data validation language, or is it better to have
multiple data validation languages that work together?

/Roger 
