Re: [xml-dev] A single, all-encompassing data validation language - good or bad for the marketplace?
- From: noah_mendelsohn@us.ibm.com
- To: "Costello, Roger L." <costello@mitre.org>
- Date: Thu, 2 Aug 2007 17:05:54 -0400
Whoa. First a caveat, I'm speaking very much for myself here, not for the
Schema WG.
While I think the questions you're asking here are interesting, I think
you are somewhat mischaracterizing the goals of the assertions work
proposed for Schema 1.1. You seem to be implying that the goal is to
replace rule-based languages like Schematron, or to discourage the use of
pipelines. That's not how I see it. I think everyone involved agrees
that there are many important and useful "rules" that you either won't be
able to express in Schema 1.1, or at least that you won't be able to
express as conveniently and naturally as you can in, say, Schematron.
So what are the new assertions in W3C XML Schema (XSD) version 1.1 trying
to do? XSD is a type-based language. You can declare two elements to be
integers, and you then know that the constraints on their contents are the
same. Similarly you can declare complex types, which allow for element
and attribute children, and there too, any two elements with the same type
have the same content constraints. So:
<element name="width" type="measurement"/>
<element name="height" type="measurement"/>
tells you that the content rules for <width> elements and <height>
elements are the same, regardless of where those elements appear. This
gives a very natural binding to program structures and databases; in a
language like Java you can map the type "measurement" to a class, and each
instance of a <width> or <height> element maps to an instance of that
class. Those are then likely to be assignment compatible. If in your
Java code you want to say "tableWidth = roomWidth" it probably works.
XQuery leverages the fact that types have this clean,
context-independent semantic.
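For illustration only, here is a minimal sketch of a complete schema in
which both elements share one named type. The definition of "measurement"
(a non-negative integer) is an assumption; the message above does not say
what the type contains. The xs: prefix is used so that the unprefixed
type="measurement" reference resolves to the schema's own no-namespace
type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Assumed definition: the original message only names the type. -->
  <xs:simpleType name="measurement">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Both elements refer to the same named type, so their content
       rules are identical wherever the elements appear. -->
  <xs:element name="width" type="measurement"/>
  <xs:element name="height" type="measurement"/>

</xs:schema>
```

Under this sketch, <width>120</width> and <height>80</height> are both
checked against exactly the same rule.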
In Schema 1.0, most of the constraints you could express for a complexType
were in the form of a grammar for the child elements, and occurrence
rules for attributes. In Schema 1.1, we additionally add the ability to
apply XPath predicates >to the subtree governed by the type<. That's
architecturally appropriate for Schema (IMO), but much more limited than
what Schematron allows. The XPath constraints are thus integrated with
the type system, and they preserve the architectural invariant that two
instances of the same type are governed by the same constraints. There's
(intentionally) no way to say "a measurement must obey these rules, unless
the element in question has as its 3rd left sibling an element named
<tangerine>". What you can do is require that all measurements be in a
certain range, that the integer value of a measurement be less than an
@maxval attribute, etc. Types remain context independent (I'll ignore XSD
identity constraints, which I've always thought were a mistake precisely
because they are not integrated with the type system. Anyway...)
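As a rough sketch of what such type-scoped assertions could look like,
the fragment below uses the xs:assert syntax of the XSD 1.1
Recommendation as later published; details in the working drafts current
when this message was written may differ. The type layout (integer
content plus a required maxval attribute) and the bounds 0 and 10000 are
assumptions made up for the example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="measurement">
    <xs:simpleContent>
      <xs:extension base="xs:integer">
        <xs:attribute name="maxval" type="xs:integer" use="required"/>
        <!-- Range check; the bounds 0 and 10000 are invented. -->
        <xs:assert test="$value ge 0 and $value le 10000"/>
        <!-- The element's integer value must be less than its own
             maxval attribute. -->
        <xs:assert test="$value lt @maxval"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- Every element of this type is governed by the same assertions. -->
  <xs:element name="width" type="measurement"/>
  <xs:element name="height" type="measurement"/>

</xs:schema>
```

With this sketch, <width maxval="200">120</width> would satisfy both
assertions and <height maxval="100">120</height> would fail the second.
The XPath expressions are evaluated against the element being validated
and its own attributes and descendants only, so a test cannot reach
siblings or ancestors.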
Furthermore, because it's integrated with type-based validity checking,
assertion failures in Schema 1.1 are integrated with the reporting of
other validity failures. I believe this is important and useful,
particularly in the cases where validity checking is invoked through an API
by some container application (as opposed to being invoked interactively
by a user reading error messages on a screen.)
All this represents a tradeoff. It does less than what Schematron does.
What it provides is well integrated with the type and validity reporting
system of the language. We believe that this facility will allow users to
check many of the constraints they have been asking to enforce, but have
been unable to do in Schema 1.0. It will be particularly appropriate when
those constraints seem naturally associated with what we call a type.
That said, rule-based languages like Schematron have shown value for
expressing constraints that are much more ad hoc, or at least that involve
markup that is scattered relatively widely in an instance document. While
I'm delighted to acknowledge that Schematron was very much a source of
inspiration for the Schema 1.1 assertions work, at least in my opinion,
the goal was not and is not to discourage use of Schematron or similar
systems for expressing and enforcing the more complex constraints that
they are good at. I do think that there will be an interesting subset of
today's pipelines in which the need for the second step will go away. I
think that's a good thing. Databinding tools, etc. will likely benefit
from having all the constraints in one place. In many other cases,
particularly when business rules are complex, or relate to disparate parts
of the instance tree, I think it will be appropriate to run a pipeline of,
say, Schema 1.1 and Schematron. No doubt there will be some constraints
for which either step will do. That's already the case today: surely you
face tradeoffs in deciding whether to express certain occurrence
constraints in Schema 1.0 grammars vs. Schematron constraints.
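For contrast, a context-dependent rule of the <tangerine> kind mentioned
earlier is the sort of thing a Schematron step in such a pipeline can
express. This is only a sketch in ISO Schematron syntax; the element
names come from the examples above, while the limit of 100 and the
message text are invented.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- Matches every <width> except those whose third preceding
         sibling element is named <tangerine>. -->
    <sch:rule context="width[not(preceding-sibling::*[3][self::tangerine])]">
      <sch:assert test="number(.) &lt;= 100">
        A width must not exceed 100 unless its third left sibling is a
        tangerine element.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The rule's context depends on where the element sits in the instance,
which is exactly what a type-scoped assertion cannot see.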
I do think you are raising some interesting points. I too am curious to
see how people feel about what's proposed for Schema 1.1, and how people
expect to use XSD and Schematron in combination once Schema 1.1 is
available. Still, I really don't think it's appropriate to suggest that
Schema 1.1 is intended to replace Schematron, or to discourage use of
Schematron or pipelines for the things they will continue to do better.
The assertions in Schema 1.1 are designed to add a constraint facility
that's perceived as meeting many of the needs we've heard expressed by
users of Schema 1.0, not to completely replace rule-based systems like
Schematron.
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
"Costello, Roger L." <costello@mitre.org>
08/02/2007 04:18 PM
To: <xml-dev@lists.xml.org>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: [xml-dev] A single, all-encompassing data
validation language - good or bad for the marketplace?
Hi Folks,
The XML Schema working group is in the process of incorporating rules
(assertions) into the XML Schema language:
"... one of the things we had to decide when putting
XPath-based assertions into Schema 1.1" [Noah Mendelsohn]
Thus, the XML Schema language will become both a grammar-based language
as well as a rule-based language.
Up till this date, grammar-based and rule-based languages have been
kept separate:
Grammar-based Languages: XML Schema, Relax NG, DTD
Rule-based Languages: Schematron, RuleML
What do you think about XML Schema working group incorporating
rule-based capabilities into the language?
Here are some potential advantages and disadvantages:
ADVANTAGES
1. Need only one language to express all data validation requirements.
2. Possible performance improvement (as compared to separate languages
with separate validations).
DISADVANTAGES
1. XML Schemas is already quite large and complex. This will make it
larger and more complex.
2. Discourages the use of a pipeline of validations for implementing
data validation requirements.
3. Possible performance degradation since, for example, validation
can't be halted when grammar requirements fail.
4. Replacing one grammar language with another becomes prohibitive
(example: you may want to replace XML Schemas with Relax NG)
5. Discourages competition. Today there is a competition among the
schema languages. A single language that does everything may reduce
the competition.
QUESTIONS
1. Can you add to the above list? What other advantages and
disadvantages are there?
2. Is grammar validation of a fundamentally different nature than rule
validation?
3. If so, is it reasonable to merge two fundamentally different things?
4. Is it in the best interest of the marketplace to have a single,
all-encompassing data validation language, or is it better to have
multiple data validation languages that work together?
/Roger
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.
[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php