RE: [xml-dev] A single, all-encompassing data validation language

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
RE: [xml-dev] A single, all-encompassing data validation language - good orbad for the marketplace?
From: noah_mendelsohn@us.ibm.com
To: Michael Kay <mike@saxonica.com>
Date: Mon, 13 Aug 2007 16:27:09 -0400
Michael Kay writes:

> Though personally I have no problems with allowing Turing-
> completeness in the validation language.

Actually, I agree with almost everything else you've written in this 
thread, but with this particular bit I'm somewhat disagree.  In 
particular, I'd point to the TAG's finding on the Rule of Least Power [1]. 
 You'll see may name on it, because I helped some with editing, but it's 
fundamentally drawn from one of TimBL's design documents [2]. 

Either way, the point is that Turing-complete languages are very powerful, 
but when you have an arbitrary program in a Turing-complete language it 
can be quite hard to reason about what that program does.  For all its 
surface complexity, most of what W3C XSD does boils down to validation 
that can be expressed in a deterministic automaton (in fact, you say 
that's how you built it.)  The fact that the language is less powerful 
than a Turing complete language makes it much easier to, for example, take 
in an arbitrary schema and automatically produce data bindings for 
instances accept by the language, etc.  In the case of RelaxNG, it allows 
one to statically compute schemas for intersection, union and difference 
of any two schemas.  On the other hand, neither language can conveniently 
express a rule that the occurrence count for some element must be an 
(arbitrary) prime number.  Indeed, one cannot using the facet mechanisms 
of simple types require that the content of some element or attribute >be< 
a prime number.  All of that would be straightforward if the languages 
were Turing-complete.

There are, of course, lots of deployment problems relating to 
Turing-complete languages.  The fact that halting can't be proved [3] 
means that any validator running an arbitrary schema in a Turing-complete 
language must be prepared for that schema to run more or less forever 
without reporting validity one way or the other. 

So, I think it's quite valuable that most of our popular schema languages 
are not Turing-complete.  Then again, I would agree that in cases where 
the power of a Turing-complete language is truly needed, it would be 
better to have an interoperable one than a platform-specific one.  [2] 
doesn't say that Turing-complete languages are bad;  it suggests that you 
save them for the times when you really need the power.  For now, I'm 
mostly OK with the 80/20 point that more declarative, non-Turing languages 
give us.

Noah

[1] http://www.w3.org/2001/tag/doc/leastPower.html
[2] http://www.w3.org/DesignIssues/Principles.html#PLP
[3] http://en.wikipedia.org/wiki/Halting_problem


--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








Michael Kay <mike@saxonica.com>
08/07/2007 11:00 PM
 
        To:     "'Amelia A Lewis'" <amyzing@talsever.com>, 
xml-dev@lists.xml.org
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        RE: [xml-dev] A single, all-encompassing data 
validation language - good or bad for the marketplace?


> > I've no idea what the theoretical limits of what you can do with a 
> > grammar are (I don't even know what the accepted definition of 
> > "grammar" is), but I
> 
> Err.  Murata-san has a very nice paper that characterizes the 
> sort of grammar embodied by several available grammars for 
> XML, including W3C XML Schema 

Thanks for the essay. I'm well aware that there's a vast theoretical
literature on different classes of grammar and their power. It doesn't 
alter
my point one bit - I'm not personally interested in exploring the limits 
of
the technology. There might well be a class of grammar in which I can
express the rule that February has 28 days except in a leap year, or that
the sales-to-date for a given month must be greater than the sales in the
previous month, but I can't see why I would want to express such 
constraints
grammatically when it's so much easier to express them in other ways.

>It's very likely that most W3C XML Schema parsers are implemented in this
fashion, rather than as 
>deterministic (or equivalently, nondeterministic) finite automata.  If 
that
*is* the case, then it seems >(to me) rather sensible to make use of the
power of that style of automata more fully, rather than 
>providing a sort of "escape" mechanism to break out of the automaton 
(which
effectively puts you into 
>turing machine territory, with an implementation problem of unknown
complexity).

Well, my implementation is a deterministic FSA, I can't speak for others.
But again that's not the point. The way users should express constraints
should be in the way that's most convenient and expressive for the users, 
it
has nothing to do with the capabilities of a particular piece of internal
technology. And predicate logic is hardly unknown territory. 

XPath, incidentally, is relationally complete but not Turing complete as 
far
as I know: unless you allow user-defined recursive functions you can't
express constraints like non-cyclicity of a graph. Though personally I 
have
no problems with allowing Turing-completeness in the validation language.
Users sometimes have to check that a parts explosion contains no cycles, 
and
they will find a language in which they can program such checks, however
much we try to stop them.

Michael Kay
http://www.saxonica.com/


_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
References:
- RE: [xml-dev] A single, all-encompassing data validation language -good or bad for the marketplace?
  - From: Michael Kay <mike@saxonica.com>
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]