[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]
RE: [xml-dev] A single, all-encompassing data validation language - good orbad for the marketplace?
- From: noah_mendelsohn@us.ibm.com
- To: Michael Kay <mike@saxonica.com>
- Date: Mon, 13 Aug 2007 16:27:09 -0400
Michael Kay writes:
> Though personally I have no problems with allowing Turing-
> completeness in the validation language.
Actually, I agree with almost everything else you've written in this
thread, but with this particular bit I'm somewhat disagree. In
particular, I'd point to the TAG's finding on the Rule of Least Power [1].
You'll see may name on it, because I helped some with editing, but it's
fundamentally drawn from one of TimBL's design documents [2].
Either way, the point is that Turing-complete languages are very powerful,
but when you have an arbitrary program in a Turing-complete language it
can be quite hard to reason about what that program does. For all its
surface complexity, most of what W3C XSD does boils down to validation
that can be expressed in a deterministic automaton (in fact, you say
that's how you built it.) The fact that the language is less powerful
than a Turing complete language makes it much easier to, for example, take
in an arbitrary schema and automatically produce data bindings for
instances accept by the language, etc. In the case of RelaxNG, it allows
one to statically compute schemas for intersection, union and difference
of any two schemas. On the other hand, neither language can conveniently
express a rule that the occurrence count for some element must be an
(arbitrary) prime number. Indeed, one cannot using the facet mechanisms
of simple types require that the content of some element or attribute >be<
a prime number. All of that would be straightforward if the languages
were Turing-complete.
There are, of course, lots of deployment problems relating to
Turing-complete languages. The fact that halting can't be proved [3]
means that any validator running an arbitrary schema in a Turing-complete
language must be prepared for that schema to run more or less forever
without reporting validity one way or the other.
So, I think it's quite valuable that most of our popular schema languages
are not Turing-complete. Then again, I would agree that in cases where
the power of a Turing-complete language is truly needed, it would be
better to have an interoperable one than a platform-specific one. [2]
doesn't say that Turing-complete languages are bad; it suggests that you
save them for the times when you really need the power. For now, I'm
mostly OK with the 80/20 point that more declarative, non-Turing languages
give us.
Noah
[1] http://www.w3.org/2001/tag/doc/leastPower.html
[2] http://www.w3.org/DesignIssues/Principles.html#PLP
[3] http://en.wikipedia.org/wiki/Halting_problem
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Michael Kay <mike@saxonica.com>
08/07/2007 11:00 PM
To: "'Amelia A Lewis'" <amyzing@talsever.com>,
xml-dev@lists.xml.org
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: RE: [xml-dev] A single, all-encompassing data
validation language - good or bad for the marketplace?
> > I've no idea what the theoretical limits of what you can do with a
> > grammar are (I don't even know what the accepted definition of
> > "grammar" is), but I
>
> Err. Murata-san has a very nice paper that characterizes the
> sort of grammar embodied by several available grammars for
> XML, including W3C XML Schema
Thanks for the essay. I'm well aware that there's a vast theoretical
literature on different classes of grammar and their power. It doesn't
alter
my point one bit - I'm not personally interested in exploring the limits
of
the technology. There might well be a class of grammar in which I can
express the rule that February has 28 days except in a leap year, or that
the sales-to-date for a given month must be greater than the sales in the
previous month, but I can't see why I would want to express such
constraints
grammatically when it's so much easier to express them in other ways.
>It's very likely that most W3C XML Schema parsers are implemented in this
fashion, rather than as
>deterministic (or equivalently, nondeterministic) finite automata. If
that
*is* the case, then it seems >(to me) rather sensible to make use of the
power of that style of automata more fully, rather than
>providing a sort of "escape" mechanism to break out of the automaton
(which
effectively puts you into
>turing machine territory, with an implementation problem of unknown
complexity).
Well, my implementation is a deterministic FSA, I can't speak for others.
But again that's not the point. The way users should express constraints
should be in the way that's most convenient and expressive for the users,
it
has nothing to do with the capabilities of a particular piece of internal
technology. And predicate logic is hardly unknown territory.
XPath, incidentally, is relationally complete but not Turing complete as
far
as I know: unless you allow user-defined recursive functions you can't
express constraints like non-cyclicity of a graph. Though personally I
have
no problems with allowing Turing-completeness in the validation language.
Users sometimes have to check that a parts explosion contains no cycles,
and
they will find a language in which they can program such checks, however
much we try to stop them.
Michael Kay
http://www.saxonica.com/
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.
[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]