RE: [xml-dev] Most XML vocabularies are too large and inevitably have lots of "holes"

From: "David Lee" <dlee@calldei.com>
To: "'Costello, Roger L.'" <costello@mitre.org>, <xml-dev@lists.xml.org>
Date: Sat, 17 Dec 2011 15:16:08 -0500

Interesting argument.
IMHO, I disagree. By limiting the number of terms you don't solve the
complexity problem, you simply push it off to another scale.
Are LISP programs intrinsically simpler and more provable than C++ programs?

DNA is composed of 4 distinct 'elements' (A, T, G, C) yet comprises one of the
most complex and unprovable systems imaginable.
Human languages are typically composed of 2 handfuls of terms (letters) yet
comprise a wealth of complexity and improbability.
So it's not the number of base terms that determines complexity.
Are the concepts expressed in Chinese more complicated than those expressed
in English?
And don't forget binary computers ... there are really only 2 symbols.

There are definite advantages to having fewer terms, but limiting complexity
and increasing provability aren't among them.


----------------------------------------
David A. Lee
dlee@calldei.com
http://www.xmlsh.org

-----Original Message-----
From: Costello, Roger L. [mailto:costello@mitre.org] 
Sent: Saturday, December 17, 2011 2:50 PM
To: xml-dev@lists.xml.org
Subject: [xml-dev] Most XML vocabularies are too large and inevitably have
lots of "holes"


Hi Folks,

Recently I have been learning Lambda Calculus [1].

A fascinating thing about Lambda Calculus is its richness, despite it being
extraordinarily simple.

The set of expressions (lambda-terms) that can be created in Lambda Calculus
is defined as follows:

a. All variables are lambda-terms

b. If M and N are any lambda-terms, then (M N) is a lambda-term (called an
application)

c. If M is any lambda-term and x is any variable, then (\x -> M) is a
lambda-term (called an abstraction) 
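
As an illustration only, the three rules above can be sketched as a small
data type. The class names Var, App and Abs and the helper show() are
hypothetical names chosen for this sketch; they do not come from the book.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    """Rule a: a variable is a lambda-term."""
    name: str


@dataclass(frozen=True)
class App:
    """Rule b: (M N), the application of M to N."""
    func: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Abs:
    """Rule c: (\\x -> M), an abstraction over the variable x."""
    param: str
    body: "Term"


# Every lambda-term is built from exactly one of the three rules.
Term = Union[Var, App, Abs]


def show(term: Term) -> str:
    """Render a term in the notation used in rules a-c."""
    if isinstance(term, Var):
        return term.name
    if isinstance(term, App):
        return f"({show(term.func)} {show(term.arg)})"
    return f"(\\{term.param} -> {show(term.body)})"


# The identity function applied to a variable y.
identity = Abs("x", Var("x"))
example = App(identity, Var("y"))
print(show(example))  # ((\x -> x) y)
```

Nothing beyond these three constructors is needed to write down any
lambda-term, which is the point being made about smallness.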

Wow!

With just a few items and a few combination rules, an entire field was
spawned.

Because it is limited it has been possible to formally characterize Lambda
Calculus.

A few days ago Michael Kay made this startling statement regarding XML
Schema:

      ... the more you read the XSD spec, the more holes you find.

And on the xmlschema-dev list Michael Kay recently stated this:

      ... the schema construction model is not defined very formally ...

Let's think about this:

1. XML Schema is a comparatively small XML vocabulary. I haven't counted the
number of elements and attributes, but let me guess that the total is 100
(probably fewer).

2. XML Schema is pretty rigorously specified.

Yet despite its smallness and fairly rigorous specification it still has
"holes" in it.

ASSERTION: An XML vocabulary consisting of 100 items (or more) is too much.
It can never be formally specified and it will forever have "holes."

Let's do a little math. Suppose an XML vocabulary consists of 5 elements --
A, B, C, D, E -- and one of them must be the root element, which must contain
exactly one child element (one of the other four). Here are some valid
instances:

<A>
    <B>___</B>
</A> 

<A>
    <C>___</C>
</A>

<B>
    <A>___</A>
</B>

And so forth.

With this extremely constrained XML vocabulary there are: 5 * 4 = 20
permutations (XML instances with differing arrangements of markup).

If we allow the root element to have one or two child elements (order
matters and a child may repeat) then there are: 5 * 4 + 5 * 4**2 = 100
permutations.
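
To check the arithmetic, here is a rough brute-force sketch. It assumes that
a child is never the same element as its root, that the order of children
matters, and that the same child may appear twice; those assumptions are my
reading of the counts above, and the function name instances() is made up
for this sketch.

```python
from itertools import product

ELEMENTS = ["A", "B", "C", "D", "E"]


def instances(max_children):
    """Yield (root, children) for every arrangement with 1..max_children children."""
    for root in ELEMENTS:
        # Assumption: a child is one of the other four elements.
        others = [e for e in ELEMENTS if e != root]
        for n in range(1, max_children + 1):
            # Ordered sequences of length n, repetition allowed.
            for children in product(others, repeat=n):
                yield root, children


one_child = sum(1 for _ in instances(1))
one_or_two = sum(1 for _ in instances(2))
print(one_child)   # 20
print(one_or_two)  # 100
```

Under those assumptions each root has 4 one-child arrangements and 16
two-child arrangements, which gives the totals 20 and 100.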

The complexity grows at a breathtaking rate as the size of the vocabulary
increases and as the number of ways of combining the vocabulary increases.
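
A small sketch of that growth, assuming the same simplified model as the
5-element example (one root, one level of ordered children drawn from the
other elements, repetition allowed). The function name arrangement_count and
the sample sizes are hypothetical, and real vocabularies nest much deeper
than one level.

```python
def arrangement_count(vocabulary_size, max_children):
    """Count one-level arrangements: a root plus 1..max_children ordered children."""
    others = vocabulary_size - 1  # assumption: a child differs from its root
    per_root = sum(others ** n for n in range(1, max_children + 1))
    return vocabulary_size * per_root


# (vocabulary size, maximum number of children) pairs, chosen for illustration.
for size, width in [(5, 1), (5, 2), (12, 5), (100, 5), (100, 18)]:
    print(size, width, arrangement_count(size, width))
```

Even in this flat model, a 100-element vocabulary with up to 18 children per
root already exceeds 10**36 arrangements, i.e. a trillion trillion trillion.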

How will you possibly avoid "holes" in an XML vocabulary that has a
complexity space that is in the trillions of trillions of trillions of
permutations?

You can't.

ASSERTION: Large XML vocabularies must be avoided.

So, what's the solution?

The solution is to do what Lambda Calculus has done and what Simon
Peyton Jones has described in his article "How to write a financial
contract". That is, create a small set of simple, well-specified primitives
and a few combination rules.

So, how many primitives and how many combination rules?

Let me toss out a number: an XML vocabulary should not contain more than a
dozen primitive elements and a handful of combination rules. That should be
enough to generate all the richness one could possibly ever need. And you
just might be able to formally specify your XML vocabulary and ensure that
it has no "holes."  

Clearly this is the only way to go for mission-critical applications.

Comments?

/Roger

[1] This is a fabulous book on Lambda Calculus (but be prepared to study it,
not just read it):
http://www.amazon.com/Lambda-Calculus-Combinators-Introduction-Roger-Hindley/dp/0521898854/ref=sr_1_1?ie=UTF8&qid=1324146163&sr=8-1

