Re: [xml-dev] Most XML vocabularies are too large and inevitablyhave lot

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

Re: [xml-dev] Most XML vocabularies are too large and inevitablyhave lots of "holes"

From: Henry Luo <henryluo@candlescript.org>
To: xml-dev@lists.xml.org
Date: Sun, 18 Dec 2011 14:37:48 +0800

Just to share my 2 cents.

I think the more rigorous way to look at complexity is based on the Information Entropy theory.

We can think of designing a language or schema like designing an encoding program.
The primitives (and combinations) are like the chosen alphabet of the encoding program.
Then the real world problems have to be expressed by the language/schema, just like the real world data has to be encoded by the encoder.
The best/simplest language/schema would give us shortest total sum of all the result programs, just like the best encoder will give the smallest total size of the encoded world.

From the encoding theory, we know very well that smallest alphabet isn't always the best strategy.
When the alphabet is small, the result program is likely to be longer.
DNA is just (A T G C). And all IT systems uses just (0 and 1), can't be simpler.
But we all know bio systems and IT systems are not simple just because they have a simple alphabet.

A too big alphabet is also no good. It's just like all those bloat IT systems with tons of APIs.

The ideal case is to find the ideal alphabet size. That is not too small or too big.

I think Roger's emphasis is actually on combination rules, especially orthogonal rules.
Unlike simple primitives (which are linear), they give us exponential expressing power.

The problem is that it is hard to discover such rules in the seemly complex real world.
Only genius like Newton is able to explain the real world in only 3 laws and their combinations.

Regards

Henry

On 18/12/2011 4:16 AM, David Lee wrote:

005301ccbcf8$b74c6450$25e52cf0$@calldei.com" type="cite">

Interesting argument.
IMHO I disagree.   By limiting the number of terms you dont solve the
complexity problem, you simply put it off to another scale.
Are LISP programs intrinsically more simple and provable then C++ programs ?

DNA is composed of 4 distinct 'elements (A T G C) yet comprises one of the
most complex and unprovable systems imaginable.
Human languages are typically composed of 2 handfuls of terms (letters) yet
comprise a wealth of complexity and improability.
But its not the number of base terms.
Are the concepts expressed in Chinese more complicated than English ?
And don't forget binary computers ... there are really only 1 symbols.

There are definite advantages to having less terms but limiting complexity
and increasing provability isn't  one of them.


----------------------------------------
David A. Lee
dlee@calldei.com
http://www.xmlsh.org

-----Original Message-----
From: Costello, Roger L. [mailto:costello@mitre.org] 
Sent: Saturday, December 17, 2011 2:50 PM
To: xml-dev@lists.xml.org
Subject: [xml-dev] Most XML vocabularies are too large and inevitably have
lots of "holes"


Hi Folks,

Recently I have been learning Lambda Calculus [1].

A fascinating thing about Lambda Calculus is its richness, despite it being
extraordinarily simple.

The set of expressions (lambda-terms) that can be created in Lambda Calculus
is defined as follows:

a. All variables are lambda-terms

b. If M and N are any lambda-terms, the (M N) is a lambda-term (called an
application)

c. If M is any lambda-term and x is any variable, then (\x -> M) is a
lambda-term (called an abstraction) 

Wow!

With just a few items and a few combination rules, an entire field was
spawned.

Because it is limited it has been possible to formally characterize Lambda
Calculus.

A few days ago Michael Kay made this startling statement regarding XML
Schema

      ... the more you read the XSD spec, the more holes you find.

And on the xmlschema-dev list Michael Kay recently stated this

      ... the schema construction model is not defined very formally ...

Let's think about this:

1. XML Schema is a comparatively small XML vocabulary. I haven't counted the
number of elements and attributes but let me guess that the total is 100
(probably less).

2. XML Schema is pretty rigorously specified.

Yet despite its smallness and fairly rigorous specification it still has
"holes" in it.

ASSERTION: An XML vocabulary consisting of 100 items (or more) is too much.
It can never be formally specified and it will forever have "holes."

Let's do a little math. Suppose an XML vocabulary consists of 5 elements --
A, B, C, D, E -- and one of them must be the root element which must contain
only one child element. Here are some valid instances

<A>
    <B>___</B>
</A> 

<A>
    <C>___</C>
</A>

<B>
    <A>___</A>
</B>

And so forth.

With this extremely constrained XML vocabulary there are: 5 * 4 = 20
permutations (XML instances with differing arrangements of markup).

If we allow the root element to have one or two child elements then there
are: 5 * 4  + 5 * 2**4 = 100 permutations.

The complexity grows at an breathtaking rate as the size of the vocabulary
increases and as the ways of combining the vocabulary increases.

How will you possibly avoid "holes" in an XML vocabulary that has a
complexity space that is in the trillions of trillions of trillions of
permutations?

You can't.

ASSERTION: Large XML vocabularies must be avoided.

So, what's the solution?

The solution is to do what Lambda Calculus has done and what Simon
Peyton-Jones has described in his article "How to write a financial
contract". That is, create a small set of simple, well-specified primitives
and a few combination rules.

So, how many primitives and how many combination rules?

Let me toss out a number: an XML vocabulary should not contain more than a
dozen primitive elements and a handful of combination rules. That should be
enough to generate all the richness one could possibly ever need. And you
just might be able to formally specify your XML vocabulary and ensure that
it has no "holes."  

Clearly this is the only way to go for mission-critical applications.

Comments?

/Roger

[1] This is a fabulous book on Lambda Calculus (but be prepared to study it,
not just read it):
http://www.amazon.com/Lambda-Calculus-Combinators-Introduction-Roger-Hindley
/dp/0521898854/ref=sr_1_1?ie=UTF8&qid=1324146163&sr=8-1

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php



_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

References:
- Most XML vocabularies are too large and inevitably have lots of"holes"
  - From: "Costello, Roger L." <costello@mitre.org>
- RE: [xml-dev] Most XML vocabularies are too large and inevitably have lots of "holes"
  - From: "David Lee" <dlee@calldei.com>

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]