Just to share my 2 cents. I think the more rigorous way to look at complexity is based on the Information Entropy theory. We can think of designing a language or schema like designing an encoding program. The primitives (and combinations) are like the chosen alphabet of the encoding program. Then the real world problems have to be expressed by the language/schema, just like the real world data has to be encoded by the encoder. The best/simplest language/schema would give us shortest total sum of all the result programs, just like the best encoder will give the smallest total size of the encoded world. From the encoding theory, we know very well that smallest alphabet isn't always the best strategy. When the alphabet is small, the result program is likely to be longer. DNA is just (A T G C). And all IT systems uses just (0 and 1), can't be simpler. But we all know bio systems and IT systems are not simple just because they have a simple alphabet. A too big alphabet is also no good. It's just like all those bloat IT systems with tons of APIs. The ideal case is to find the ideal alphabet size. That is not too small or too big. I think Roger's emphasis is actually on combination rules, especially orthogonal rules. Unlike simple primitives (which are linear), they give us exponential expressing power. The problem is that it is hard to discover such rules in the seemly complex real world. Only genius like Newton is able to explain the real world in only 3 laws and their combinations. Regards Henry On 18/12/2011 4:16 AM, David Lee wrote: 005301ccbcf8$b74c6450$25e52cf0$@calldei.com" type="cite">Interesting argument. IMHO I disagree. By limiting the number of terms you dont solve the complexity problem, you simply put it off to another scale. Are LISP programs intrinsically more simple and provable then C++ programs ? DNA is composed of 4 distinct 'elements (A T G C) yet comprises one of the most complex and unprovable systems imaginable. Human languages are typically composed of 2 handfuls of terms (letters) yet comprise a wealth of complexity and improability. But its not the number of base terms. Are the concepts expressed in Chinese more complicated than English ? And don't forget binary computers ... there are really only 1 symbols. There are definite advantages to having less terms but limiting complexity and increasing provability isn't one of them. ---------------------------------------- David A. Lee dlee@calldei.com http://www.xmlsh.org -----Original Message----- From: Costello, Roger L. [mailto:costello@mitre.org] Sent: Saturday, December 17, 2011 2:50 PM To: xml-dev@lists.xml.org Subject: [xml-dev] Most XML vocabularies are too large and inevitably have lots of "holes" Hi Folks, Recently I have been learning Lambda Calculus [1]. A fascinating thing about Lambda Calculus is its richness, despite it being extraordinarily simple. The set of expressions (lambda-terms) that can be created in Lambda Calculus is defined as follows: a. All variables are lambda-terms b. If M and N are any lambda-terms, the (M N) is a lambda-term (called an application) c. If M is any lambda-term and x is any variable, then (\x -> M) is a lambda-term (called an abstraction) Wow! With just a few items and a few combination rules, an entire field was spawned. Because it is limited it has been possible to formally characterize Lambda Calculus. A few days ago Michael Kay made this startling statement regarding XML Schema ... the more you read the XSD spec, the more holes you find. And on the xmlschema-dev list Michael Kay recently stated this ... the schema construction model is not defined very formally ... Let's think about this: 1. XML Schema is a comparatively small XML vocabulary. I haven't counted the number of elements and attributes but let me guess that the total is 100 (probably less). 2. XML Schema is pretty rigorously specified. Yet despite its smallness and fairly rigorous specification it still has "holes" in it. ASSERTION: An XML vocabulary consisting of 100 items (or more) is too much. It can never be formally specified and it will forever have "holes." Let's do a little math. Suppose an XML vocabulary consists of 5 elements -- A, B, C, D, E -- and one of them must be the root element which must contain only one child element. Here are some valid instances <A> <B>___</B> </A> <A> <C>___</C> </A> <B> <A>___</A> </B> And so forth. With this extremely constrained XML vocabulary there are: 5 * 4 = 20 permutations (XML instances with differing arrangements of markup). If we allow the root element to have one or two child elements then there are: 5 * 4 + 5 * 2**4 = 100 permutations. The complexity grows at an breathtaking rate as the size of the vocabulary increases and as the ways of combining the vocabulary increases. How will you possibly avoid "holes" in an XML vocabulary that has a complexity space that is in the trillions of trillions of trillions of permutations? You can't. ASSERTION: Large XML vocabularies must be avoided. So, what's the solution? The solution is to do what Lambda Calculus has done and what Simon Peyton-Jones has described in his article "How to write a financial contract". That is, create a small set of simple, well-specified primitives and a few combination rules. So, how many primitives and how many combination rules? Let me toss out a number: an XML vocabulary should not contain more than a dozen primitive elements and a handful of combination rules. That should be enough to generate all the richness one could possibly ever need. And you just might be able to formally specify your XML vocabulary and ensure that it has no "holes." Clearly this is the only way to go for mission-critical applications. Comments? /Roger [1] This is a fabulous book on Lambda Calculus (but be prepared to study it, not just read it): http://www.amazon.com/Lambda-Calculus-Combinators-Introduction-Roger-Hindley /dp/0521898854/ref=sr_1_1?ie=UTF8&qid=1324146163&sr=8-1 _______________________________________________________________________ XML-DEV is a publicly archived, unmoderated list hosted by OASIS to support XML implementation and development. To minimize spam in the archives, you must subscribe before posting. [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/ Or unsubscribe: xml-dev-unsubscribe@lists.xml.org subscribe: xml-dev-subscribe@lists.xml.org List archive: http://lists.xml.org/archives/xml-dev/ List Guidelines: http://www.oasis-open.org/maillists/guidelines.php _______________________________________________________________________ XML-DEV is a publicly archived, unmoderated list hosted by OASIS to support XML implementation and development. To minimize spam in the archives, you must subscribe before posting. [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/ Or unsubscribe: xml-dev-unsubscribe@lists.xml.org subscribe: xml-dev-subscribe@lists.xml.org List archive: http://lists.xml.org/archives/xml-dev/ List Guidelines: http://www.oasis-open.org/maillists/guidelines.php |