OASIS Mailing List Archives





RE: [xml-dev] Stupid Question (was RE: [xml-dev] XML doesn't deserve its "X".)


I think the main problems are:

1) Adding type information creates a data model much more complicated
than the Infoset. People have to agree on types, on their names, and on
their semantics.

Your example is OK provided everybody knows what an "Int" is. I think XML
Schema's so-called "simple" types are a precious thing here. Why not use the
xsi:type notation, while we're at it?

For complex types, this is much harder. You need a structure definition,
and you won't be able to embed it inline. You can still use xsi:type, but
you'll need a schema document to refer to.

The biggest problem is that people still have to understand that the
"syntax" part of XML is minute compared to the "tree, infoset, typed
infoset" part. A syntax for labeled trees is nothing. As soon as you start
to care about manipulating those labeled trees efficiently, a lot of other
concepts enter the dance. So type information, be it inline or out-of-line,
will require many people to change their minds about the "T" world (remember
this old thread?).
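As a minimal sketch of what consuming xsi:type annotations involves, the Python below (standard-library ElementTree only) converts element content according to a handful of XML Schema simple types. The three-type mapping and the `parse_with_prefixes`/`typed_value` helpers are hypothetical illustrations, not a real schema processor. The element names follow the quoted example, but the values are changed to valid lexical forms, since xs:int does not accept hexadecimal such as 0xffffffff and xs:date is written YYYY-MM-DD. Note that the consumer still has to know in advance what "int" means, which is the agreement problem raised above.

```python
import io
import xml.etree.ElementTree as ET
from datetime import date

# Namespace URIs defined by W3C XML Schema.
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, illustrative subset: a few built-in simple types -> converters.
CONVERTERS = {
    "int": int,
    "string": str,
    "date": date.fromisoformat,  # expects YYYY-MM-DD (Python 3.7+)
}


def parse_with_prefixes(text):
    """Parse text and also return the prefix -> URI declarations seen.

    ElementTree resolves element and attribute names, but not QNames that
    appear inside attribute values (like xsi:type="xs:int"), so prefix
    declarations are collected separately. Scoping is ignored in this
    sketch: a prefix redeclared later simply overwrites the earlier one.
    """
    prefixes = {}
    root = None
    source = io.BytesIO(text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        elif root is None:
            root = item  # first "start" event is the document element
    return root, prefixes


def typed_value(elem, prefixes):
    """Convert elem.text according to its xsi:type attribute, if present."""
    qname = elem.get(f"{{{XSI}}}type")
    if qname is None:
        return elem.text  # no annotation: just the Infoset's character data
    prefix, _, local = qname.rpartition(":")
    if prefixes.get(prefix) != XS or local not in CONVERTERS:
        raise ValueError(f"unsupported type {qname!r}")
    return CONVERTERS[local](elem.text)


DOC = """<myData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <foo xsi:type="xs:int">42</foo>
  <bar xsi:type="xs:string">Someday</bar>
  <baz xsi:type="xs:date">2037-10-31</baz>
</myData>"""

root, prefixes = parse_with_prefixes(DOC)
values = {child.tag: typed_value(child, prefixes) for child in root}
print(values)
# {'foo': 42, 'bar': 'Someday', 'baz': datetime.date(2037, 10, 31)}
```

Even this small case shows that inline types are more than syntax: the type name inside the attribute value is a QName, so the consumer needs the in-scope namespace declarations, which ElementTree does not keep on its own.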

2) Obviously, there is a verbosity and readability problem. Think of the
document-oriented XML case. Typing can still be useful, but how are you
going to convince people who already scorn typing tags that they'll have to
type tags AND type information within those tags?

3) What about tags like <address my:type="address">...</address>? I'll
answer your stupid question with an even more stupid one: what is the
difference between a tag name and the type of its element? I guess tag
names qualify the role of the relation between an element and its parent,
while types are associated with a schema or pattern that puts a certain
number of constraints on the content of the element. But is that really so?

4) More generally, to reply to your question and to my own stupid question,
I think we should not immediately think of a way to represent type names.
First, we should think about what types are FOR. For a fresh start, let's
forget about validation and focus on extensibility.

My intuition is that we need some kind of meta-data to process XML data in
an extensible way. What would help me write programs that can gracefully
handle new document structures, or at least programs that can easily be
extended to support those new document structures?

To begin with, knowing that I can fetch the same set of labeled data from
any given element in both the old and the new document structure is a plus.
This means that when I write some code which processes some data, I want
this data to respect a kind of contract which states that, whatever the
structure of the data, I can obtain a particular view with a fixed labeled
tree structure. This contract allows me to write procedural code, because in
procedural or OOP languages there is a strong binding between code and the
data it manipulates. New code can use the new document structure, because
its contract is extended to a new view with a different labeled tree
structure.

This "contract" between the code and the data is a precious piece of
meta-data to use when writing programs that can handle extensibility.
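The "contract" idea can be sketched in Python under stated assumptions: the two <person> structures, the `version` attribute and the `PersonView` shape are all hypothetical. Each adapter maps one document structure onto the same fixed labeled view, and the processing code (`greeting`) only ever sees that view. The revised structure deliberately puts <family> before <given>, the reordering case discussed below for Architectural Forms.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, TypedDict


class PersonView(TypedDict):
    """The fixed labeled structure ("contract") processing code relies on."""
    first: str
    last: str


def view_v1(elem: ET.Element) -> PersonView:
    # Hypothetical original structure: <person><firstname/><lastname/></person>
    return {"first": elem.findtext("firstname", ""),
            "last": elem.findtext("lastname", "")}


def view_v2(elem: ET.Element) -> PersonView:
    # Hypothetical revised structure, marked with version="2":
    # <person version="2"><name><given/><family/></name></person>
    # Children are looked up by name, so their order does not matter.
    return {"first": elem.findtext("name/given", ""),
            "last": elem.findtext("name/family", "")}


# Registry of adapters: supporting a new structure means adding an entry.
VIEWS: Dict[str, Callable[[ET.Element], PersonView]] = {"1": view_v1, "2": view_v2}


def person_view(elem: ET.Element) -> PersonView:
    version = elem.get("version", "1")
    extractor = VIEWS.get(version)
    if extractor is None:
        raise ValueError(f"no view registered for person version {version!r}")
    return extractor(elem)


def greeting(person: PersonView) -> str:
    """Legacy code: written once, against the view only."""
    return f"Hello, {person['first']} {person['last']}"


old = ET.fromstring("<person><firstname>Ada</firstname><lastname>Lovelace</lastname></person>")
new = ET.fromstring('<person version="2"><name><family>Lovelace</family>'
                    "<given>Ada</given></name></person>")
print(greeting(person_view(old)))  # Hello, Ada Lovelace
print(greeting(person_view(new)))  # Hello, Ada Lovelace
```

Supporting a third structure means registering another adapter; `greeting` does not change.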

Architectural Forms were, AFAIK, built around this "contract" idea, as an
architecture can be used to build a "view" over elements. Note that AFs are
expressed by special attributes, so it's a bit like "inline type
information", except that DTD default attributes can be used to keep the
document less verbose. The question remains whether AFs can provide all the
flexibility we need to support any kind of schema evolution. For example,
AFs cannot handle the situation in which some legacy code needs <firstname>
first, then <lastname>, while a new document has those elements in reverse
order.

Anyway, OOP provides interesting concepts regarding "contracts", "extension
of contracts", etc.: the public interfaces of classes (or pure interfaces)
and inheritance. A schema language like XML Schema also supports those
concepts, yet it is very difficult nowadays to access schema information
from a random XML document (or even from an XML document with an associated
schema, because the PSVI is not widely supported for now).
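In OOP terms, "extension of contracts" is interface inheritance. A sketch with Python's abc module, where every class and element name is hypothetical: code written against the base `Person` contract keeps working when handed an object that satisfies the extended `ContactPerson` contract, while newer code can rely on the extra method. Here the contracts live in program code; the point above is that XML Schema can express similar derivation, but that information is hard to reach from a document without PSVI support.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class Person(ABC):
    """Base contract: the only things legacy code may rely on."""

    @abstractmethod
    def first(self) -> str: ...

    @abstractmethod
    def last(self) -> str: ...


class ContactPerson(Person):
    """Extended contract: inherits everything from Person and adds email()."""

    @abstractmethod
    def email(self) -> str: ...


class XmlContactPerson(ContactPerson):
    """Adapter from a hypothetical <person> element to the extended contract."""

    def __init__(self, elem: ET.Element):
        self.elem = elem

    def first(self) -> str:
        return self.elem.findtext("name/given", "")

    def last(self) -> str:
        return self.elem.findtext("name/family", "")

    def email(self) -> str:
        return self.elem.findtext("email", "")


def legacy_greeting(p: Person) -> str:
    # Written against the base contract; unaware that email() exists.
    return f"Hello, {p.first()} {p.last()}"


def new_mailto(p: ContactPerson) -> str:
    # Newer code relies on the extended contract.
    return f"mailto:{p.email()}"


doc = ET.fromstring(
    "<person><name><given>Ada</given><family>Lovelace</family></name>"
    "<email>ada@example.org</email></person>"
)
p = XmlContactPerson(doc)
print(legacy_greeting(p))  # Hello, Ada Lovelace
print(new_mailto(p))       # mailto:ada@example.org
```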

Then, when encountering a new document structure, I want to know whether it
is safe for me to process the document. Maybe the
<nuclear-plant:stop-cooling-core/> tag now has a just-kidding="true"
attribute in the new document structure. In this case, my old program should
not be able to process those documents, even if their structure is seemingly
compatible: just ignoring the "just-kidding" attribute would have some dire
consequences.
This means that, independently of any structure definition, I need some
meta-data that gives versioning information, compatibility guidelines, or
general semantic information.
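A minimal Python sketch of the conservative behaviour described above, assuming a hypothetical namespace URI for the nuclear-plant prefix and a hand-written table of the attributes this old program understands: unknown attributes on elements the program acts on cause the whole document to be refused rather than silently ignored. It is a crude stand-in for real versioning and compatibility meta-data, which could instead tell the program which additions are safe to ignore.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI bound to the nuclear-plant: prefix.
NP = "urn:example:nuclear-plant"

# What this (old) program understands: for each element it acts on,
# the attributes it knows how to interpret. Here: none at all.
KNOWN_ATTRIBUTES = {
    f"{{{NP}}}stop-cooling-core": frozenset(),
}


class UnsafeDocument(Exception):
    """Raised when a document uses something this program cannot interpret."""


def check_understood(root: ET.Element) -> None:
    """Refuse, rather than silently ignore, unknown attributes on known elements."""
    for elem in root.iter():
        known = KNOWN_ATTRIBUTES.get(elem.tag)
        if known is None:
            continue  # this sketch only guards elements the program acts on
        unknown = set(elem.attrib) - known
        if unknown:
            raise UnsafeDocument(f"{elem.tag}: unknown attributes {sorted(unknown)}")


OLD = '<commands xmlns:np="urn:example:nuclear-plant"><np:stop-cooling-core/></commands>'
NEW = ('<commands xmlns:np="urn:example:nuclear-plant">'
       '<np:stop-cooling-core just-kidding="true"/></commands>')

check_understood(ET.fromstring(OLD))  # passes silently
try:
    check_understood(ET.fromstring(NEW))
except UnsafeDocument as e:
    print("refused:", e)
```

The design fails closed: lacking any compatibility meta-data, the old program cannot tell a harmless addition from a dangerous one, so it treats every unknown attribute as unsafe.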

The need for general semantics can be illustrated by an XML document used to
send commands to toy trains. Even if the document structure were exactly the
same for real trains (if those could receive XML documents as commands, and
pigs could fly while we're at it), I wouldn't want my toy train commands to
be interpreted by real trains. Tag names and structure are of no help there:
I need some way to recognise a document that really means what I expect it
to mean. "Document type" is the first name that comes to my mind for
this...

I really think that we should think about how types can be used to help us
achieve true extensibility. Types are not only for validation; they are a
real help for extensibility. People seem reluctant to systematically layer
a type system including OOP concepts like interfaces and inheritance over
XML. The problem is that without this, and without extensibility guidelines,
XML does not really deserve its 'X'.


>-----Original Message-----
>From: Mike Champion [mailto:mc@xegesis.org]
>Sent: Tuesday, March 5, 2002 19:26
>To: xml-dev@lists.xml.org
>Subject: [xml-dev] Stupid Question (was RE: [xml-dev] XML doesn't deserve
>its "X".)
>3/5/2002 12:35:49 PM, Nicolas LEHUEN <nicolas.lehuen@ubicco.com> wrote:
>>That's what I was suggesting. However, I don't see how this can be achieved
>>without adding type information (AKA PSVI) to XML elements, and have a
>>typing system that supports extensibility. Looks like we're reinventing OOP
>>there, with XML as a data serialisation format.
>This is in the spirit of "if we were doing this all over again ..." (or
>"if we were furry little creatures eating dinosaur eggs and planning for
>the post-asteroid world ...), not a troll:
>Why does XML carry around a label for every data value rather than
>getting it from an out-of-band "schema" (a la EDI or ASN.1), but then use
>an out-of-band means to associate type thus necessitating the PSVI?
>In a programming language, we say
>  class MyData {
>    Int    foo;
>    String bar;
>    Date   baz; }
>Serializing an instance to XML gives:
>  <myData>
>    <foo>0xffffffff</foo>
>    <bar>Someday</bar>
>    <baz>20371031</baz>
>  </myData>
>Why not just put the type information inline and make XML more
>"self-describing" (please don't shoot me ...)
>  <myData>
>    <foo my:type="Int">0xffffffff</foo>
>    <bar my:type="String">Someday</bar>
>    <baz my:type="Date">20371031</baz>
>  </myData>
>or else just give it up and use ASN.1 for both the out-of-band
>label and type information?
>I'm sure this is a religious war I missed, somehow ...
>and like I said, it's a stupid question, please be

