Mike Champion wrote :
> 1/13/2002 10:04:39 PM, Paul T <email@example.com> wrote:
>>I hate to say it, but I think that all that markup stuff
>>is actually about placing '\' and ',' symbols on steroids in
>>one way or another. Why can't people agree that
>>any 'markup' language is :
>>0. Everything is (unicode) text.
>>1. Text can have 'groups' , separated by 'separators'
>>( the less, the better, but hard to tell in advance ;-)
>>2. There should be some way to escape separators
>>( \ works just fine, from my point of view ;-)
>>Isn't it all we need to know about the 'markup
>Well, your scheme (and CSV) require some sort of schema to
>define what the fields are. XML has the tags as
>separators, allowing the data to be "self describing" at the
>cost of only easily-compressible verbosity. Also, your scheme
>would make flexible hierarchical data difficult to exchange.
>If I've missed your point and "separators" can be self-
>describing, I think you've described "minimal XML"
I don't think tag names are self-describing separators. Granted, if you are
lucky, for some simple schema, they are self-describing, for a human being.
But then, why is there documentation for HTML, XSLT, XSL-FO, SOAP, etc.?
It is because tag names alone are not descriptive enough, and because
structure matters, not tag names only.
Any given XML document requires a schema, and not only for validation.
Except for all the XML tools that work at the lexical level (such as XML
editors, or XSLT engines - but not XSLT stylesheets), an XML application
has to rely on an implicit or explicit schema to process XML documents
meaningfully, i.e. at the semantic level, because it is the schema that
creates the document semantics.
Well-formedness alone is a lure: your XML documents may not have a schema,
but given the nature of current programming languages, processing a document
always ends up following a particular algorithm that depends on a subset of
its implicit or explicit schema. If you don't write the schema
explicitly, its ghost will appear in your programs anyway, created by the
assumptions the program has to make to run properly.
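
To make the "ghost schema" concrete, here is a minimal sketch in Python. The order document, its element and attribute names, and the order_total function are all made up for illustration; no schema or DTD is involved anywhere, yet the code only works for documents shaped the way it silently expects.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: well-formed, with no DTD or schema attached.
DOC = """<order>
  <customer>Alice</customer>
  <item qty="2" price="3.50">tea</item>
  <item qty="1" price="12.00">teapot</item>
</order>"""

# Also well-formed, but the items are nested one level deeper.
NESTED_DOC = """<order>
  <items>
    <item qty="2" price="3.50">tea</item>
  </items>
</order>"""

def order_total(xml_text):
    root = ET.fromstring(xml_text)
    # Implicit schema, assumption 1: <item> elements are direct
    # children of the root element.
    # Implicit schema, assumption 2: every <item> carries numeric
    # "qty" and "price" attributes.
    return sum(int(item.attrib["qty"]) * float(item.attrib["price"])
               for item in root.findall("item"))

print(order_total(DOC))         # 19.0
print(order_total(NESTED_DOC))  # 0, silently wrong
```

Both inputs are equally well-formed; only the first one matches the schema that nobody wrote down.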
So, given that XML does not remove the need for a schema, specifying a
schema based on separator names (the XML case) or on a positional notation
(the CSV case, where $1, $2, $3 correspond to the groups matched by a
(.*);(.*);(.*) regexp that stands for a particular schema),
The biggest false promise of XML, I think, is that making tag names
visible would solve the meta-data problem. XML is no more of a
solution to the meta-data problem than CSV or any given flat file format. It
may be more powerful than CSV, due to its ability to represent labeled
trees, but then again myriads of other formats can do the same, YAML being
The good idea in XML, though, is its readability. Because a part of a
document schema is expressed in its tag names, we can clearly see which
piece of data corresponds to which "slot" in the schema, which gives us great
insight into the semantics of the document. Tag names are great for the ease of
programming and debugging, but then again, they are absolutely of no use for
So, sorry for the provocation, but to me, the fact that tag names are
visible is only syntactic sugar :).
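
As a rough illustration of the "syntactic sugar" claim, the sketch below reads the same record once from a semicolon-separated line and once from XML. The record layout (customer, product, quantity) and both function names are invented for the example. In both cases the programmer has to know beforehand which slot means what; only the addressing differs, by position or by name.

```python
import re
import xml.etree.ElementTree as ET

CSV_LINE = "Alice;tea;2"
XML_DOC = ("<order><customer>Alice</customer>"
           "<product>tea</product><qty>2</qty></order>")

def from_csv(line):
    # Positional schema: $1 = customer, $2 = product, $3 = quantity.
    match = re.fullmatch(r"(.*);(.*);(.*)", line)
    return {"customer": match.group(1),
            "product": match.group(2),
            "qty": int(match.group(3))}

def from_xml(text):
    # Named schema: the same three slots, addressed by tag name.
    root = ET.fromstring(text)
    return {"customer": root.findtext("customer"),
            "product": root.findtext("product"),
            "qty": int(root.findtext("qty"))}

# Both readers encode the same schema and yield the same record.
print(from_csv(CSV_LINE) == from_xml(XML_DOC))  # True
```

Neither reader learns from the input that "qty" is an integer or that a customer is expected at all; that knowledge sits in the code.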
Today, most programmers use their knowledge of the meta-data (the document
schema) to implement programs that process documents according to that
meta-data. It does not matter whether the meta-data is explicit or implicit;
given the way that current programming languages work, you have no choice
but to follow a schema in your implementation of the process.
Only by "going meta", thinking one level above, would it be possible to
process data independently of its schema. Instead of producing different
results depending on the input document, programs would produce different
results based on the input documents (instances of their schema) and their
schema (instances of their meta-schema).
Only at this level can a program process data independently of its schema,
with no schema structure embedded into the algorithms. This would guarantee
the extensibility and flexibility promised by XML (you remember that the X
stands for eXtensible, don't you?). Yet, the schemas themselves have to
follow a particular meta-schema, with behaviours associated with particular
instances of the meta-schema, and the game continues...
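
Here is one possible sketch of "going meta", under strong simplifying assumptions: the schema is passed to the program as plain data (a Python dict in a format invented for this example, not any standard schema language), and read_record contains no element names at all. What remains hard-coded is the meta-schema, here the three field kinds in CONVERTERS, which is exactly where the game continues.

```python
import xml.etree.ElementTree as ET

# The meta-schema: the only structure the program itself knows about.
# The kind names are hypothetical, chosen for this example.
CONVERTERS = {"text": str, "integer": int, "decimal": float}

def read_record(xml_text, schema):
    """Read a flat record, driven entirely by the schema given as data."""
    root = ET.fromstring(xml_text)
    if root.tag != schema["root"]:
        raise ValueError("unexpected root element: " + root.tag)
    return {name: CONVERTERS[kind](root.findtext(name))
            for name, kind in schema["fields"].items()}

# Two unrelated schemas, processed by the same unchanged program.
ORDER_SCHEMA = {"root": "order",
                "fields": {"customer": "text", "qty": "integer"}}
BOOK_SCHEMA = {"root": "book",
               "fields": {"title": "text", "price": "decimal"}}

print(read_record("<order><customer>Alice</customer><qty>2</qty></order>",
                  ORDER_SCHEMA))
print(read_record("<book><title>Dune</title><price>9.5</price></book>",
                  BOOK_SCHEMA))
```

Adding a new document type needs no code change, but adding a new kind of field (a date, say) does: the dependency has only moved up one level.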
I see only one way out of this "meta-data" madness: embedding behaviours into
meta-data. This way, a schema can be extended but still be processed by
programs built for the legacy schema. Sounds like object-oriented
programming, eh ?
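
A small sketch of what that could look like, with the schema modelled as a Python class that carries its own behaviour. The class names, the total method and the discount element are all hypothetical. The "legacy" program print_invoice was written against OrderSchema only, yet it keeps working when handed the extended schema, because the new behaviour travels with the meta-data.

```python
import xml.etree.ElementTree as ET

class OrderSchema:
    """Legacy schema: <order> with <item qty=... price=...> children."""
    def total(self, root):
        return sum(int(item.get("qty")) * float(item.get("price"))
                   for item in root.findall("item"))

class DiscountOrderSchema(OrderSchema):
    """Extended schema: adds an optional <discount> element."""
    def total(self, root):
        discount = float(root.findtext("discount", default="0"))
        return super().total(root) - discount

def print_invoice(xml_text, schema):
    # Legacy program: knows nothing about <discount>, it only asks
    # the schema object for the total.
    root = ET.fromstring(xml_text)
    return "Total due: %.2f" % schema.total(root)

OLD_DOC = '<order><item qty="2" price="3.50"/><item qty="1" price="12.00"/></order>'
NEW_DOC = ('<order><item qty="2" price="3.50"/><item qty="1" price="12.00"/>'
           '<discount>4.00</discount></order>')

print(print_invoice(OLD_DOC, OrderSchema()))          # Total due: 19.00
print(print_invoice(NEW_DOC, DiscountOrderSchema()))  # Total due: 15.00
```

Of course this only works inside a single program written in a single language, which is precisely the problem with shipping such behaviour along with a document.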
I remember people in the old days, their eyes shining when they told us
about XML being a portable data format, and Java bytecode a portable code
format, and all the wonders a synergy between the two could create...
For now, we haven't really integrated XML and any programming language
together, in the sense that we can't send behaviours as meta-data. We can't
embed code in schema. Well, in fact, we could, but to begin with, we have to
agree on a standard schema format, a meta-meta-data problem... Then we would
have to agree on the kind of code that could be embedded in the schema. It
would have to be of only one kind (no multiple languages allowed),
lightweight, portable, dedicated to XML manipulation... Uh-oh. I think I've
stepped on many people's toes...