Another "Against the Grain" column on XML



Our dear friend Fabian Pascal (relational database author/consultant of some renown) has published the 3rd part of his rant against XML.  This one doesn't belabor anyone's lack of intelligence, but actually makes a few points that seem to be worth some consideration.
See http://searchdatabase.techtarget.com/tipsIndex/0,289482,sid13_tax284872,00.html
 
"[T]o the extent that XML is a physical format for data exchange between applications, why is readability relevant? If it is not, then it can actually be argued that text-based data exchange compromises performance for no advantage." 
 
Sounds like the ASN.1 / binary XML discussion! I think an XML advocate's response is that readability is important for auditing/debugging data exchanges; "self-describing" tags make it easier to write programs to produce and parse the messages because the format is not so rigidly specified; and it is not at all clear that real-world performance on today's hardware is significantly compromised by using a relatively verbose, text-based format.  Any others?
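To make the readability-versus-compactness trade-off concrete, here is a minimal sketch that serializes one made-up record both ways. The record, its field names, and the `<idi` binary layout are my own assumptions for illustration; nothing here comes from Pascal's column, and byte counts are only one crude proxy that says nothing definitive about real-world performance.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record, invented for this illustration.
record = {"id": 42, "price": 19.99, "qty": 3}

# Text form: every value is wrapped in a self-describing tag.
item = ET.Element("item")
for name, value in record.items():
    ET.SubElement(item, name).text = str(value)
as_xml = ET.tostring(item)  # bytes, no XML declaration by default

# Binary form: little-endian int32, float64, int32 (an assumed layout).
# The field names and types live only in this format string, so both
# sides must agree on it out of band.
LAYOUT = "<idi"
as_binary = struct.pack(LAYOUT, record["id"], record["price"], record["qty"])

print(as_xml.decode("ascii"))
print(len(as_xml), "bytes as XML vs", len(as_binary), "bytes packed")
print(struct.unpack(LAYOUT, as_binary))
```

The packed form is smaller, but a reader of the raw bytes cannot tell which field is which without the layout string, which is roughly the auditing/debugging point above.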
 
"Furthermore, whether XML proponents realize it or not, "what things are, how they are related and how to deal with them" is almost the definition of a data model (more specifically, of the three main components of a data model: data types, data structure/integrity, and data manipulation)."
 
Well, I pretty much agree ... and of course the new generation of PSVI-based specs takes this fairly seriously. I do think that XML needs to think quite a bit harder about its conception of "integrity". Clearly pointer-based linking or inclusion schemes such as external parsed entities or XLink/XInclude don't have any built-in notion of referential integrity, and I have seldom if ever seen this discussed. I have a hypothesis that XML implies a document-centric notion of integrity that is quite different from the relational model, but it's clearly something that we need to discuss in some detail.
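A small sketch of what I mean by the missing notion of referential integrity. The document, the `id`/`ref` attribute names, and the table names are hypothetical, and plain attributes stand in here for whatever pointer mechanism is in play; the SQLite half assumes a build with foreign-key enforcement available, which is the usual case.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: 'id' and 'ref' are ordinary attributes here,
# used as a stand-in for a pointer-style link between elements.
DOC = """
<orders>
  <customer id="c1"/>
  <order ref="c1"/>
  <order ref="c9"/>
</orders>
"""

# Parsing succeeds: well-formedness says nothing about where 'ref' points.
root = ET.fromstring(DOC)
known_ids = {c.get("id") for c in root.iter("customer")}
dangling = [o.get("ref") for o in root.iter("order")
            if o.get("ref") not in known_ids]
print("dangling references found by hand:", dangling)

# Relational counterpart: the constraint is declared once and the DBMS
# rejects the bad row at insert time.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE customer (id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE orders ("
           "customer_id TEXT NOT NULL REFERENCES customer(id))")
db.execute("INSERT INTO customer VALUES ('c1')")
db.execute("INSERT INTO orders VALUES ('c1')")
try:
    db.execute("INSERT INTO orders VALUES ('c9')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("DBMS rejected the dangling row:", rejected)
```

In the XML half the check is application code that somebody has to remember to write; in the relational half it is part of the data definition.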
 
'Originally, I understood [algebra] to mean a set of operations that are closed over some type. That is, every operation in X Algebra operates on zero or more values of type X and returns a value of type X...Over what is the XML Query Algebra closed? Nobody has ever given me an answer that makes sense (apart from the occasional, honest "I don't know").'
 
Uhh, anyone wanna try to refute that?
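For anyone who wants the closure property spelled out, here is a toy sketch in the relational setting Pascal has in mind. The representation (a relation as a `frozenset` of rows) and the function names are my own simplification, not anyone's actual algebra; the point is only that each operator takes relations and returns a relation, so operators nest freely.

```python
# A relation is modeled as a frozenset of rows; each row is a frozenset
# of (attribute, value) pairs. This is a toy model for illustration only.

def relation(*rows):
    """Build a relation from dicts with hashable values."""
    return frozenset(frozenset(row.items()) for row in rows)

def select(rel, predicate):
    """Keep the rows satisfying predicate; the result is a relation."""
    return frozenset(row for row in rel if predicate(dict(row)))

def project(rel, attrs):
    """Keep only the named attributes; the result is a relation."""
    return frozenset(
        frozenset((k, v) for k, v in row if k in attrs) for row in rel
    )

def union(a, b):
    """Set union of two relations; the result is a relation."""
    return a | b

people = relation(
    {"name": "Ann", "dept": "db"},
    {"name": "Bob", "dept": "xml"},
    {"name": "Cy", "dept": "db"},
)

# Because every operator returns a relation, calls compose without any
# conversion step in between.
db_names = project(select(people, lambda r: r["dept"] == "db"), {"name"})
depts = project(people, {"dept"})  # duplicate rows collapse in a set

print(sorted(dict(row)["name"] for row in db_names))
print(len(depts))
```

The question in the quote is what plays the role of "relation" for the XML Query Algebra: which single type every operator both consumes and produces.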
 
'[D]ue to their horrendous complexity and inflexibility, databases and DBMSs relying on the hierarchical model became obsolete  in the 80's, at least technologically. SQL DBMSs based ... on the simpler relational data model, based on predicate logic and set mathematics proved superior. ...What is the justification, then, for choosing a more complex, discredited data model for data exchange, when a majority of commonly used DBMSs employ a simpler, sounder and, thus, superior data model?'
 
Pascal thinks of this as the knockout punch against XML. I think the XML advocate's response is: Codd demonstrated that the relational model is simpler than, and more powerful than, the network/hierarchical data model as a universal theory of data. Codd clearly won the "great debate" in the 1970s, but 25 years later the skill of effectively designing properly normalized relational databases remains a rare and valuable one, even though most of us learned the rudiments in school. Hierarchies may be unsuitable as a general, formal theory of data, but they are an easy and convenient way for humans to organize concepts (see Herbert Simon's "The Architecture of Complexity" essay), and effective metaphors for them abound (e.g., computer filesystems and formal organizations). So, the relational model is a great formalism, but the hierarchical XML model is a convenient, intuitive, and "formal-enough" alternative. What are some other reasons why the XML hierarchical data model is popular long after Codd discredited it as a theory of data?
 
Finally, 'Ironically, many of these technologies are created as "standards", purportedly to simplify and to maximize communication and integration, but the plethora of different, ad-hoc, often multiple standards achieves just the opposite.' 
 
Ouch, he's got us there!
 
Anyway, it's gotta be more interesting to think about this stuff than to debate ISO some more, eh?