xml-dev - Re: [xml-dev] Relating to XML

To: Chiusano Joseph <chiusano_joseph@bah.com>
Subject: Re: [xml-dev] Relating to XML
From: Alaric B Snell <alaric@alaric-snell.com>
Date: Sat, 22 Nov 2003 23:34:36 +0000
Cc: Baiss Magnusson <cascades@earthlink.net>,XML-DEV <xml-dev@lists.xml.org>
In-reply-to: <3FBE7C53.45997E31@bah.com>
References: <A5D60E74-1C5A-11D8-B8A7-003065BF41DE@earthlink.net> <3FBE753A.3070406@alaric-snell.com> <3FBE7C53.45997E31@bah.com>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030704 Debian/1.4-1
Chiusano Joseph wrote:

> Could you please elaborate on why you believe that using XML for data
> interchange isn't always the best solution, and what better solutions
> there may be?

Simon St. Laurent summed it up pretty well in his response, so I won't 
repeat what he said there.

What I might add is that although I've not seen much documentation of 
the actual thought processes that the original designers of XML went 
through, it looks to me from what they produced that they really just 
intended to replace HTML for the Web. From what I've seen, they were saying:

"The current HTML Web is only useful for human browsing. Despite having 
a global data network, we're mainly using it to communicate stuff that 
only humans can really understand. It's stupid that every e-commerce Web 
site out there has a catalogue available online, but it's still not 
practical to write an application that merges the catalogues of multiple 
suppliers together, choosing the cheapest when the same product is 
available from more than one supplier or just listing multiple supplier 
options, depending on the user's wishes. So our plan is to migrate away 
from HTML to a new web where people use an XML structure for their site, 
that does not change, and a separate layer of CSS or XSL that they can 
change to make cosmetic adjustments. This way, applications can view the 
product pages directly in XML; hopefully the catalogue sites will 
migrate towards a standard XML vocabulary for product pages, but even if 
not, it won't be the end of the world if people have to write XSLT to 
transform the different vocabularies into an arbitrarily chosen common 
one within the application. A neat side effect of this is that we will 
move people away from the current horrible abuse of HTML as a graphical 
layout language rather than a document language, which is causing 
accessibility problems". It was the logical next step after moving 
people away from font tags towards CSS; it was a logical progression from:

<div class="introduction">The <span class="emphasis">best</span> type of 
cheese is ...

towards:

<introduction>The <emphasis>best</emphasis> type of cheese is ...

From what I've seen - and please correct me if I'm wrong - they just 
wanted to move the Web towards something that machines can glean 
meaning from, in order to make 'screen-scraping' easier (for 
application developers) and to allow alternative ways of displaying the 
information to humans (such as speech synthesis, for the disabled as 
well as people who just plain want to control how they view 
information). That said, I'm not sure whether some of what Tim 
Berners-Lee said at various points was part of an actual coherent plan 
or just somewhat idealistic wording of how technology could reshape the 
world.
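
To make the catalogue-merging scenario from the quoted passage a bit 
more concrete, here is a minimal sketch in Python. Everything in it is 
made up for illustration: the two supplier vocabularies, the element and 
attribute names, and the assumption that both suppliers use the same SKU 
for the same product. It also maps each vocabulary onto a common tuple 
in plain Python rather than using XSLT, since the standard library has 
no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Two hypothetical supplier catalogues, each using its own vocabulary.
SUPPLIER_A = """<catalogue>
  <product sku="CH-001"><name>Cheddar</name><price>4.50</price></product>
  <product sku="CH-002"><name>Brie</name><price>6.00</price></product>
</catalogue>"""

SUPPLIER_B = """<items>
  <item code="CH-001" title="Cheddar" cost="4.20"/>
  <item code="CH-003" title="Stilton" cost="7.10"/>
</items>"""


def offers_a(xml_text):
    """Map supplier A's vocabulary onto (sku, name, price, supplier)."""
    for p in ET.fromstring(xml_text).iter("product"):
        yield p.get("sku"), p.findtext("name"), float(p.findtext("price")), "A"


def offers_b(xml_text):
    """Map supplier B's vocabulary onto the same common tuple."""
    for i in ET.fromstring(xml_text).iter("item"):
        yield i.get("code"), i.get("title"), float(i.get("cost")), "B"


def cheapest(offers):
    """Keep only the cheapest offer for each SKU."""
    best = {}
    for sku, name, price, supplier in offers:
        if sku not in best or price < best[sku][1]:
            best[sku] = (name, price, supplier)
    return best


merged = cheapest(list(offers_a(SUPPLIER_A)) + list(offers_b(SUPPLIER_B)))
for sku in sorted(merged):
    print(sku, *merged[sku])
```

The two small mapping functions play the role the XSLT stylesheets would 
play in the scenario above: one per foreign vocabulary, each translating 
into an arbitrarily chosen common form inside the application.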

XML seems well designed for this kind of thing. Mixed content, and the 
fact you have the choice between attributes and element content, seem to 
point towards this (attributes would normally be used for 
meta-information, to keep metadata out of the actual content text). 
DTDs, with their fabled lack of type restraints on element content, seem 
perfect for describing this kind of vocabulary.

If all recipes were written in the same XML vocabulary for recipes, 
site authors would still be able to make their pages fit in with the 
rest of their site using CSS or XSL, so being constrained to the common 
format would not take their precious creative options away from them. 
Yet at the same time, hardened recipe-addicts could tell their browsers 
to display every recipe they encounter using a single fixed stylesheet, 
so their skilled eye can use familiar visual cues to skip through the 
ingredients list. And a recipe library manager application can just 
accept the URL of a recipe, ignore any links to stylesheets, and 
extract the data.
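
As a rough sketch of what the recipe library manager would do, assuming 
an invented recipe vocabulary (the element and attribute names below are 
hypothetical, and the document is embedded as a string rather than 
fetched from a URL):

```python
import xml.etree.ElementTree as ET

# A hypothetical recipe page: a stylesheet link plus the recipe itself.
RECIPE = """<?xml-stylesheet type="text/css" href="site.css"?>
<recipe>
  <title>Welsh rarebit</title>
  <ingredient quantity="200" unit="g">Cheddar</ingredient>
  <ingredient quantity="4" unit="slice">bread</ingredient>
  <step>Grate the <emphasis>cheese</emphasis> and melt it.</step>
</recipe>"""


def extract_recipe(xml_text):
    """Pull out the data and pay no attention to presentation."""
    # ElementTree's default parser leaves processing instructions out of
    # the tree, so the xml-stylesheet link is ignored.
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "ingredients": [
            (i.text, i.get("quantity"), i.get("unit"))
            for i in root.iter("ingredient")
        ],
        # itertext() flattens mixed content such as <emphasis>.
        "steps": ["".join(s.itertext()) for s in root.iter("step")],
    }


recipe = extract_recipe(RECIPE)
print(recipe["title"], recipe["ingredients"])
```

The site's CSS never enters into it; the application only depends on the 
vocabulary staying put.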

Likewise, an online bank might render your bank statement using a common 
XML format for bank statements so that not only can you look at it, you 
can drag the URL into your personal financial app and expect it to start 
reconciling the statement against your transaction log.

But the people who write articles in the technical press, and big 
business (not sure in which order), seemed to think that this meant XML 
could do *anything*. They were perhaps aided by the fact that a large 
percentage of Web developers had really learnt programming by writing 
Web applications and hadn't seen much of the world beyond that domain, 
so they were unaware of a broad existing body of experience in 
transporting data between applications (and thought that reading from a 
binary file was harder than parsing text, because they had more 
experience with the latter than the former). So the idea of XML-based 
protocols, and of XML-based file formats for applications, began to 
spread. They started to use XML for stuff that was mainly intended to be 
communicated between applications, rather than for publishing documents 
on the Web.

Rather than treating XML as a document format, they treated it as a data 
format. They wrote code that serialised objects to it - 
<classname><elementname>value</elementname><element2name>value</element2name></classname> 
- which didn't produce anything very human readable, since there wasn't 
any CSS or XSLT to convert it for display, and the class and element 
names only meant anything to programmers. If all you want is a format to 
express that this particular instance of Person has a Name of "Alaric" 
and a Gender of "Male", then there are plenty of perfectly good formats 
already around for that kind of thing; there's little need to create a 
new format for it with XML to pass mail merge records from your database 
to the company that's sending out your invoices.

Now, there's a use for an XML format for personal contact details if 
you're producing a Web site that publishes contact details; if you have 
them in a common XML address book format, then somebody viewing that 
address on the Web can just drag the URL or whatever into their word 
processor to have it open up a letter template with the name and address 
prefilled, or drag it into the address book to have it added there.

But when you're exporting a database of 10,000 people's addresses, all 
those human-readable element names used in every single one of those 
records are a bit redundant. And with no CSS or XSLT stylesheet, generic 
display software will only be able to display it as a tree, which won't 
make an immense amount of sense to non-technical people unless they sit 
and stare at it for a while. Not that it will fall into the hands of 
many such people as it is run off of the database and emailed to the 
invoice 
printing house, anyway. In those industries, CSV still reigns supreme as 
the format of choice.
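
A quick way to see the redundancy is to serialise the same records both 
ways and compare the sizes. This is only a sketch: the 10,000 records 
are synthetic, the field names are invented, and the exact numbers will 
depend on the data.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["Name", "Gender", "City"]

# Synthetic records standing in for an exported address database.
people = [
    {"Name": f"Person {n}", "Gender": "Male" if n % 2 else "Female", "City": "London"}
    for n in range(10_000)
]


def to_xml(records):
    """One <Person> element per record, one child element per field."""
    root = ET.Element("People")
    for record in records:
        person = ET.SubElement(root, "Person")
        for field in FIELDS:
            ET.SubElement(person, field).text = record[field]
    return ET.tostring(root, encoding="utf-8")


def to_csv(records):
    """One header row naming the fields, then one line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


xml_bytes = to_xml(people)
csv_bytes = to_csv(people)
print("XML bytes:", len(xml_bytes))
print("CSV bytes:", len(csv_bytes))
```

The CSV names each field once, in the header row; the XML names each 
field twice per record, in the opening and closing tags.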

And the flamewars began... :-)

My own story began as a designer of file formats and network protocols; 
a large part of my theoretical study in computer science has been in the 
representation of information. I heard rumours that XML solved the 
'babel problem'; there seemed to be claims that XML solved 
interoperability problems in some new way. But when I looked into it, 
all I found was a way of writing tree structures in text, different in 
no significant way from things like S-expressions. When I questioned 
those who were fervently claiming that XML was suddenly making possible 
what was not possible before, they said things along the lines of "Look! 
Now my CGIs can output XML as well as HTML, so other people's CGIs can 
just fetch the XML to get at the raw data! Suddenly, other 
people's apps can use my app as a building block!"

To which my reply was something along the lines of "Er... people have 
been doing that kind of thing for years, actually...". Because the fans 
I talked to were people who joined an Internet composed of things like 
IRC, email, Web pages, and Usenet and saw it only as a way for humans to 
communicate; whereas I, and most of the people I hung out with, had 
first heard of the Internet as a protocol stack, and saw it as a way for 
bits of software to communicate - including, but not limited to, passing 
around information for display to humans.
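
Going back to the comparison with S-expressions for a moment: the 
mapping is mechanical enough to fit in a few lines. The sketch below 
uses an ad hoc convention of my own for attributes (an "@" prefix), does 
no escaping of quotes, and strips surrounding whitespace from text, so 
treat it as an illustration rather than a faithful converter.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem):
    """Render an element tree as a nested S-expression string."""
    parts = [elem.tag]
    # Attributes become (@name "value") pairs; this convention is ad hoc.
    for key, value in elem.attrib.items():
        parts.append(f'(@{key} "{value}")')
    if elem.text and elem.text.strip():
        parts.append(f'"{elem.text.strip()}"')
    for child in elem:
        parts.append(to_sexpr(child))
        # Text following a child element (mixed content) is kept in order.
        if child.tail and child.tail.strip():
            parts.append(f'"{child.tail.strip()}"')
    return "(" + " ".join(parts) + ")"


doc = ET.fromstring(
    '<Person id="1"><Name>Alaric</Name><Gender>Male</Gender></Person>'
)
sexpr = to_sexpr(doc)
print(sexpr)  # (Person (@id "1") (Name "Alaric") (Gender "Male"))
```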

The thing is, most of the killer apps of the Internet were 
human-communication applications. Previously, if two companies wanted to 
exchange data for whatever reason, they'd set something up with a modem 
that dialled at a fixed time every day, or on demand, or lay a leased 
line. As they still do; but most new communications are set up to work 
over the Internet, usually using scheduled FTP uploads in my experience. 
The arrival of the Internet has mainly just decreased the cost of those 
communications links, although many businesses are worried about the 
security implications of putting their data over the public network 
(banks in particular...) and still continue to pay for leased lines, 
frame relay, and so on. However, if people wanted to chat in 
subject-based geography-independent 'rooms' where their age, fashion 
tastes, regional accent, income level, ethnic background, and gender 
were all hidden, they'd have had to use special technology such 
as ham radio or dialup bulletin boards, which for various reasons (both 
technical and social) tended to be the province of nerds. The fact that 
the Internet lowered the cost of data communication enough to make 
things like IRC, email, Usenet, and so on accessible to consumers made a 
lot of new things possible, so those previously-nearly-impossible things 
grew like wildfire.

One good thing that has come out of the XML hype is that it got people 
thinking about application intercommunication across the Internet - 
pulling it into the limelight again. Up until then, the decision makers 
in online businesses hadn't really considered the idea much, since 
making Web sites was what everyone was doing. They could easily have 
made their core data and services available to other application 
developers using anything from ONC RPC (toolkits for which come with 
all the open source UNIXes, for a start - still a bigger deployed base 
than XML tools, probably!) to emailing CSV files, had they thought it 
would make them money; yet it was the thought of 'syndicating' their 
content via XML that appealed to many. However, in the long run, it 
seems that few commercial web sites are doing this unless you pay them 
for the privilege, since they don't make any more money from you viewing 
their pages in XML than in HTML, and writing separate XML and CSS/XSL is 
more complex and costly than just writing HTML in the first place.

I've been involved in a few Web projects that had the backend logic 
producing XML and an XSLT layer converting that to HTML for display, and 
they never really had any benefits to show from it... whereas with a 
site that directly generates HTML, the designers can write an HTML page 
with little magic codes in where they want the product name, cost, and 
so on to be placed when the page is generated dynamically; with an XSLT 
site, they need to pass their desired HTML layouts to an XSLT guru who 
can convert them into stylesheets, and who then passes a template XML 
file to the programmer instead of a template HTML file. What's worse is 
that most Web site designs are in constant flux as the management 
demands changes to try and bring in more successful sales - so when they say 
that each product page should now have links to the past 5 product pages 
you viewed so it's easier to switch between products to compare them, 
suddenly your nice generic "Product description XML" being generated 
from the backend database has to contain dynamic information generated 
from the viewer's browsing history, so the XSLT can convert that into 
the 'Recently Viewed Items' box. Which kind of spoils the idea of it 
being a display-format-independent abstract description of the product; 
because the sad fact is that companies want their web sites to be groups 
of pages that they control the display of. They have no particular 
desire to provide URLs for their *products*, that when viewed happen to 
produce a nice product page; they want URLs to point to pages. Part of 
the original semantic-web XML dream seemed to be that there would be a 
URL for a product, and viewing that URL in a browser would produce a 
nice pretty page about the product while software could access the same 
URL to get a concise database entry about the product, whose format 
would not change with every site redesign, so application developers 
could count upon it.

Sadly, most real e-commerce sites will from time to time decide to split 
each product's details over several pages, or not, meaning that the 
cosmetic changes don't just affect how a page's information is 
displayed, but what information is on what page. So providing a URL from 
which a universal product description could be fetched in XML would mean 
*extra* effort for them to *separately* provide that information; not 
just a matter of moving away from HTML to XML+CSS. And they just don't 
see much of a business case for doing that unless people pay to access 
their data directly.

One point to consider is that none of the XML vocabularies I've seen 
contain fields for adverts to display to the viewer - and even if they 
did, no recipe database application in its right mind would bother 
actually storing the adverts that were supposed to be shown alongside 
the recipe on the original site and displaying them - and many sites 
make a nice bit of revenue out of showing adverts. Any version of the 
Web that does not allow the authors of sites to enforce the display of 
adverts seems to get a lukewarm reception!

See also: http://www.xml.com/pub/a/2001/06/13/threemyths.html

ABS