Chiusano Joseph wrote:
> Could you please elaborate on why you believe that using XML for data
> interchange isn't always the best solution, and what better solutions
> there may be?
Simon St. Laurent summed it up pretty well in his response, so I won't
repeat what he said there.
What I might add is that although I've not seen much documentation of
the actual thought processes that the original designers of XML went
through, it looks to me from what they produced that they really just
intended to replace HTML for the Web. From what I've seen, they were saying:
"The current HTML Web is only useful for human browsing. Despite having
a global data network, we're mainly using it to communicate stuff that
only humans can really understand. It's stupid that every e-commerce Web
site out there has a catalogue available online, but it's still not
practical to write an application that merges the catalogues of multiple
suppliers together, choosing the cheapest when the same product is
available from more than one supplier or just listing multiple supplier
options, depending on the user's wishes. So our plan is to migrate away
from HTML to a new web where people use an XML structure for their site,
that does not change, and a separate layer of CSS or XSL that they can
change to make cosmetic adjustments. This way, applications can view the
product pages directly in XML; hopefully the catalogue sites will
migrate towards a standard XML vocabulary for product pages, but even if
not, it won't be the end of the world if people have to write XSLT to
transform the different vocabularies into an arbitrarily chosen common
one within the application. A neat side effect of this is that we will
move people away from the current horrible abuse of HTML as a graphical
layout language rather than a document language, which is causing
accessibility problems". It was the logical next step after moving
people away from font tags towards CSS; it was a logical progression from:
<div class="introduction">The <span class="emphasis">best</span> type of
cheese is ...
towards:
<introduction>The <emphasis>best</emphasis> type of cheese is ...
From what I've seen - and please correct me if I'm wrong - they just
wanted to try to move the Web towards something that machines can glean
meaning from, in order to increase the ease of 'screen-scraping' (for
application developers) and alternative ways of displaying the
information to humans (such as speech synthesis, for the disabled as
well as people who just plain want to control how they view
information). Although I'm not sure if some of what Tim Berners-Lee said
at some points was just somewhat idealistic wording of how technology
could reshape the world or part of an actual coherent plan.
XML seems well designed for this kind of thing. Mixed content, and the
fact you have the choice between attributes and element content, seem to
point towards this (attributes would normally be used for
meta-information, to keep metadata out of the actual content text).
DTDs, with their fabled lack of type constraints on element content, seem
perfect for describing this kind of vocabulary.
If all recipes were written in the same XML vocabulary for recipes,
then site authors would still be able to make their pages fit in with
the rest of their site using CSS or XSL, so being constrained to the
common format would not take their precious creative options away from
them. Yet at the same time, hardened
recipe-addicts could tell their browsers to display every recipe they
encounter using a single fixed stylesheet so their skilled eye can use
familiar visual cues to skip through the ingredients list. And a recipe
library manager application can just accept the URL of a recipe, ignore
any links to stylesheets, and just extract the data.
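
As a rough sketch of what such a recipe library manager might do, here is
a little Python using the standard library's ElementTree. The recipe
vocabulary (recipe, title, ingredient, method) and the stylesheet name are
made up for illustration; no such standard vocabulary is assumed to exist.
The point is only that the stylesheet link is skipped and the data comes
straight out of the markup, mixed content included.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe vocabulary: every element and attribute name here
# is invented for illustration, as is the stylesheet file name.
RECIPE_XML = """<?xml version="1.0"?>
<?xml-stylesheet href="site-theme.css" type="text/css"?>
<recipe>
  <title>Cheese on toast</title>
  <ingredient quantity="2" unit="slices">bread</ingredient>
  <ingredient quantity="50" unit="g">cheddar</ingredient>
  <method>Grill the <emphasis>bread</emphasis>, add cheese, grill again.</method>
</recipe>
"""


def extract_recipe(xml_text):
    """Pull the data out of a recipe document, ignoring presentation."""
    # ElementTree's default parser drops processing instructions, so the
    # xml-stylesheet link never reaches the application.
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "ingredients": [
            (item.text, item.get("quantity"), item.get("unit"))
            for item in root.findall("ingredient")
        ],
        # itertext() flattens mixed content such as <emphasis> into text.
        "method": "".join(root.find("method").itertext()),
    }


recipe = extract_recipe(RECIPE_XML)
print(recipe["title"])
for name, quantity, unit in recipe["ingredients"]:
    print(f"  {quantity} {unit} {name}")
```

A real application would fetch the document from the recipe's URL first;
that step is left out here so the sketch stays self-contained.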
Likewise, an online bank might render your bank statement using a common
XML format for bank statements so that not only can you look at it, you
can drag the URL into your personal financial app and expect it to start
reconciling the statement against your transaction log.
But people who write articles in the technical press and big business
(not sure in which order), perhaps aided by the fact that a large
percentage of Web developers really learnt programming by writing Web
applications and hadn't seen much of the world beyond that domain so
were unaware of a broad existing body of experience in transporting data
between applications (and so thought that reading from a binary file was
harder than parsing text - because they had more experience with the
latter than the former), seemed to think that this meant XML could do
*anything*; so the idea of XML-based protocols, and file formats for
applications using XML, began to spread. They started to use XML for
stuff that was mainly intended to be communicated between applications,
rather than for publishing documents on the Web.
Rather than treating XML as a document format, they treated it as a data
format. They wrote code that serialised objects to it -
<classname><elementname>value</elementname><element2name>value</element2name></classname>
- which didn't produce anything very human readable, since there wasn't
any CSS or XSLT to convert it for display, and the class and element
names only meant anything to programmers. If all you want is a format to
express that this particular instance of Person has a Name of "Alaric"
and a Gender of "Male", then there are plenty of perfectly good formats
already around for that kind of thing; there's little need to create a
new format for it with XML to pass mail merge records from your database
to the company that's sending out your invoices.
Now, there's a use for an XML format for personal contact details if
you're producing a Web site that publishes contact details; if you have
them in a common XML address book format, then somebody viewing that
address on the Web can just drag the URL or whatever into their word
processor to have it open up a letter template with the name and address
prefilled, or drag it into the address book to have it added there.
But when you're exporting a database of 10,000 people's addresses, all
those human-readable element names used in every single one of those records
are a bit redundant. And with no CSS or XSLT stylesheet, generic display
software will only be able to display it as a tree, which won't make an
immense amount of sense to non-technical people unless they sit and
stare at it for a while. Not that it will fall into the hands of many
such people as it is run off of the database and emailed to the invoice
printing house, anyway. In those industries, CSV still reigns supreme as
the format of choice.
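
To put a rough number on that redundancy, here is a small Python sketch
that writes the same 10,000 made-up address records once as object-style
XML and once as CSV, then compares the sizes. The field names and the
generated data are invented for illustration, and the exact ratio will
depend on how long the values are compared with the element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical field names for an address record (illustration only).
FIELDS = ["Name", "Gender", "Street", "City", "Postcode"]


def make_records(count):
    """Generate made-up address records; none of this is real data."""
    return [
        {
            "Name": f"Person {i}",
            "Gender": "Male" if i % 2 else "Female",
            "Street": f"{i} High Street",
            "City": "London",
            "Postcode": f"N{i % 20} 1AA",
        }
        for i in range(count)
    ]


def to_xml(records):
    """Serialise each record as <Person><Name>...</Name>...</Person>."""
    root = ET.Element("People")
    for record in records:
        person = ET.SubElement(root, "Person")
        for field in FIELDS:
            ET.SubElement(person, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


def to_csv(records):
    """Serialise the same records as CSV with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = make_records(10_000)
xml_size = len(to_xml(records))
csv_size = len(to_csv(records))
print(f"XML: {xml_size} characters, CSV: {csv_size} characters")
```

The CSV names each field once, in the header row; the XML repeats every
name twice per record, as an opening and a closing tag.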
And the flamewars began... :-)
My own story began as a designer of file formats and network protocols;
a large part of my theoretical study in computer science has been in the
representation of information. I heard rumours that XML solved the
'babel problem'; there seemed to be claims that XML solved
interoperability problems in some new way. But when I looked into it,
all I found was a way of writing tree structures in text, different in
no significant way from things like S-expressions. When I questioned
those who were fervently claiming that XML was suddenly making possible
what was not possible before, they said things along the lines of "Look!
Now my CGIs can output XML as well as HTML, so other people's CGIs can
just fetch the XML to get at the raw data instead! Suddenly, other
people's apps can use my app as a building block!"
To which my reply was something along the lines of "Er... people have
been doing that kind of thing for years, actually...". Because the fans
I talked to were people who joined an Internet composed of things like
IRC, email, Web pages, and Usenet and saw it only as a way for humans to
communicate; whereas I, and most of the people I hung out with, had
first heard of the Internet as a protocol stack, and saw it as a way for
bits of software to communicate - including, but not limited to, passing
around information for display to humans.
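
The comparison with S-expressions above can be made concrete with a few
lines of Python. This is only a sketch under my own conventions: the
function name to_sexpr and the "@" prefix for attributes are invented here,
not part of any standard mapping, and the conversion drops whitespace-only
text and does no escaping of quotes.

```python
import xml.etree.ElementTree as ET


def to_sexpr(element):
    """Render an XML element tree as an S-expression string.

    Illustrative convention only: attributes become (@name "value")
    pairs, and text nodes become quoted strings. Quotes inside values
    are not escaped, and surrounding whitespace is stripped.
    """
    parts = [element.tag]
    parts += [f'(@{name} "{value}")' for name, value in element.attrib.items()]
    if element.text and element.text.strip():
        parts.append(f'"{element.text.strip()}"')
    for child in element:
        parts.append(to_sexpr(child))
        # Text following a child element (mixed content) lives in .tail.
        if child.tail and child.tail.strip():
            parts.append(f'"{child.tail.strip()}"')
    return "(" + " ".join(parts) + ")"


doc = ET.fromstring(
    '<Person id="1"><Name>Alaric</Name><Gender>Male</Gender></Person>'
)
sexpr = to_sexpr(doc)
print(sexpr)
# prints: (Person (@id "1") (Name "Alaric") (Gender "Male"))
```

Both notations carry the same tree; only the punctuation differs.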
The thing is, most of the killer apps of the Internet were
human-communication applications. Previously, if two companies wanted to
exchange data for whatever reason, they'd set something up with a modem
that dialled at a fixed time every day, or on demand, or lay a leased
line. As they still do; but most new communications are set up to work
over the Internet, usually using scheduled FTP uploads in my experience.
The arrival of the Internet has mainly just decreased the cost of those
communications links, although many businesses are worried about the
security implications of putting their data over the public network
(banks in particular...) and still continue to pay for leased lines,
frame relay, and so on. However, if people wanted to chat in
subject-based geography-independent 'rooms' where their age, fashion
tastes, regional accent, income level, ethnic background, and gender
were all hidden, they'd have had to use special technology such
as ham radio or dialup bulletin boards, which for various reasons (both
technical and social) tended to be the province of nerds. The fact that
the Internet lowered the cost of data communication enough to make
things like IRC, email, Usenet, and so on accessible to consumers made a
lot of new things possible, so those previously-nearly-impossible things
grew like wildfire.
One good thing that has come out of the XML hype is that it got people
thinking about application intercommunication across the Internet -
pulling it into the limelight again. Up until then, the decision makers
in online businesses hadn't really considered the idea much, since
making Web sites was what everyone was doing. Although they could have
made their core data and services available to other application
developers quite easily using anything from ONC RPC (toolkits for which
come with all the open source UNIXes, for a start - still a bigger
deployed base than XML tools, probably!) to emailing CSV files, should
they have thought it would make them money, the thought of 'syndicating'
their content via XML appealed to many. However, in the long run, it
seems that few commercial web sites are doing this unless you pay them
for the privilege, since they don't make any more money from you viewing
their pages in XML than in HTML, and writing separate XML and CSS/XSL is
more complex and costly than just writing HTML in the first place.
I've been involved in a few Web projects that had the backend logic
producing XML and an XSLT layer converting that to HTML for display, and
they never really had any benefits to show from it... whereas with a
site that directly generates HTML, the designers can write an HTML page
with little magic codes in where they want the product name, cost, and
so on to be placed when the page is generated dynamically; with an XSLT
site, they need to pass their desired HTML layouts to an XSLT guru who
can convert them into stylesheets, and who then passes a template XML file
to the programmer instead of a template HTML file. What's worse is that
most Web site designs are in constant flux as the management demand
changes to try and bring in more successful sales - so when they say
that each product page should now have links to the past 5 product pages
you viewed so it's easier to switch between products to compare them,
suddenly your nice generic "Product description XML" being generated
from the backend database has to contain dynamic information generated
from the viewer's browsing history, so the XSLT can convert that into
the 'Recently Viewed Items' box. Which kind of spoils the idea of it
being a display-format-independent abstract description of the product;
because the sad fact is that companies want their web sites to be groups
of pages that they control the display of. They have no particular
desire to provide URLs for their *products*, that when viewed happen to
produce a nice product page; they want URLs to point to pages. Part of
the original semantic-web XML dream seemed to be that there would be a
URL for a product, and viewing that URL in a browser would produce a
nice pretty page about the product while software could access the same
URL to get a concise database entry about the product, whose format
would not change with every site redesign, so application developers
could count upon it.
Sadly, most real e-commerce sites will from time to time decide to split
each product's details over several pages, or not, meaning that the
cosmetic changes don't just affect how a page's information is
displayed, but what information is on what page. So providing a URL from
which a universal product description could be fetched in XML would mean
*extra* effort for them to *separately* provide that information; not
just a matter of moving away from HTML to XML+CSS. And they just don't
see much of a business case for doing that unless people pay to access
their data directly.
One point to consider is that none of the XML vocabularies I've seen
contain fields for adverts to display to the viewer - and even if they
did, no recipe database application in its right mind would bother
actually storing the adverts that were supposed to be shown alongside
the recipe on the original site and displaying them - and many sites
make a nice bit of revenue out of showing adverts. Any version of the
Web that does not allow the authors of sites to enforce the display of
adverts seems to get a lukewarm reception!
See also: http://www.xml.com/pub/a/2001/06/13/threemyths.html
ABS