   XHTML and its discontents -- was Re: [xml-dev] What is the rule for par


On Jul 13, 2004, at 3:05 PM, Dare Obasanjo wrote:

> . Bottom line, the only
> justification the XHTML folks like ERH can give for Google, Amazon or my
> weblog going to XHTML from HTML is that they can use XSLT, XQuery and
> XOM to screen scrape the website.

Interesting perspective ... very different from where I was coming 
from.  I think of Amazon, Google, and your weblog more as knowledge 
bases to be queried than as pages to be screenscraped.  XPath/XQuery 
are ways to find the information, XSLT or [pick your favorite tree API] 
are ways to reuse that information for one's own purposes.  I don't 
want to argue that my perspective is right and "screenscraping" is 
wrong, just that they imply different things about the value of XHTML.

> Of course, in all 3 cases the website
> owner has given alternate XML sources for data the user would like to
> screen scrape (Google Web Services, Amazon Web Services, my RSS feed). I
> personally think using HTML for presentation and specific XML sources for
> extracting content makes a lot more sense than trying to munge it all
> into one interface via XHTML.
> Joshua is right, XHTML was a stopgap. Many have confused the means to an
> end with the end itself.

Well, it's certainly not a great format -- sort of a weird intermingling 
of display-oriented markup and general document semantics.  But if the 
"end" is information that is both machine processable and 
human-readable, there are enough ways to use conventions -- such as the 
"class" attribute values often used by CSS stylesheets -- to put 
semantic content in there while making it look good.  I basically 
agree with Jon Udell (http://www.xml.com/pub/a/2003/12/23/udell.html) -- 
"since we are going to keep on using HTML, it behooves us to find 
smarter and better ways to use it. XHTML is one of those smarter and 
better ways. CSS is another. It strikes me as a really interesting 
opportunity to smuggle metadata into documents. People who don't know 
or care about metadata will nevertheless spend a lot of time fiddling 
with styles because they care a lot about how their documents look."
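
To make the "knowledge base" view a bit more concrete, here is a minimal sketch of querying an XHTML page through its "class" conventions. Everything specific in it is an assumption for illustration: the page fragment, the class names "book", "title" and "price", and the helper name extract_books are all made up, and Python's standard xml.etree.ElementTree stands in for whatever XPath/XQuery engine or tree API you prefer.

```python
import xml.etree.ElementTree as ET

# Namespace that well-formed XHTML documents declare on the root element.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page fragment; titles, prices and class names are invented.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="book">
      <span class="title">Example Book A</span>
      <span class="price">10.00</span>
    </div>
    <div class="book">
      <span class="title">Example Book B</span>
      <span class="price">12.50</span>
    </div>
  </body>
</html>"""


def extract_books(xhtml_text):
    """Return (title, price) pairs found via class-attribute conventions."""
    root = ET.fromstring(xhtml_text)  # only works if the page is well-formed XML
    books = []
    # ElementTree supports a limited XPath subset, including [@attr='value'].
    for div in root.findall(".//x:div[@class='book']", XHTML_NS):
        title = div.find("x:span[@class='title']", XHTML_NS)
        price = div.find("x:span[@class='price']", XHTML_NS)
        books.append((title.text, float(price.text)))
    return books


print(extract_books(PAGE))
```

Note that the predicate matches the whole attribute value, so an element with class="book featured" would be missed; a real convention would need to allow for multiple class names.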

OK, XHTML is a stopgap, but the gap needs stopping -- alternatives 
such as XML+XSLT aren't universally supported (e.g. Mark Pilgrim's Atom 
feed looks terrible in Opera and Safari).  Likewise, XML+XSLT has the 
problem that if you have M XML source formats and N output formats 
(e.g. for different browsers, different devices, different classes of 
users ...) then you have M x N stylesheets to maintain.  With XHTML as 
the single source format, you have N stylesheets to maintain. Sure, 
that has the cost of the kludges you must do to force-fit information 
into XHTML, but it's not at all clear to me that the cost of this 
outweighs the practical benefit.  And of course you can use some better 
XML format as the single source format, but then you have to design it, 
deploy it, change it, manage the versions ... and deal with the snotty 
users who don't believe you've added enough value over (X)HTML to 
justify all these costs.
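
The M x N versus N arithmetic can be sketched as below. The format and target names are placeholders I made up, and the function names are hypothetical; the only point is to count one stylesheet per (source, output) pair against one per output when XHTML is the single source.

```python
from itertools import product


def stylesheets_direct(source_formats, output_formats):
    """One stylesheet per (source format, output format) pair: M x N."""
    return len(list(product(source_formats, output_formats)))


def stylesheets_single_source(output_formats):
    """With XHTML as the only source format: one stylesheet per output, N."""
    return len(output_formats)


# Placeholder names, chosen only to give M = 3 and N = 4.
sources = ["format-a", "format-b", "format-c"]
outputs = ["desktop", "handheld", "print", "speech"]

print(stylesheets_direct(sources, outputs))   # 12
print(stylesheets_single_source(outputs))     # 4
```

This leaves out the cost of the kludges needed to force-fit each source into XHTML in the first place, which is the trade-off discussed above.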

I'm no fan of XHTML or the W3C process that has produced it, but I'm 
beginning to think that it's like democracy -- the worst of all 
possible approaches, except for the known alternatives.

