On Jul 13, 2004, at 3:05 PM, Dare Obasanjo wrote:
>
> . Bottom line, the only justification the XHTML folks like ERH can
> give for Google, Amazon or my weblog going to XHTML from HTML is that
> they can use XSLT, XQuery and XOM to screen scrape the website.
Interesting perspective ... very different from where I was coming
from. I think of Amazon, Google, and your weblog more as knowledge
bases to be queried than as pages to be screenscraped. XPath/XQuery
are ways to find the information, XSLT or [pick your favorite tree API]
are ways to reuse that information for one's own purposes. I don't
want to argue that my perspective is right and "screenscraping" is
wrong, just that they imply different things about the value of XHTML.
> Of course, in all 3 cases the website owner has given alternate XML
> sources for data the user would like to screen scrape (Google Web
> Services, Amazon Web Services, my RSS feed). I personally think using
> HTML for presentation and specific XML sources for extracting content
> makes a lot more sense than trying to munge it all into one interface
> via XHTML.
>
> Joshua is right, XHTML was a stopgap. Many have confused the means to
> an end to the end itself.
>
Well, it's certainly not a great format -- sort of a weird intermingling
of display-oriented markup and general document semantics. But if the
"end" is information that is both machine processable and
human-readable, there are enough ways to use conventions -- such as the
"class" attribute values often used by CSS stylesheets -- to put
semantic content in there while making it look good. I basically
agree with Jon Udell http://www.xml.com/pub/a/2003/12/23/udell.html --
"since we are going to keep on using HTML, it behooves us to find
smarter and better ways to use it. XHTML is one of those smarter and
better ways. CSS is another. It strikes me as a really interesting
opportunity to smuggle metadata into documents. People who don't know
or care about metadata will nevertheless spend a lot of time fiddling
with styles because they care a lot about how their documents look."
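
To make the "class" convention concrete, here is a minimal sketch of querying a page as a knowledge base rather than scraping it. The page fragment and the class names ("book", "title", "price") are invented for illustration, not taken from any real site, and it uses only the limited XPath subset in Python's standard library:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML fragment; the class names are assumptions made up
# for this example, the kind a CSS stylesheet might already rely on.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="book">
      <span class="title">Example Title</span>
      <span class="price">19.95</span>
    </div>
    <div class="book">
      <span class="title">Another Title</span>
      <span class="price">24.50</span>
    </div>
  </body>
</html>"""


def extract_books(xhtml_text):
    """Return (title, price) pairs found via class-attribute conventions."""
    root = ET.fromstring(xhtml_text)  # only works because the page is well-formed XML
    ns = {"x": XHTML_NS}
    books = []
    # Select every div whose class attribute is exactly "book".
    for div in root.findall(".//x:div[@class='book']", ns):
        title = div.find("x:span[@class='title']", ns)
        price = div.find("x:span[@class='price']", ns)
        books.append((title.text, float(price.text)))
    return books


if __name__ == "__main__":
    for title, price in extract_books(PAGE):
        print(title, price)
```

The same markup still renders normally in a browser and can be styled through those class names; the query only works because the page parses as XML in the first place.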
OK, XHTML is a stopgap, but the gap needs stopping -- alternatives
such as XML+XSLT aren't universally supported (e.g. Mark Pilgrim's Atom
feed looks terrible in Opera and Safari). Likewise, XML+XSLT has the
problem that if you have M XML source formats and N output formats
(e.g. for different browsers, different devices, different classes of
users ...) then you have M x N stylesheets to maintain. With XHTML as
the single source format, you have N stylesheets to maintain. Sure,
you pay for that with the kludges needed to force-fit information
into XHTML, but it's not at all clear to me that this cost
outweighs the practical benefit. And of course you can use some better
XML format as the single source format, but then you have to design it,
deploy it, change it, manage the versions ... and deal with the snotty
users who don't believe you've added enough value over (X)HTML to
justify all these costs.
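
The stylesheet arithmetic above can be sketched in a few lines. The format names are placeholders, and the function simply assumes one stylesheet per (source format, output format) pair:

```python
def stylesheets_needed(source_formats, output_formats, single_source=False):
    """Count stylesheets, assuming one per (source, output) pair.

    With single_source=True, everything is first expressed in one source
    format (XHTML in the argument above), so only the outputs multiply.
    """
    if single_source:
        return len(output_formats)
    return len(source_formats) * len(output_formats)


# Placeholder names, purely illustrative.
sources = ["format_a", "format_b", "format_c"]           # M = 3
outputs = ["desktop", "handheld", "print", "aural"]      # N = 4

many_sources = stylesheets_needed(sources, outputs)                     # M x N
one_source = stylesheets_needed(sources, outputs, single_source=True)   # N
print(many_sources, one_source)
```

This counts only the stylesheets; it says nothing about the cost of squeezing each source into XHTML, which is the other side of the tradeoff.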
I'm no fan of XHTML or the W3C process that has produced it, but I'm
beginning to think that it's like democracy -- the worst of all
possible approaches, except for the known alternatives.