At 10:41 AM -0700 7/15/04, Joshua Allen wrote:
>You have perfectly described where our disagreement is. People want to
>be able to write web pages which can be read in web browsers. That is
>the overwhelming majority use case.
You are operating under the common misconception that what the
document publisher expects or wants to happen with the data is in
fact how readers will use the data. But that's not how the world
actually works, it's never been how the world works, and it's never
going to be how the world works, whether before, during, or after the Web.
The publisher generates information in some format. They have no
control over or reasonable expectation of what readers will do with
this information. Some will import it into databases. Some will read
it for amusement. Some will feed it to search engines. Some will search
it for hidden messages. Some will use it to learn English. There's no
telling. Readers have their own needs, and will use the content to
satisfy those needs, irrespective of what the publisher might have
intended them to do.
For instance, I use Amazon's web site to fill my iTunes database with
album covers. I doubt Amazon ever imagined anybody doing that, but so
what? The information's there, so I use it in the way that makes
sense to me.
A lot of readers' needs are better served if the data is well-formed.
And it's not particularly hard to make it well-formed, so why not do
it? Sometimes you can get away without well-formedness. I don't think
Amazon's main site is well-formed, but if it were life would be
easier for a lot of the tools that process Amazon data. In fact,
Amazon could do less work by exposing the data as one site rather
than two. If they had made their site more accessible to screen
scraping, they would have sold more books and other things earlier
because more affiliates could have more easily accessed and massaged
their data. And a lot of other people would have gotten a lot of
other cool things done too that instead had to wait till they
published and maintained a separate XML interface.
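To make the point about tools concrete, here is a minimal sketch of what a reader's script might look like, using only Python's standard xml.etree.ElementTree parser. The markup strings, the element names (album, title, cover), and the cover_href helper are hypothetical, made up for illustration; they are not Amazon's actual markup or any real interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, invented for illustration only.
WELL_FORMED = "<album><title>Example Album</title><cover href='cover.jpg'/></album>"
# Same information, but with an unclosed <title> and an unquoted attribute value.
TAG_SOUP = "<album><title>Example Album<cover href=cover.jpg></album>"


def cover_href(markup):
    """Return the cover image reference, or None if the markup cannot be parsed."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # Not well-formed: an off-the-shelf XML parser gives up here, and the
        # reader has to fall back on hand-written, page-specific scraping.
        return None
    cover = root.find("cover")
    return cover.get("href") if cover is not None else None


print(cover_href(WELL_FORMED))
print(cover_href(TAG_SOUP))
```

With the well-formed fragment, a generic parser and one line of lookup are enough. With the other one, the reader is left writing custom recovery code for a single publisher's quirks.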
Elliotte Rusty Harold
Effective XML (Addison-Wesley, 2003)