At 11:32 AM -0500 1/15/04, Michael Champion wrote:
>So, while I acknowledge the facts on the ground, it doesn't seem to
>be asking too much to have aggregators pass Atom directly to an XML
>parser, and continue to perform all sorts of cleanup on the RSS
>before passing it to an XML parser. (Maybe that's not how most
>aggregators work, but it's what Dare and Joe English described as
>their basic architecture). I guess there are two parts to my
>proposed solution: Educate people that Atom is XML, and if you want
>to play the Atom game you really really ought to play by XML rules;
>accept that this is somewhat unrealistic, but make cleanups
>explicit, preferably on request rather than by magic, and mark the
>result as fixed up in case anyone downstream cares.

I don't believe that requiring well-formed XML is unrealistic in the
least. Why do people find it unreasonable to produce well-formed XML?
Are authors really hand-authoring RSS? I'm certainly not, and I
suspect the thousands using various blogging tools aren't either.

The only possible problem I see is if the RSS/Atom is produced by
screen scraping hand-authored HTML. But this is only a problem if the
tools that do the screen scraping assume the HTML is well-formed and
basically just copy and paste it, which is of course insane, broken,
and brain-damaged. Are the authoring tools really that stupid? (I
honestly don't know. I've only used the tools I've written.)
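
To make the copy-and-paste failure concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The function names naive_entry, escaped_entry and is_well_formed, and the sample fragment, are hypothetical illustrations, not any real blogging tool's code. It shows that pasting scraped HTML straight into the markup yields a document an XML parser rejects, while letting a serializer write the same string as escaped text keeps the entry well-formed. Escaping is only one option; tidying the HTML into well-formed markup, as suggested below, is the other, and this sketch does not attempt it.

```python
import xml.etree.ElementTree as ET

# Hypothetical scraped fragment: unclosed <p>, bare ampersand,
# and an HTML-only entity (&nbsp;) that XML does not predefine.
scraped = "<p>Fish & chips&nbsp;<br>"

def naive_entry(html: str) -> str:
    # The broken approach: paste the HTML directly into the XML markup.
    return "<entry><content>" + html + "</content></entry>"

def escaped_entry(html: str) -> str:
    # Let a serializer build the entry; <, > and & in the text get escaped.
    entry = ET.Element("entry")
    content = ET.SubElement(entry, "content")
    content.text = html
    return ET.tostring(entry, encoding="unicode")

def is_well_formed(doc: str) -> bool:
    # A strict XML parse either succeeds or raises ParseError.
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False
```

Here is_well_formed(naive_entry(scraped)) is False, while is_well_formed(escaped_entry(scraped)) is True and parsing the escaped entry gives back the original string as the content text.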
It is only sensible that a screen scraper should fix HTML using
something like Tidy before including it in an XML document. This is
perfectly OK. Data from non-XML sources such as HTML documents, Word
files, and SQL databases is included in XML documents all the time;
and this is the appropriate time to make any fixes that are necessary
to create well-formedness. However, once a document has been labelled
as XML, it is no longer acceptable for downstream processes to make
such fixes in it. It's just too hard to figure out what's missing,
and correctly repair it. The proper response is to drop the document,
and perhaps kick back an error to its publisher.
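
A consumer following this rule might look roughly like the Python sketch below, again using only xml.etree.ElementTree. The accept_feed name and the report_error callback are hypothetical; the callback stands in for whatever channel an aggregator would use to tell the publisher, which the post leaves open. The point is that there is no repair step: the document is either parsed as XML or dropped.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

def accept_feed(data: bytes,
                report_error: Callable[[str], None]) -> Optional[ET.Element]:
    """Parse a feed strictly; return None (drop it) if it is not well-formed."""
    try:
        return ET.fromstring(data)
    except ET.ParseError as err:
        # No guessing at what is missing: report the error and discard.
        report_error(f"feed rejected, not well-formed: {err}")
        return None
```

For example, with errors = [], calling accept_feed(b"<feed><title>Fish & chips</title></feed>", errors.append) returns None and leaves one message in errors, while a well-formed feed is returned as a parsed element.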
--
Elliotte Rusty Harold
elharo@metalab.unc.edu
Effective XML (Addison-Wesley, 2003)
http://www.cafeconleche.org/books/effectivexml
http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA