Joshua Allen suggested that "new" items in RSS feeds could be
identified by doing:
> a diff, comparing file hashes, or whatever.
Well, that doesn't work very well with RSS as used today.
RSS feeds generated by http://quicktopic.com provide an
excellent example of why diff, hashes, etc. don't help when working
with RSS feeds. This example should clarify the urgent need for the
combination of unique entry id and date that Atom will provide.
QuickTopic RSS feeds are dynamically generated on demand.
Additionally, QuickTopic modifies all hrefs in the content of its
feeds so that they indirect through a "link.cgi" program. Presumably,
this allows it to track how frequently people follow links to other
sites. But the real problem is that it adds unique identifiers to the
rewritten links, and those identifiers change in every version of the
file generated. Thus, any RSS item in a QuickTopic-generated RSS file
that contains an external link will be different *every time* it is
fetched.
At 5:54 this evening I fetched
The first item contains a link to an external site. It is:
At 5:56 this evening, I refetched the same RSS file, and the link
had changed to:
Note: The difference is in the "x=" parameter, which is at the
end of the two hrefs. If you hash or diff these two entries, they will
be different even though the entry itself is over 7 months old!
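To make the failure concrete, here is a minimal Python sketch that simulates a feed generator rewriting an entry's link on every request, then compares SHA-256 hashes of two fetches of the same entry. The `link.cgi` URL layout, the `u` parameter, and the incrementing counter standing in for `x=` are hypothetical; only the idea of a per-generation value in the rewritten href comes from the QuickTopic example above.

```python
import hashlib
import itertools
from urllib.parse import quote

# Hypothetical stand-in for the per-generation "x=" value: every call to
# generate_entry() draws the next number, as a feed built on demand might.
_generation = itertools.count(1)


def generate_entry(body_text: str, target_url: str) -> str:
    """Render the same entry, rewriting its link through a tracking redirect."""
    x = next(_generation)
    href = f"http://example.com/link.cgi?u={quote(target_url, safe='')}&x={x}"
    return f'<p>{body_text} <a href="{href}">link</a></p>'


def content_hash(entry_html: str) -> str:
    """Fingerprint an entry by hashing its serialized content."""
    return hashlib.sha256(entry_html.encode("utf-8")).hexdigest()


body = "An entry that was posted months ago."
target = "http://www.example.org/article"

first_fetch = generate_entry(body, target)   # e.g. the 5:54 fetch
second_fetch = generate_entry(body, target)  # e.g. the 5:56 fetch

# Identical text and link target, but the hashes disagree, so a hash-based
# reader would report this old entry as "new" again.
print(content_hash(first_fetch) == content_hash(second_fetch))  # prints False
```

Any content-based fingerprint (a diff, a checksum, a hash) has the same weakness, because the volatile token is part of the bytes being compared.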
If this were an Atom feed, and if QuickTopic were "following
the rules," then the entry in question here would have a unique id and
a date. Rather than hashing or diffing the contents of the entry, we
would be able to check that id and the modified or issued date to
determine whether this was a "new" entry. But with RSS, which has no
usable mechanism for providing unique ids (I've pointed out in other
messages why GUID is useless) and no explicit indication of
"modified time," we're stuck believing that this and many other
messages from QuickTopic are "new" every time we read them.
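With a stable id and a modification date, "is this entry new?" becomes a lookup rather than a content comparison. The minimal Python sketch below assumes each parsed entry exposes an identifier and a modified timestamp; the `Entry` class, its field names, and the tag-URI-style id are hypothetical illustrations, not taken from any Atom draft or feed parser. The `content` field is deliberately never consulted, so rewritten links or rotating ads in the body cannot make an old entry look new.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entry:
    """Hypothetical parsed entry; field names are illustrative only."""
    entry_id: str       # stable unique identifier for the entry
    modified: datetime  # when the entry's content last changed
    content: str        # may differ on every fetch (tracking links, ads)


def classify(entry: Entry, seen: dict[str, datetime]) -> str:
    """Return "new", "updated", or "unchanged" and record what was seen.

    Only entry_id and modified are consulted; content is never compared.
    """
    previous = seen.get(entry.entry_id)
    if previous is None:
        seen[entry.entry_id] = entry.modified
        return "new"
    if entry.modified > previous:
        seen[entry.entry_id] = entry.modified
        return "updated"
    return "unchanged"


posted = datetime(2003, 1, 10, 12, 0, tzinfo=timezone.utc)  # example date
entry_id = "tag:example.com,2003:entry-42"  # hypothetical id

fetches = [
    Entry(entry_id, posted, "entry text, link token x=1"),
    Entry(entry_id, posted, "entry text, link token x=2"),  # same entry, new token
    Entry(entry_id, datetime(2003, 8, 1, tzinfo=timezone.utc), "edited text"),
]

seen: dict[str, datetime] = {}
results = [classify(e, seen) for e in fetches]
print(results)  # prints ['new', 'unchanged', 'updated']
```

The second fetch is reported as unchanged even though its content differs, while a genuine edit, signalled by a later modified date, is still surfaced as an update.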
Problems of "ever-changing items" also occur on sites like
InfoWorld that insert ads into their RSS feeds. Whenever the ad
changes, any hashing-based solution is going to think the item has
changed.
My concern with this is not some "arrogant" "technology" push.
Customers complain that they are seeing the same item in their feeds
multiple times. We need Atom to prevent flooding them with duplicate
items.
My belief is that the failings of RSS are so great, and the
quality of service we'll be able to provide with Atom feeds so much
greater than what we can currently provide, that RSS use will fall off
rapidly once Atom becomes established. Users will demand it.