> On Jan 15, 2004, at 9:04 AM, Elliotte Rusty Harold wrote:
> >
> > This is also the point of view taken by Walter Perry. However, what
> > you're missing here is the assumption (certainly in Walter's case, and
> > I think in Uche's and Sean's as well) that the documents are
> > well-formed. They are willing to process invalid documents. However,
> > well-formedness is their minimum requirement. Although the Atom folks
> > frequently confuse their language, what they seem to be asking for is
> > the option to pass around malformed documents.
Anyone can pass around malformed documents; it's always an option. I think
you must be confused over the issues.
> I agree that *Atom*, being designed as it is for Mr. Safe and
> presumably produced by XML-conforming tools, should be quite strict --
> if something claims to be an Atom feed, it should definitely at a very
> minimum be well-formed XML.
Agreed. At a very minimum.
> (RSS is another kettle of fish, it seems
> broken as designed, and that hasn't stopped its viral spread. Don't
> bother trying to cure it, you might kill it.) Atom's value proposition
> is that it will (someday) be a real spec, with real rules on how to
> produce it and validate it. It's less clear where the truly optimal
> place to reject (or optionally fix) a problem is.
Recent (pretty ad hoc) surveys suggest that the XML quality of RSS feeds in
general isn't as bad as one might imagine, given the proliferation of
regexp-hack parsers. With a little feedback in the right direction, the
global system for Atom could easily stabilise comfortably on the side of
good data.
The optimal place for checks isn't an easy call. But I think Tim Bray was
probably right when he suggested that all it would take is one or more of
the leading aggregator (consumer) apps taking the high road to bootstrap
the system back to the source.
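For what it's worth, the consumer-side check is tiny. A rough Python
sketch (purely illustrative; the function name and choice of parser are
my own, not any particular aggregator's):

    import xml.etree.ElementTree as ET

    def accept_feed(raw_bytes):
        # Parse the feed; a well-formedness error means we refuse it
        # outright rather than guess at what the author meant.
        try:
            return ET.fromstring(raw_bytes)
        except ET.ParseError as err:
            # Ideally the error gets reported back towards the publisher.
            print("rejecting malformed feed:", err)
            return None

An aggregator that does this (and says why in its UI) is exactly the kind
of feedback loop I mean.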
> That's why Gresham's
> Law applies -- no individual benefits from rejecting a bad document,
> but the system as a whole benefits if bad documents can be kept out.
> Atom can start over and build a community with a strong ethic that
> everyone should be checking for "counterfeit" Atom feeds. I guess I
> should just shut up and let you folks play Enforcer :-) but I'm
> skeptical that this will work (for human reasons) -- the net effect
> will be to create a "buzz" that Atom is something you don't want to
> mess with because you'll get flamed (or spammed <grin>) by geeks who
> babble about stuff that you don't care about. In a world where Atom
> is stillborn, this whole discussion is moot.
I don't think it needs a strong ethic applied widely - as Tim Bray
suggested, adoption by one or two of the leading tool vendors should be
enough to tip the balance. This approach (at the well-formedness level at
least) has already been adopted by two of the bigger players.
> My major point here is that simply rejecting bad data is not a great
> option for any single actor in the system, and there's no global
> enforcement mechanism, so services (code, SOAP-y, RESTful, whatever)
> that fix bad data and make it real XML are a Good Thing -- everyone
> downstream gets the advantages of XML, the original creators (who we
> value for what they have to say, not their choice of software tools)
> aren't stifled by the necessity of understanding the details of utf-8
> vs iso-8859 encoding of the characters their authoring tools produce.
> Ideally these services get invoked before the data is even serialized,
> but invoking them anywhere far upstream can work too.
There is a lot of diversity in the tools used to produce material for
syndication, but if it were clearly stated from day one that true XML is the
way to go, producers would be more likely to take it on board.
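On the utf-8 vs iso-8859 point above, the kind of upstream fix-up being
talked about can be as small as this (Python again, illustrative only;
to_utf8 is my own name for it):

    def to_utf8(raw_bytes):
        # Try UTF-8 first; if the authoring tool actually emitted
        # Latin-1, fall back to that. Every byte string decodes under
        # ISO-8859-1, so the fallback always succeeds.
        try:
            text = raw_bytes.decode("utf-8")
        except UnicodeDecodeError:
            text = raw_bytes.decode("iso-8859-1")
        return text.encode("utf-8")

Run somewhere upstream (ideally before serialisation, as suggested), it
spares authors from ever needing to know the difference.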
> The downside is
> that it is *possible* that sometimes the fixes could distort the
> meaning of truly horribly broken stuff that the fixer tries too hard to
> clean up. For the domain of weblog syndication, it's hard to get too
> excited about this problem. For the domain of data feeds tunnelled
> through Atom, this is a real issue, and the *option* to track and
> reject data that has been "fixed" is necessary. That seems like a more
> productive and politically viable approach than saying Thou Shalt Not
> Process Malformed XML, Ever.
Yep, this is a downside. Most of the current generation of syndication tools
(i.e. RSS) do little more than simple rendering of the feed data, so for
them it's not such an issue. I wouldn't have talked in terms of tunneling
through Atom, but certainly richer feed data has increased demands on
quality, as do cataloguing and indexing systems. I've a feeling many
developers in the RSS community can't see beyond the simple newsreader, and
so they're less aware of the value-add offered by good data. The future
holds a lot more promise for those who want to do new stuff with good data
than for those who just want to do the same stuff they did with tag soup.
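And the option to track and reject "fixed" data needn't be heavyweight.
A sketch (repair_markup here is a stand-in for whatever tidy-style fixer
gets used, not an existing library call):

    import xml.etree.ElementTree as ET

    def parse_with_provenance(raw_bytes, repair_markup):
        # Returns (tree, was_repaired) so strict consumers can reject
        # or flag anything that only parsed after repair.
        try:
            return ET.fromstring(raw_bytes), False
        except ET.ParseError:
            return ET.fromstring(repair_markup(raw_bytes)), True

A cataloguing or indexing system can then apply whatever policy it likes
to the repaired ones.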
Cheers,
Danny.