On Jan 13, 2004, at 3:26 PM, Simon St.Laurent wrote:
> This has come up before, but the chant "Postel's law has no exceptions"
> seems to be coming again, in the RSS context.
Is this really about Postel's law or pushback on the overly draconian
(in some opinions) XML spec? Or is this a typical weblog/RSS
community bashathon because (ahem) one of the vocal proponents of
being conservative apparently wrote software that doesn't actually
conform to the XML spec?
> http://www.intertwingly.net/blog/1685.htm
OK, I'll bite and expose my lack of attention to the details -- why
are smart quotes illegal in XML? (Or is it that the encoding is
mis-specified?) Was this proposed to be fixed in XML 1.1? Does anyone
outside the RSS/Atom world complain about this?
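One common way smart quotes break a feed, offered here as an assumption rather than a diagnosis of the linked post, is the mis-specified-encoding case: the bytes are Windows-1252 (0x93/0x94 for the curly quotes) but the document declares UTF-8, so a conforming parser must stop. A minimal Python sketch, with made-up sample data and hypothetical helper names:

```python
import xml.etree.ElementTree as ET

# Made-up sample: Windows-1252 smart-quote bytes inside a document
# that claims to be UTF-8.
RAW = b'<?xml version="1.0" encoding="utf-8"?><title>\x93Hello\x94</title>'


def is_well_formed(data: bytes) -> bool:
    """Return True if a conforming XML parser accepts the bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def repair_cp1252(data: bytes) -> bytes:
    """Assume the bytes are really Windows-1252 and re-encode as UTF-8."""
    return data.decode("cp1252").encode("utf-8")


if __name__ == "__main__":
    print(is_well_formed(RAW))
    print(is_well_formed(repair_cp1252(RAW)))
```

The characters themselves are legal in XML; in this sketch it is only the mismatch between the declared and the actual encoding that makes the parser give up.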
The larger issue seems to be Gresham's Law ("bad money drives out
good"). I know that as a *consumer* of aggregation tools, I don't care
a whit whether the input is raw text, tag soup HTML, XHTML, valid
instances of one of several flavors of RSS/Atom, or what -- I just want
to quickly see what has changed on the set of web resources I'm
interested in that use some chronological layout convention (news, RSS
feed, email archive, whatever). I use the most liberal tools I can
find. If being liberal is inconvenient for the aggregator developer,
I'll just find another aggregator or hack up something that does what I
want. The last thing on earth I want to do is whine at, for example,
some poor woman in Baghdad about the format of her weblog; I want to
hear what she has to say. (FWIW, Bloglines apparently uses something
called sitescooper to enable this.)
That's not to say that the specs should condone violations; the whole
point of Atom is to provide an authoritative spec that is built on real
standards such as XML and written in such a way as to allow it to be
implemented from the spec itself rather than having to ask the
community or a committee. If the community of weblog software
developers gets its act together and the number of ill-formed feeds
becomes vanishingly small, great for everyone -- I can read anything I
want in any product I choose. But that happy situation won't come about
by market forces; it will come about by some sort of coercion (moral,
economic, legal, etc.). If there's real money that falls on the floor
due to the chaos, someone will come along and clean it up, i.e. make
Dodge City safe for the banks and railroads.
That might happen by coming up with a common standard and enforcing it;
my guess, however, is that "text mining" technologies will make the
whole question moot by using smarter software rather than insisting on
more rigid data. (See IBM's immense investment in WebFountain, for
example.) Real Soon Now we won't have to care whether text is raw
email, XHTML, valid Atom, or tag soup to consume it selectively in what
we now call aggregators. If software can tag data, or at least extract
the implicit structure of data and emit it in a valid markup syntax,
then not even the geekiest of markup or syndication geeks will care
about missing closing tags or smart quotes anymore. In other words,
we'll see "Postel Machines" that liberally take tag soup and raw text,
and emit conservatively structured data or valid markup to make life
easier for the downstream processors.
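A toy version of such a "Postel Machine" can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not a description of any real aggregator: the class name, the `doc` wrapper element, and the short list of void tags are all made up for the example, and it assumes tag and attribute names in the input are already usable as XML names.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Hypothetical short list of HTML elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PostelMachine(HTMLParser):
    """Liberal in what it accepts (tag soup), conservative in what it emits."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of well-formed output
        self.stack = []  # elements currently open

    def handle_starttag(self, tag, attrs):
        # dict() drops duplicate attribute names; valueless attributes
        # get their own name as the value.
        text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in dict(attrs).items()
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{text}/>")
        else:
            self.out.append(f"<{tag}{text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: ignore it
        # Close everything opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # escapes &, < and >

    def result(self):
        # Close whatever the author forgot, then wrap in a single root.
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")
        return "<doc>" + "".join(self.out) + "</doc>"


def normalize(soup: str) -> str:
    """Turn tag soup or raw text into a well-formed XML string."""
    machine = PostelMachine()
    machine.feed(soup)
    machine.close()  # flush any buffered text
    return machine.result()


if __name__ == "__main__":
    print(normalize("<p>He said \u201chi\u201d <b>loudly<p>AT&T</i>"))
```

Missing closing tags, stray closing tags, bare ampersands, and plain text with no markup at all come out the other end as something a strict XML parser will accept, which is the division of labor described above.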