John Cowan wrote:
That's not what he meant at all.
Yes indeed, it is a wonderful quote. (I thought I made it clearer later that I actually agree with Knuth: my friendly caution is about developers taking it as a mantra to indiscriminately avoid the hard practicalities.)
John will certainly remember the old-school SGML data modelers, who said "just model the data", i.e. ignore any usage or application constraints. They took the unwise leap from markup languages *allowing* the separation of presentation and content to saying that you should actively ignore usage issues... If the input documents have CALS tables, and the output documents also use CALS tables, ignore that and consider RDF for the tables! SGML got a terrible reputation for implementation difficulty because of it: but it was the methodology what done it. (Though, as with anything, sometimes that approach may have been appropriate.)
But I think the onus needs to be on content architects who develop new intermediate formats to demonstrate that the format represents the shortest distance between the inputs and outputs. (Or that it has some other hard, non-theoretical benefit.) Knuth mentions 12% as being worthwhile; how many schema designers actually make a guesstimate of the programming costs when deciding between different structures?
The trouble with this kind of thinking (i.e. that we should minimize input/output distance in our schemas) is that it leads to conclusions that are unpalatable for some: in particular, if HTML/RSS are the main output formats, it encourages the back-propagation of HTML-isms. (I was sacked from a job once, in the 90s, for suggesting that extended HTML would suffice.) The design issue resolves itself to "how can we extend HTML with our more semantic markup?" At which stage the lack of power of DTDs and XSD, e.g. to have content models keyed by attribute values rather than just generic identifiers, brings people to a grim stop.
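To make that last point concrete, here is a rough sketch of what "content models keyed by attribute values" could look like when checked in ordinary code. Everything in it is invented for illustration: the class names "recipe" and "note", the rules in CONTENT_MODELS, and the helper violations() are assumptions, not part of any schema language. A DTD allows only one content model per element name, so it cannot say that a div with one class value has different allowed children from a div with another.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules: the content model for <div> depends on its class attribute.
# Each model is a regular expression over the space-separated child element names.
CONTENT_MODELS = {
    ("div", "recipe"): r"h2 ul( p)*",  # a heading, a list, then optional paragraphs
    ("div", "note"): r"p( p)*",        # one or more paragraphs only
}

def violations(root):
    """Return (tag, class, children) for every element that breaks its keyed model."""
    bad = []
    for elem in root.iter():
        model = CONTENT_MODELS.get((elem.tag, elem.get("class")))
        if model is None:
            continue  # no keyed model for this element: nothing to check
        children = " ".join(child.tag for child in elem)
        if not re.fullmatch(model, children):
            bad.append((elem.tag, elem.get("class"), children))
    return bad

doc = ET.fromstring(
    '<body>'
    '<div class="recipe"><h2>Soup</h2><ul><li>Water</li></ul></div>'
    '<div class="note"><ul><li>Not allowed here</li></ul></div>'
    '</body>'
)
print(violations(doc))  # [('div', 'note', 'ul')]
```

Both elements are plain div as far as a grammar keyed on element names is concerned; only the lookup on the (name, class) pair tells them apart.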
Cheers
Rick