bryan wrote:
> One of the things I would want to use namespaces for is to return
> namespaced html instead of as you pointed out " the bizarre practice of
> CDATA-escaping random HTML-ish text ", but this is only starting to be
> done now. Why was it not done in earlier versions? What were the excuses
> for the bizarre practice?
I agree that it's bizarre and offensive, but these people are not
completely nuts. Think of it from the point of view of the aggregator
writer. They want to parse an RSS feed as XML, and they want to parse
each entry to get the <title> and <author> and <link> and so on. Then
they get to the content. They have an HTML renderer which will render
this prettily. So they want to take all the bytes between <content> and
</content> (those are atom tags, not RSS tags, but same difference), and
hand them to the HTML renderer. They don't want to parse them, because
they'd just be doing a no-op and putting them back together again to
hand them to the renderer.
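To make the aggregator's view concrete, here is a minimal sketch using Python's standard-library ElementTree. It assumes the Atom 1.0 namespace from RFC 4287 (which postdates the drafts being discussed here) and a made-up sample entry. The point is only that escaped HTML comes out of the XML parser as one opaque string, which the aggregator can pass straight to its HTML renderer without parsing it.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace (RFC 4287); the sample entry is invented.
ATOM = "{http://www.w3.org/2005/Atom}"

ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Hello</title>
  <link href="http://example.org/hello"/>
  <content type="html">&lt;p&gt;Some &lt;b&gt;bold&lt;/b&gt; text&lt;/p&gt;</content>
</entry>"""

def read_entry(xml_text):
    """Parse the entry as XML, but treat the escaped content as an opaque string."""
    entry = ET.fromstring(xml_text)
    title = entry.findtext(ATOM + "title")
    link = entry.find(ATOM + "link").get("href")
    # The parser has already turned &lt; into <, so this is a plain string of
    # HTML with no element structure; it would go straight to the renderer.
    html_blob = entry.findtext(ATOM + "content")
    return title, link, html_blob

title, link, html_blob = read_entry(ENTRY)
print(title, link)
print(html_blob)
```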
On the producer's side, a lot of the authoring tools let authors use
whatever editing tool they like, and enforcing that the result be XHTML
is a lot of extra work that hasn't been done yet.
So both the producers *and* the consumers are happier using this
horrible escaped-HTML stuff. I and several others have told them that
they shouldn't want to do this, but it doesn't seem to work.
As several others have pointed out, if the content were well-formed they
could do XPath magic, and filter out dangerous things like <script>, and
bask in the glow of karmic goodness. In response they say "I don't want
to do XPath magic, and my HTML renderer has a safe-sandbox mode, and I
just want the stuff I care about (<title>, <link>, remember) in XML and
the rest is a bag of bits, so extend me no markup."
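For contrast, here is a minimal sketch of the well-formed alternative, again with Python's ElementTree. It assumes Atom 1.0's type="xhtml" content (a single XHTML div) and an invented sample entry. ElementTree only supports a small XPath subset, so this walks the tree rather than using full XPath; the helper name strip_scripts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumptions: Atom 1.0 and XHTML namespaces; sample entry is invented.
ATOM = "{http://www.w3.org/2005/Atom}"
XHTML = "{http://www.w3.org/1999/xhtml}"

ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Hello</title>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Some <b>bold</b> text</p><script>alert('gotcha')</script>
    </div>
  </content>
</entry>"""

def strip_scripts(root):
    """Remove every XHTML <script> element under root, keeping the text around it."""
    for parent in list(root.iter()):
        last_kept = None
        for child in list(parent):
            if child.tag == XHTML + "script":
                # ElementTree stores text that follows an element in its .tail,
                # so move it to the previous kept sibling (or the parent) first.
                tail = child.tail or ""
                if last_kept is None:
                    parent.text = (parent.text or "") + tail
                else:
                    last_kept.tail = (last_kept.tail or "") + tail
                parent.remove(child)
            else:
                last_kept = child

entry = ET.fromstring(ENTRY)
div = entry.find(ATOM + "content").find(XHTML + "div")
strip_scripts(div)
print(ET.tostring(div, encoding="unicode"))
```

An aggregator that still wants "a bag of bits" can serialize the cleaned div with ET.tostring and hand that string to its renderer.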
Realistically, I think we're stuck with it. At least Atom will *let*
you make the content well-formed. Then evolution takes over.
--
Cheers, Tim Bray
(ongoing fragmented essay: http://www.tbray.org/ongoing/)