xml-dev - Re: [xml-dev] rss regularis(z)ation

To: bryan <bry@itnisk.com>
Subject: Re: [xml-dev] rss regularis(z)ation
From: Tim Bray <tbray@textuality.com>
Date: Wed, 23 Jul 2003 17:40:09 -0700
Cc: 'Mike Champion' <mc@xegesis.org>, xml-dev@lists.xml.org
In-reply-to: <001701c35029$acc504a0$2001a8c0@bryans>
References: <001701c35029$acc504a0$2001a8c0@bryans>
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624

bryan wrote:

> One of the things I would want to use namespaces for is to return
> namespaced html instead of as you pointed out " the bizarre practice of
> CDATA-escaping random HTML-ish text " but this is only starting to be
> done now, why was it not done in earlier versions? What were the excuses
> for the bizarre practice

I agree that it's bizarre and offensive, but these people are not 
completely nuts.  Think of it from the point of view of the aggregator 
writer.  They want to parse an RSS feed as XML, and they want to parse 
each entry to get the <title> and <author> and <link> and so on.  Then 
they get to the content.  They have an HTML renderer which will render 
this prettily.  So they want to take all the bytes between <content> and
</content> (those are Atom tags, not RSS tags, but same difference), and 
hand them to the HTML renderer.  They don't want to parse them, because 
they'd just be doing a no-op and putting them back together again to 
hand them to the renderer.
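
A minimal sketch of that aggregator's view, assuming Python's standard xml.etree.ElementTree and a made-up, un-namespaced entry (the sample data and the split_entry helper are hypothetical, not taken from any real feed or tool). The metadata is read as XML; the content comes back as one opaque string, ready to hand to a renderer untouched:

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: metadata is real XML, content is CDATA-escaped HTML.
FEED_ENTRY = """<entry>
  <title>Hello</title>
  <link>http://example.org/1</link>
  <content><![CDATA[<p>Some <b>HTML-ish</b> text<br>with an unclosed tag]]></content>
</entry>"""

def split_entry(xml_text):
    """Return (metadata dict, raw HTML string) for one entry."""
    entry = ET.fromstring(xml_text)
    # The parts the aggregator cares about are parsed as XML.
    meta = {name: entry.findtext(name) for name in ("title", "link")}
    # The content is never parsed as markup: the XML parser only sees
    # character data, so the HTML arrives as a plain string.
    html_blob = entry.findtext("content") or ""
    return meta, html_blob

meta, html_blob = split_entry(FEED_ENTRY)
print(meta)
print(html_blob)
```

Note that the unclosed <br> inside the CDATA section does not trouble the XML parser at all, which is exactly why producers find this route easier.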

On the producer's side, a lot of the authoring tools give authors a lot 
of freedom in whatever editing tool they like, and to enforce that this 
be XHTML is a lot of extra work that's not done yet.

So both the producers *and* the consumers are happier using this 
horrible escaped-HTML stuff.  I and several others have told them that 
they shouldn't want to do this, but it doesn't seem to work.

As several others have pointed out, if the content were well-formed they 
could do XPath magic, and filter out dangerous things like <script>, and 
bask in the glow of karmic goodness.  In response they say "I don't want 
to do XPath magic, and my HTML renderer has a safe-sandbox mode, and I 
just want the stuff I care about (<title>, <link>, remember) in XML and 
the rest is a bag of bits, so extend me no markup."
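
For comparison, a rough sketch of the "XPath magic" route, assuming the content is well-formed XHTML in the XHTML namespace. It uses the limited XPath subset in Python's standard xml.etree.ElementTree; the sample entry and the strip_scripts helper are hypothetical, and dropping <script> alone is nowhere near a complete sanitizer:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NS = {"x": XHTML_NS}
SCRIPT = "{%s}script" % XHTML_NS

# Hypothetical entry whose content is well-formed, namespaced XHTML.
ENTRY = """<entry>
  <title>Hello</title>
  <content>
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>Safe text<script>alert(1)</script> continues here.</p>
    </div>
  </content>
</entry>"""

def strip_scripts(root):
    """Remove every XHTML <script> element under root; return the count."""
    removed = 0
    for parent in list(root.iter()):
        prev = None
        for child in list(parent):
            if child.tag == SCRIPT:
                # Keep the text that followed the removed element.
                if child.tail:
                    if prev is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        prev.tail = (prev.tail or "") + child.tail
                parent.remove(child)
                removed += 1
            else:
                prev = child
    return removed

entry = ET.fromstring(ENTRY)
content = entry.find("content")
found_before = len(content.findall(".//x:script", NS))  # XPath-style query
removed = strip_scripts(content)
found_after = len(content.findall(".//x:script", NS))
paragraph_text = "".join(content.find(".//x:p", NS).itertext())
print(found_before, removed, found_after, paragraph_text)
```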

Realistically, I think we're stuck with it.  At least Atom will *let* 
you make the content well-formed.  Then evolution takes over.
-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)

Follow-Ups:
- RE: [xml-dev] rss regularis(z)ation
  - From: "Danny Ayers" <danny666@virgilio.it>

References:
- RE: [xml-dev] rss regularis(z)ation
  - From: "bryan" <bry@itnisk.com>
