To: Michael Champion <mc@xegesis.org>
Subject: Re: [xml-dev] Postel's law, exceptions
From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Thu, 15 Jan 2004 12:31:36 -0500
Cc: xml-dev@lists.xml.org
In-reply-to: <6A0EE7CA-4778-11D8-8ADE-000A95CCC59E@xegesis.org>
References: <p06010205bc2c4d3b7399@[192.168.254.4]><830178CE7378FC40BC6F1DDADCFDD1D10190E1B6@RED-MSG-31.redmond.corp.microsoft.com> <CBA00C77-4638-11D8-8ADE-000A95CCC59E@xegesis.org><864E88FE-4651-11D8-8ADE-000A95CCC59E@xegesis.org><p06010205bc2c4d3b7399@[192.168.254.4]><5.1.0.14.2.20040115101443.03f5f5e0@www.gosmalltalk.com><6A0EE7CA-4778-11D8-8ADE-000A95CCC59E@xegesis.org>

At 11:32 AM -0500 1/15/04, Michael Champion wrote:

>So, while I acknowledge the facts on the ground, it doesn't seem to 
>be asking too much to have aggregators pass Atom directly to an XML 
>parser, and continue to perform all sorts of cleanup on the RSS 
>before passing it to an XML parser.  (Maybe that's not how most 
>aggregators work, but it's what Dare and Joe English described as 
>their basic architecture). I guess there are two parts to my 
>proposed solution: Educate people that Atom is XML, and if you want 
>to play the Atom game you really really ought to play by XML rules; 
>accept that this is somewhat unrealistic, but make cleanups 
>explicit, preferably on request rather than by magic, and mark the 
>result as fixed up in case anyone downstream cares.

I don't believe that requiring well-formed XML is unrealistic in the 
least. Why do people find it unreasonable to produce well-formed XML? 
Are authors really hand-authoring RSS? I'm certainly not, and I 
suspect the thousands using various blogging tools aren't either.

The only possible problem I see is if the RSS/Atom is produced by 
screen scraping hand-authored HTML. But this is only a problem if the 
tools that do the screen scraping assume the HTML is well-formed and 
basically just copy and paste it, which is of course insane, broken, 
and brain-damaged. Are the authoring tools really that stupid? (I 
honestly don't know. I've only used the tools I've written.)
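
To make the copy-and-paste failure concrete, here is a minimal sketch using only Python's standard library. The sample HTML string and the helper names (naive_item, escaped_item, is_well_formed) are hypothetical, chosen for illustration; they are not taken from any real blogging tool. The sketch escapes the scraped HTML as character data, which is one simple way to keep the feed well-formed without repairing the HTML itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical hand-authored HTML: acceptable to browsers, but not
# well-formed XML (unclosed <p> and <br>, and a bare ampersand).
scraped_html = "<p>Tom & Jerry<br>new episode"


def naive_item(html):
    # Copy and paste: silently assumes the HTML is already well-formed.
    return "<item><description>" + html + "</description></item>"


def escaped_item(html):
    # Treat the HTML as character data and escape &, < and >.
    return "<item><description>" + escape(html) + "</description></item>"


def is_well_formed(text):
    # True only if a conforming XML parser accepts the text.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

With these definitions, is_well_formed(naive_item(scraped_html)) is False and is_well_formed(escaped_item(scraped_html)) is True, and a consumer that parses the escaped item gets the original HTML string back as the text of the description element.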

It is only sensible that a screen scraper should fix HTML using 
something like Tidy before including it in an XML document. This is 
perfectly OK. Data from non-XML sources such as HTML documents, Word 
files, and SQL databases is included in XML documents all the time; 
and this is the appropriate time to make any fixes that are necessary 
to create well-formedness. However, once a document has been labelled 
as XML, it is no longer acceptable for downstream processes to make 
such fixes in it. It's just too hard to figure out what's missing 
and correctly repair it. The proper response is to drop the document, 
and perhaps kick back an error to its publisher.
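
The downstream side of this rule can be sketched in a few lines, assuming Python's standard xml.etree.ElementTree parser. The names accept_feed and RejectedFeed are hypothetical, and how the error actually gets back to the publisher is left open; this is an illustration of dropping the document rather than repairing it, not a description of how any particular aggregator works.

```python
import xml.etree.ElementTree as ET


class RejectedFeed(Exception):
    """Raised when a document labelled as XML is not well-formed."""


def accept_feed(document, source):
    """Parse a document that claims to be XML, or refuse it outright.

    No cleanup is attempted. A well-formedness error means the document
    is dropped, and the exception carries enough detail to report back
    to whoever published it.
    """
    try:
        return ET.fromstring(document)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        raise RejectedFeed(
            f"{source}: not well-formed at line {line}, column {column}: {err}"
        ) from err
```

A caller would wrap accept_feed in a try/except, skip the feed when RejectedFeed is raised, and log or send the message to the publisher.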
-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA
