
   Re: [xml-dev] Postel's law, exceptions


At 11:32 AM -0500 1/15/04, Michael Champion wrote:

>So, while I acknowledge the facts on the ground, it doesn't seem to 
>be asking too much to have aggregators pass Atom directly to an XML 
>parser, and continue to perform all sorts of cleanup on the RSS 
>before passing it to an XML parser.  (Maybe that's not how most 
>aggregators work, but it's what Dare and Joe English described as 
>their basic architecture). I guess there are two parts to my 
>proposed solution: Educate people that Atom is XML, and if you want 
>to play the Atom game you really really ought to play by XML rules; 
>accept that this is somewhat unrealistic, but make cleanups 
>explicit, preferably on request rather than by magic, and mark the 
>result as fixed up in case anyone downstream cares.

I don't believe that requiring well-formed XML is unrealistic in the 
least. Why do people find it unreasonable to produce well-formed XML? 
Are authors really hand-authoring RSS? I'm certainly not, and I 
suspect the thousands using various blogging tools aren't either.

The only possible problem I see is if the RSS/Atom is produced by 
screen scraping hand-authored HTML. But this is only a problem if the 
tools that do the screen scraping assume the HTML is well-formed and 
basically just copy and paste it, which is of course insane, broken, 
and brain-damaged. Are the authoring tools really that stupid? (I 
honestly don't know. I've only used the tools I've written.)

It is only sensible that a screen scraper should fix HTML using 
something like Tidy before including it in an XML document. This is 
perfectly OK. Data from non-XML sources such as HTML documents, Word 
files, and SQL databases is included in XML documents all the time; 
and this is the appropriate time to make any fixes that are necessary 
to create well-formedness. However, once a document has been labelled 
as XML, it is no longer acceptable for downstream processes to make 
such fixes in it. It's just too hard to figure out what's missing 
and correctly repair it. The proper response is to drop the document, 
and perhaps kick back an error to its publisher.
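
To make the division of labor concrete, here is a minimal Python sketch 
of both ends. The function names build_item and accept_feed, and the 
item/title/description element names, are made up for illustration. 
Note that the producer here does not repair the HTML the way Tidy 
would; it takes the simpler route of embedding the scraped fragment as 
escaped character data, which is enough to keep the output well-formed 
as long as the text contains no characters that are illegal in XML 1.0. 
The consumer parses strictly and never attempts a fix.

```python
import xml.etree.ElementTree as ET


def build_item(title: str, scraped_html: str) -> bytes:
    """Producer side: embed scraped HTML as escaped character data.

    ElementTree escapes <, > and & when it serializes text, so the
    output is well-formed even when the HTML fragment is not.
    (Hypothetical helper; assumes no XML-illegal control characters.)
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = scraped_html
    return ET.tostring(item, encoding="utf-8")


def accept_feed(data: bytes):
    """Consumer side: parse strictly and never attempt a repair.

    Returns (root, None) on success or (None, message) on failure, so
    the caller can drop the document and report back to the publisher.
    """
    try:
        return ET.fromstring(data), None
    except ET.ParseError as exc:
        return None, f"rejected: not well-formed XML ({exc})"


# Sloppy HTML goes in as text and survives a strict parse.
root, error = accept_feed(build_item("Post", "<p>unclosed <b>tag & stuff"))

# A document that is already labelled XML but is broken gets dropped.
bad_root, bad_error = accept_feed(b"<item><title>x</item>")
```

The point of returning the error message instead of guessing at a 
repair is that the publisher is the only party who knows what the 
document was supposed to say.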
-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA




 
