> I don't believe that requiring well-formed XML is unrealistic in the
> least. Why do people find it unreasonable to produce well-formed XML?
> Are authors really hand-authoring RSS? I'm certainly not, and I
> suspect the thousands using various blogging tools aren't either.
Many of the current generation of blogging tools make it relatively easy to
generate non-XML feeds. What's worse is that on most days the feed may be
well-formed, and then on the day you don't test you paste in some
inappropriate characters.
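A rough sketch of that failure mode, not taken from any particular blogging
tool: the function names and the sample title below are made up for
illustration. It assumes the tool builds the feed by string concatenation,
and shows how one pasted ampersand breaks the feed unless the text is
escaped first.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def naive_item(title):
    # Hypothetical tool behaviour: paste the user's text straight in.
    return "<item><title>" + title + "</title></item>"


def escaped_item(title):
    # Same thing, but with &, < and > escaped before insertion.
    return "<item><title>" + escape(title) + "</title></item>"


def is_well_formed(text):
    # True if an XML parser accepts the text, False on a parse error.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


pasted = "Fish & chips <b>review"  # invented sample input
print(is_well_formed(naive_item(pasted)))
print(is_well_formed(escaped_item(pasted)))
```

Note that escape() only covers &, < and >. Control characters and encoding
mismatches would need separate handling.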
> The only possible problem I see is if the RSS/Atom is produced by
> screen scraping hand-authored HTML. But this is only a problem if the
> tools that do the screen scraping assume the HTML is well-formed and
> basically just copy and paste it, which is of course insane, broken,
> and brain-damaged. Are the authoring tools really that stupid? (I
> honestly don't know. I've only used the tools I've written.)
Try some of the tools. Most give the user plenty of freedom to produce
ill-formed markup.
> It is only sensible that a screen scraper should fix HTML using
> something like Tidy before including it in an XML document. This is
> perfectly OK. Data from non-XML sources such as HTML documents, Word
> files, and SQL databases is included in XML documents all the time;
> and this is the appropriate time to make any fixes that are necessary
> to create well-formedness. However, once a document has been labelled
> as XML, it is no longer acceptable for downstream processes to make
> such fixes in it. It's just too hard to figure out what's missing,
> and correctly repair it,
But many of the clients don't care whether it's XML or not, just as long as
there are pointy brackets here and there.
> The proper response is to drop the document,
> and perhaps kick back an error to its publisher.
Quite.
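For what it's worth, a minimal sketch of a client that behaves this way,
assuming the feed arrives as a string. The function name accept_feed and the
wording of the error message are invented here; how the error actually gets
back to the publisher is left open.

```python
import xml.etree.ElementTree as ET


def accept_feed(data):
    """Return (root, None) for well-formed XML, else (None, message)."""
    try:
        return ET.fromstring(data), None
    except ET.ParseError as err:
        # No attempt at repair: drop the document and report why.
        return None, "rejected: not well-formed XML (%s)" % err


root, error = accept_feed("<rss><channel><title>oops</channel></rss>")
if error is not None:
    print(error)
```

The point of the design is that the parser's verdict is final: nothing
downstream tries to guess what the publisher meant.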
Cheers,
Danny.