Re: [xml-dev] xml over http - RFC 3023
- From: Chris Burdess <dog@bluezoo.org>
- To: Julian Reschke <julian.reschke@gmx.de>
- Date: Fri, 28 Nov 2008 10:28:59 +0000
Julian Reschke wrote:
> Andrew Welch wrote:
>> Hi all,
>> There's a very good article here about the problem of reading feeds
>> from all over the world in different encodings:
>> http://www.xml.com/pub/a/2004/07/21/dive.html
>> It describes how you (sometimes) have the encoding in the http content
>> type but also the encoding in the xml prolog, and the problems of
>> choosing which to use.
>> It also talks of RFC 3023 which sounds like it was an attempt to sort
>> it out. The article is dated July 2004 and I'm wondering if there's
>> any more recent information? Is there any support in modern parsers -
>> for example can I give the parser a URL and it takes care of the rest?
>
> I think many parsers can read from a web resource, but few use the
> encoding information from the content type.
The thing is that XML documents are designed to be readable where
there is no external content-type information (such as from a
filesystem) as well as where there is. The spec lets you leave out the
encoding declaration for a document that isn't UTF-8 or UTF-16,
provided the encoding can be determined from an external content-type.
But then the encoding has to be kept in metadata somewhere, which is
very unlikely unless you have a full-blown content management system
and all the processes to ensure that documents and their metadata are
kept in sync. It's generally much easier for people to put the
encoding directly in the document (or entity), in which case any
external content-type can and should be ignored.
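
To make the "encoding lives in the document" approach concrete, here is a minimal Python sketch of reading the encoding from the document itself. The function name sniff_xml_encoding is hypothetical, and the sketch is deliberately simplified: it only looks at a UTF-8 or UTF-16 byte order mark and an ASCII-readable XML declaration, and it does not cover UTF-32, EBCDIC, or the other cases a real parser has to handle.

```python
import codecs
import re

# Matches an XML declaration at the very start of the data and captures
# the value of its encoding pseudo-attribute, if one is present.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its own leading bytes.

    Hypothetical helper for illustration only; a real parser performs a
    fuller detection than this.
    """
    # A byte order mark settles the question for UTF-8 and UTF-16.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise trust the encoding declaration when the document has one.
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: XML defaults to UTF-8.
    return "utf-8"
```

For example, sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>') returns "iso-8859-1" with no reference to any HTTP header.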
>> At the moment it all seems pretty complicated... especially
>> considering XML was designed for the web. The problem of parsing
>> feeds from all over the world must have been tackled a few times
>> over by now?
>
> There's a related HTTPbis issue -- HTTP/1.1 (RFC 2616) defines a
> default encoding for text/* -- in retrospect a bad idea, at least
> for XML -- see <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20>.
>
> Of course the simple workaround is not to use a text/* content type
> (so this is one of the many problems you don't have with Atom).
Indeed.
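
For comparison, the RFC 3023 side of the text/* problem can be sketched as below. This is a rough illustration, not a complete implementation of the RFC: the function name rfc3023_charset is hypothetical, and the only rules it encodes are that an explicit charset parameter is used when present, that text/xml (and text/xml-external-parsed-entity) without one defaults to us-ascii, and that anything else is left to the document's own declaration.

```python
from email.message import Message


def rfc3023_charset(content_type):
    """Return the charset implied by a Content-Type header value.

    Hypothetical helper for illustration only. Returns None when the
    header says nothing and the document's own BOM or encoding
    declaration should be consulted instead.
    """
    # Reuse the stdlib MIME header parser rather than splitting by hand.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        # An explicit charset parameter is what the RFC treats as binding.
        return charset.lower()
    if msg.get_content_type() in ("text/xml", "text/xml-external-parsed-entity"):
        # text/xml without a charset parameter defaults to us-ascii,
        # whatever the XML declaration inside the document says.
        return "us-ascii"
    # application/xml and friends: fall back to the document itself.
    return None
```

So rfc3023_charset("text/xml") gives "us-ascii", while rfc3023_charset("application/xml") gives None and defers to the document, which is why avoiding text/* sidesteps the problem.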
--
Chris Burdess