OASIS Mailing List Archives

Re: [xml-dev] xml over http - RFC 3023

Julian Reschke wrote:
> Andrew Welch wrote:
>> Hi all,
>> There's a very good article here about the problem of reading feeds
>> from all over the world in different encodings:
>> http://www.xml.com/pub/a/2004/07/21/dive.html
>> It describes how you (sometimes) have the encoding in the HTTP
>> content type but also the encoding in the XML prolog, and the
>> problems of choosing which to use.
>> It also talks of RFC 3023, which sounds like it was an attempt to
>> sort it out. The article is dated July 2004 and I'm wondering if
>> there's any more recent information? Is there any support in modern
>> parsers, for example can I give the parser a URL and it takes care
>> of the rest?
> I think many parsers can read from a web resource, but few use the
> encoding information from the content type.

The thing is that XML documents are designed to be read where there is
no external content-type information (such as from a filesystem) as
well as where there is. The spec says you can leave out the encoding
declaration when the encoding is not UTF-8 or UTF-16, provided it can
be determined from external content-type information. But then that
information has to be kept in metadata somewhere, which is very
unlikely unless you have a full-blown content management system and
all the processes to ensure that documents and their metadata are kept
in sync. It's generally much easier for people to put the encoding
directly in the document (or entity), in which case any external
content-type can and should be ignored.
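
As a rough illustration of that precedence, here is a minimal Python
sketch. The function names (sniff_declared_encoding, choose_encoding)
and the final UTF-8 fallback are my own assumptions for the example,
not part of any parser's API, and it only recognises a UTF-16 byte
order mark or a declaration written in ASCII-compatible bytes. Note
that it deliberately follows the "document wins" preference described
above, which is not the ordering RFC 3023 lays down.

```python
import re

# Hypothetical helper: matches encoding="..." inside an XML declaration
# at the very start of an ASCII-compatible byte stream.
_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else None


def choose_encoding(data: bytes, http_charset=None) -> str:
    """Pick an encoding: the document first, external metadata second."""
    # A UTF-16 byte order mark identifies the encoding on its own.
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    declared = sniff_declared_encoding(data)
    if declared:
        return declared      # in-document declaration wins
    if http_charset:
        return http_charset  # otherwise use the external content-type
    return "utf-8"           # assumed fallback when nothing is stated
```

With this sketch, a document that declares ISO-8859-1 is read as
ISO-8859-1 even if the server claimed charset=utf-8; the external
value only matters when the document says nothing.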

>> At the moment it all seems pretty complicated... especially
>> considering XML was designed for the web.  The problem of parsing
>> feeds from all over the world must have been tackled a few times over by
>> now?
> There's a related HTTPbis issue -- HTTP/1.1 (RFC 2616) defines a
> default encoding for text/* -- in retrospect a bad idea, at least
> for XML -- see <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20>.
> Of course the simple workaround is not to use a text/* content type  
> (so this is one of the many problems you don't have with Atom).
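
To make the text/* point concrete, here is a small Python sketch of
what a receiver following the RFC 2616 rule would conclude from a
Content-Type header. The function name effective_charset is invented
for the example; the ISO-8859-1 default is the one RFC 2616 gives for
text media types, and I am assuming no default at all for anything
else. It uses the standard library's email.message.Message purely as
a header parser.

```python
from email.message import Message


def effective_charset(content_type: str):
    """Return the charset a receiver would assume for this header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        # RFC 2616 default for text/* when no charset parameter is given.
        return "iso-8859-1"
    # Assumption: no transport-level default for other media types,
    # so the document itself has to be consulted.
    return None
```

So a feed served as plain text/xml gets ISO-8859-1 imposed on it
whatever its prolog says, while one served as application/atom+xml
leaves the decision to the document.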

Chris Burdess
