OASIS Mailing List Archives


Re: [xml-dev] xml over http - RFC 3023

Hi Rick,

> The out-of-band signalling of character encoding is a fundamentally broken
> idea, because there are no mechanisms for programs which generate data to
> memoize the character encoding used that can then feed the rest of the
> food-chain.

How about the BOM - that's one way isn't it?  I wonder if a similar
ignorable byte sequence could be added to the start of all byte
sequences to indicate the encoding of what's coming.
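
To make the BOM idea concrete, here is a rough sketch of sniffing it from
the first few bytes. The function name sniff_bom is made up for
illustration; only the BOM constants come from Python's standard codecs
module. It only helps for the Unicode encodings that have a BOM, which is
why something similar for other encodings would be needed.

```python
import codecs

# Known byte order marks. The UTF-32 entries come first because the
# UTF-32 LE BOM starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```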

>> At the moment it all seems pretty complicated...

> It is not complicated. Use application/xml
> If you do find intermediate web systems that implement the ASCII default or
> the ISO-8859-1 default as anything other than 8-bit clean for text/xml submit
> a bug report.

I'm dealing with RSS feeds from all over the world, so it's:

- 3 different types of feeds
- multiple languages, multiple encodings
- embedded, inconsistently escaped HTML, or CDATA sections, or both
- and even, use of entities without even including the doctype, so it
doesn't even parse without help

It is possible to reject some of the feeds, but other readers accept
them so this one needs to at least match them before taking the moral
high ground (and it's not too hard to code around the problems).

So this is a real test of XML on the web.  The complicated part I was
referring to is reading the bytes from the http input stream in the
right encoding:

- extract the encoding from the Content-Type header
- if it's not there, read the first few bytes of the stream as us-ascii
and then extract the encoding from the prolog
- if it's not there either, use utf-8
- hope that the actual encoding of the file and the encoding you've
discovered match

...and that's not even completely correct as far as I understand.
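
Roughly, the steps above look like the sketch below. The names
guess_encoding, content_type and head are made up for illustration, and
it deliberately follows only the steps as listed: it ignores BOMs and
cannot read a UTF-16 prolog, so treat it as an approximation rather than
a correct implementation.

```python
import re

# charset parameter of a Content-Type header, with or without quotes
_CHARSET_RE = re.compile(r'charset\s*=\s*"?([^\s";]+)', re.IGNORECASE)

# encoding pseudo-attribute of an XML declaration at the very start of
# the stream, read as ASCII-compatible bytes
_PROLOG_RE = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_encoding(content_type, head):
    """Guess the encoding from the header value and the first bytes."""
    # Step 1: charset from the Content-Type header, if present.
    if content_type:
        m = _CHARSET_RE.search(content_type)
        if m:
            return m.group(1).lower()
    # Step 2: encoding declared in the XML prolog, if present.
    m = _PROLOG_RE.match(head)
    if m:
        return m.group(1).decode("ascii").lower()
    # Step 3: fall back to utf-8 and hope it matches the actual bytes.
    return "utf-8"
```

Here head would be the first hundred bytes or so read from the HTTP
input stream before the real decoding starts.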

So when you say:

"It is not complicated. Use application/xml"

I don't get it, what am I missing?

I would've thought the webserver would be aware that it was serving
XML and take care of it - it could extract the encoding from the XML
prolog and ensure the file was served with that (maintaining it however
it liked)... it seems odd that the client should go through this process
every time.
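
What I have in mind on the server side is something like the sketch
below: read the prolog once and build the Content-Type header from it.
The function name content_type_for and the utf-8 default are my own
assumptions, not how any particular webserver behaves.

```python
import re

# encoding pseudo-attribute of an XML declaration at the start of a file
_ENCODING_RE = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def content_type_for(xml_bytes, default="utf-8"):
    """Build a Content-Type value whose charset matches the XML prolog."""
    # Only the start of the file is inspected; the declaration, if any,
    # has to be the first thing in the document.
    m = _ENCODING_RE.match(xml_bytes[:200])
    charset = m.group(1).decode("ascii") if m else default
    return "application/xml; charset=" + charset.lower()
```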

Andrew Welch
Kernow: http://kernowforsaxon.sf.net/


Copyright 1993-2007 XML.org. This site is hosted by OASIS