OASIS Mailing List Archives



Re: [xml-dev] xml over http - RFC 3023

>> I think many parsers can read from a web resource, but few use the
>> encoding information from the content type.
> The thing is that XML documents are designed to be read where there is no
> external content-type information (such as from a filesystem) as well as
> where there is.

At the moment I'm:

- using the charset parameter of the Content-Type header
- if that's not present, using the encoding declared in the prolog (having
read those first few bytes as us-ascii)
- if that's not present, defaulting to UTF-8

That seems to cover most bases.
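A minimal Python sketch of that lookup order, for illustration only. The function names are hypothetical, the prolog regex assumes an ASCII-compatible encoding (it will not see a UTF-16 declaration), and only the first 1024 bytes are inspected:

```python
import re
from typing import Optional

# encoding="..." (or '...') inside an XML declaration at the very start of the
# bytes; matching on raw bytes is equivalent to reading them as us-ascii.
_XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    if not content_type:
        return None
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "charset" and value:
            return value.lower()
    return None


def encoding_from_prolog(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if present."""
    match = _XML_DECL_ENCODING.match(data[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def detect_xml_encoding(data: bytes, content_type: Optional[str]) -> str:
    """Content-Type charset first, then the prolog, then UTF-8."""
    return (
        charset_from_content_type(content_type)
        or encoding_from_prolog(data)
        or "utf-8"
    )


body = b'<?xml version="1.0" encoding="windows-1252"?><p/>'
print(detect_xml_encoding(body, "application/xml; charset=UTF-8"))  # utf-8
print(detect_xml_encoding(body, "application/xml"))                 # windows-1252
print(detect_xml_encoding(b"<p/>", None))                           # utf-8
```

One caveat worth checking against RFC 3023: for text/xml with no charset parameter it specifies a default of us-ascii, whereas this sketch falls through to the prolog, so the order above is closer to the rules for application/xml.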

Doesn't that rely on the file's actual encoding matching the encoding
it's served with, though? If the XML is windows-1252 (with the encoding
correctly declared in the prolog) but served as utf-8, then any
characters outside ASCII (bytes 0x80-0xFF in windows-1252, including
the C1 range) will typically cause a parse error, won't they?
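To see the mismatch concretely, here is a small Python check using a made-up document: the windows-1252 bytes decode fine with the declared encoding, but decoding the same bytes as UTF-8 (what a client trusting charset=utf-8 would do) raises UnicodeDecodeError. Here the failure comes from é (0xE9), which sits outside C1, before the € (0x80) is even reached:

```python
# Hypothetical document whose prolog correctly says windows-1252.
doc = '<?xml version="1.0" encoding="windows-1252"?><p>caf\u00e9 \u20ac5</p>'
raw = doc.encode("windows-1252")  # é becomes byte 0xE9, € becomes byte 0x80

# A client that honours the prolog gets the text back intact.
assert raw.decode("windows-1252") == doc

# A client that trusts "Content-Type: ...; charset=utf-8" fails instead.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as err:
    utf8_ok = False
    print(f"invalid UTF-8 at offset {err.start}: byte {raw[err.start]:#04x}")
```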

The only fixes would be either to read the prolog first and then serve
the file with the encoding specified there, or to parse the XML and then
serialize it back to bytes using whatever encoding is given in the
Content-Type (when serving a static file from disk). It's almost as if
the server needs to treat XML as a special case. I'm still learning
all this and thinking out loud here.
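Both options could be sketched in Python roughly as follows. The function names are hypothetical, and the second option is a simplification: it decodes and re-encodes the text and rewrites the declaration, rather than doing a full XML parse and serialize:

```python
import re

# encoding="..." inside an XML declaration at the start of an
# ASCII-compatible document (same idea as reading the prolog as us-ascii).
_XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def content_type_for_static_xml(data: bytes) -> str:
    """Option 1: read the prolog, then serve with the encoding it declares."""
    match = _XML_DECL_ENCODING.match(data[:1024])
    if match is None:
        # No declared encoding: omit charset so XML's own rules
        # (BOM, else UTF-8) apply instead of a guess that might be wrong.
        return "application/xml"
    return "application/xml; charset=" + match.group(1).decode("ascii")


def transcode_to_utf8(data: bytes) -> bytes:
    """Option 2 (simplified): re-encode as UTF-8 and fix the prolog to match.

    Serve the result with "application/xml; charset=utf-8". Apart from the
    character encoding and the declaration, the document text is unchanged.
    """
    match = _XML_DECL_ENCODING.match(data[:1024])
    if match is None:
        return data  # assumption: undeclared files are already UTF-8
    text = data.decode(match.group(1).decode("ascii"))
    # Only the first encoding="..." is touched, which is the one in the prolog.
    text = re.sub(r"""(encoding\s*=\s*["'])[^"']*(["'])""", r"\1UTF-8\2", text, count=1)
    return text.encode("utf-8")
```

Option 1 leaves the bytes on disk untouched; option 2 costs a decode/encode per request (or a one-off conversion) but lets the server always advertise a single charset.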

Andrew Welch
Kernow: http://kernowforsaxon.sf.net/



Copyright 1993-2007 XML.org. This site is hosted by OASIS