- From: David Megginson <firstname.lastname@example.org>
- To: xml-dev Mailing List <email@example.com>
- Date: Sat, 20 Dec 1997 13:29:28 -0500
MURATA Makoto writes:
> > http://www.microstar.com/XML/donne.xml
> > http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml
> As a co-editor of an (upcoming) RFC for text/xml and
> application/xml, I think that I should point out the correct
> procedure for encoding determination. (I have not checked these
> two Web sites or Ælfred.)
Thank you very much for the information. Currently, both of these web
servers return "application/octet-stream" as the MIME type for *.xml
and *.dtd files: in this case, is it correct for an XML parser to fall
back on other character-encoding detection techniques, as Ælfred does?
> For those XML documents transmitted by the HTTP protocol, XML parsers
> should use the charset parameter of the media type text/xml (BTW,
> the default of this parameter is 8859-1). XML parsers should ignore
> the encoding declaration within XML documents transmitted by HTTP.
> For more about this, see the XML PR and the HTTP/1.1
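The rule quoted above can be combined with the fallback asked about for application/octet-stream. The result might look like the minimal Python sketch below. It is not Ælfred's code, and `charset_from_http` is a hypothetical name. The ISO-8859-1 default for text/xml comes from the quoted message, not from a published RFC. A result of `None` means the parser should detect the encoding from the document itself.

```python
from email.message import Message
from typing import Optional

# Default for text/xml without a charset parameter, as stated in the
# quoted message (the RFC was still a draft; treat this as an assumption).
TEXT_XML_DEFAULT = "iso-8859-1"
XML_MEDIA_TYPES = {"text/xml", "application/xml"}


def charset_from_http(content_type: str) -> Optional[str]:
    """Return the encoding dictated by an HTTP Content-Type header, or
    None when the parser should autodetect it from the document."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()  # lower-cased "type/subtype"
    if media_type not in XML_MEDIA_TYPES:
        return None  # e.g. application/octet-stream: fall back to autodetection
    charset = msg.get_param("charset")
    if isinstance(charset, tuple):  # RFC 2231 form: (charset, language, value)
        charset = charset[2]
    if charset:
        return charset.lower()
    # text/xml gets the quoted default; application/xml is left to the parser.
    return TEXT_XML_DEFAULT if media_type == "text/xml" else None
```

With the servers described above, `charset_from_http("application/octet-stream")` returns `None`, so the parser's own detection applies.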
I have two important queries:
1) Are you certain that ignoring the encoding declaration is
conforming behaviour? It seems to me that it would make more sense
to report an error if the charset parameter and the encoding
declaration differ (especially since the PR requires any document
without a BOM or encoding declaration to be in UTF-8).
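Point 1 proposes an alternative: report an error instead of silently ignoring the declaration. That could be sketched as follows. The sketch makes several assumptions:

- The document is in an ASCII-compatible encoding, so UTF-16 and UCS-4 are not handled.
- Encoding names are compared after normalization through Python's `codecs` registry, so aliases such as `latin-1` and `ISO-8859-1` match.
- `check_encoding` and `EncodingMismatch` are hypothetical names.

```python
import codecs
import re
from typing import Optional


class EncodingMismatch(ValueError):
    """The HTTP charset parameter and the encoding declaration disagree."""


# encoding="..." or encoding='...' inside a leading XML declaration.
# Assumes an ASCII-compatible encoding; UTF-16/UCS-4 are not handled here.
_DECL = re.compile(rb"^<\?xml[^>]*?\sencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']")


def _norm(name: str) -> str:
    """Map aliases (e.g. latin-1, ISO-8859-1) to one canonical codec name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()  # unknown to Python: compare case-insensitively


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if any."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    m = _DECL.match(data)
    return m.group(1).decode("ascii") if m else None


def check_encoding(data: bytes, http_charset: Optional[str]) -> str:
    """Return the encoding to use, raising instead of ignoring a conflict."""
    declared = declared_encoding(data)
    if http_charset and declared and _norm(http_charset) != _norm(declared):
        raise EncodingMismatch(
            f"HTTP charset {http_charset!r} vs. encoding declaration {declared!r}"
        )
    # No charset and no declaration: the PR's default, UTF-8.
    return _norm(http_charset or declared or "utf-8")
```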
2) Why pick a default encoding that conforming XML parsers are not
required to support? Ælfred does accept encoding="ISO-8859-1", but
some other parsers do not. It seems to me that either the RFC or
the PR needs to be amended.
I can also anticipate a different problem: few private people (as
opposed to companies or organisations) have any control at all over
what their HTTP servers send out.
Imagine an exchange student at a big American university who wants to
publish a UTF-8 or UCS-2 Arabic XML text in her personal web space.
She will have a very hard time even finding out who is in charge of
the university's HTTP server (if she knows what an HTTP server is),
and she will probably have graduated before the university's
administration has gotten around to approving letting the web-master
look into reporting the correct encoding for her document.
In the end, it looks like application/xml is a _much_ better choice
than text/xml -- with Ælfred, I have found that I can do a very good
job autodetecting character encoding, and I imagine that other parser
writers will find the same.
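The autodetection described above can be sketched from the autodetection appendix of the XML specification. The first four bytes are usually enough to pick an encoding family, which is then used to read the encoding declaration itself. This is a minimal Python sketch, not Ælfred's Java implementation. `autodetect` is a hypothetical name, UCS-4 byte-order marks are not handled, and `cp037` stands in for the whole EBCDIC family.

```python
# (leading bytes, codec able to read the XML declaration), checked in order.
_SIGNATURES = [
    (b"\xef\xbb\xbf", "utf-8"),          # UTF-8 byte-order mark
    (b"\xfe\xff", "utf-16-be"),          # UTF-16 BOM, big-endian
    (b"\xff\xfe", "utf-16-le"),          # UTF-16 BOM, little-endian
    (b"\x00\x00\x00\x3c", "utf-32-be"),  # '<' in UCS-4, big-endian
    (b"\x3c\x00\x00\x00", "utf-32-le"),  # '<' in UCS-4, little-endian
    (b"\x00\x3c\x00\x3f", "utf-16-be"),  # '<?' in UTF-16BE, no BOM
    (b"\x3c\x00\x3f\x00", "utf-16-le"),  # '<?' in UTF-16LE, no BOM
    (b"\x3c\x3f\x78\x6d", "utf-8"),      # '<?xm': some ASCII-compatible encoding
    (b"\x4c\x6f\xa7\x94", "cp037"),      # '<?xm' in EBCDIC
]


def autodetect(head: bytes) -> str:
    """Guess, from the first bytes of an entity, which codec can read its
    XML declaration. For ASCII-compatible and EBCDIC documents the
    declaration must still be read to learn the exact encoding."""
    for prefix, codec in _SIGNATURES:
        if head.startswith(prefix):
            return codec
    return "utf-8"  # no signature and no declaration: the PR's default
```

A caller would decode only the XML declaration with the returned codec, then switch to the encoding it names. For example, a BOM-less UTF-16LE document starting with `<?xml` is reported as `"utf-16-le"`.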
All the best,
David Megginson firstname.lastname@example.org
Microstar Software Ltd. email@example.com
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:email@example.com the following message;
To subscribe to the digests, mailto:firstname.lastname@example.org the following message;
List coordinator, Henry Rzepa (mailto:email@example.com)