Re: [xml-dev] Why is Encoding Metadata (e.g. encoding="UTF-8") put Inside the XML Document?
- From: David Carlisle <davidc@nag.co.uk>
- To: costello@mitre.org
- Date: Wed, 19 Sep 2007 14:10:41 +0100
One thing that your notes don't emphasise is that if the encoding is
specified in the HTTP headers (or similar external metadata), any
encoding specified in the file itself must be ignored.
http://www.ietf.org/rfc/rfc3023.txt
    There are several reasons that the charset parameter is
    authoritative. First, some MIME processing engines do transcoding
    of MIME bodies of the top-level media type "text" without
    reference to any of the internal content. Thus, it is possible
    that some agent might change text/xml; charset="iso-2022-jp" to
    text/xml; charset="utf-8" without modifying the encoding
    declaration of an XML document. Second, text/xml must be
    compatible with text/plain, since MIME agents that do not
    understand text/xml will fallback to handling it as text/plain.
    If the charset parameter for text/xml were not authoritative, such
    fallback would cause data corruption.
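
The transcoding case in the RFC text can be reproduced in a few lines. This
is only a sketch: the sample document, the variable names, and the
"agent" step are made up for illustration, and nothing here is taken from a
real MIME engine.

```python
# Hypothetical sample document, authored and declared as ISO-2022-JP.
# The escapes spell a five-character Japanese greeting in hiragana.
text = (
    '<?xml version="1.0" encoding="iso-2022-jp"?>'
    "<greeting>\u3053\u3093\u306b\u3061\u306f</greeting>"
)
original_bytes = text.encode("iso-2022-jp")

# An assumed MIME agent transcodes the body to UTF-8 and updates only the
# charset parameter; the internal encoding declaration is left untouched.
transcoded_bytes = original_bytes.decode("iso-2022-jp").encode("utf-8")
charset_param = "utf-8"

# Trusting the external charset parameter recovers the original text.
via_charset = transcoded_bytes.decode(charset_param)

# Trusting the now-stale internal declaration does not: the UTF-8 bytes are
# not valid ISO-2022-JP, so the non-ASCII content is lost.
via_declaration = transcoded_bytes.decode("iso-2022-jp", errors="replace")
```

After the transcoding step the declaration inside the document is simply
wrong, which is why a receiver that preferred it over the charset parameter
would corrupt the data.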
The reasons (quoted above) for this are sound in theory but a real pain
in practice, where it is often easy to put the right encoding
declaration in the XML file and hard to get the right encoding specified
in the MIME headers.
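
As a rough illustration of the precedence rule, a receiver could pick the
encoding along these lines. The function name effective_encoding and the
"utf-8" fallback are assumptions made for this sketch, not anything defined
by RFC 3023. It also skips BOM detection, and it ignores the RFC's rule that
text/xml without a charset parameter defaults to us-ascii.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of a leading XML declaration.
_XML_DECL_ENCODING = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def effective_encoding(content_type, body, default="utf-8"):
    """Return the encoding to use: an external charset wins if present."""
    if content_type:
        # Let the stdlib parse the Content-Type parameters.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_param("charset")
        if isinstance(charset, str) and charset:
            return charset.lower()
    # No external charset: fall back to the declaration inside the file.
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

For example, calling it with 'text/xml; charset="utf-8"' and a body declared
as iso-2022-jp yields the external value, and the internal declaration is
only consulted when no charset parameter was sent.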
David