Hi Folks,
This discussion is excellent! Thank you very much for your outstanding
comments. Here’s version #3. Please let me know of any remaining inaccuracies.
/Roger
A Summary of XML and Media Types (MIME)
XML Data is Assigned what MIME Type?
At this URL is a list of the registered MIME types (roughly 350 of them):
http://www.iana.org/assignments/media-types/
In this list you will see that there are two MIME type
assignments to XML data:
application/xml
text/xml
The latter MIME type (text/xml) has been deprecated. Thus, the official MIME
type assignment to XML data is:
application/xml
Note the format for expressing MIME types – it
contains two parts, separated by a slash:
type/subtype
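As a small illustration of the two-part format, here is a sketch in Python that splits a MIME type string into its type and subtype. The function name split_mime_type is made up for this example, and it simply discards any optional parameters (such as a charset) that may follow the subtype.

```python
def split_mime_type(mime_type):
    """Split 'type/subtype' into its two parts (hypothetical helper)."""
    # Drop any parameters after a semicolon, e.g. "; charset=utf-8".
    essence = mime_type.split(";", 1)[0].strip()
    major, separator, minor = essence.partition("/")
    if not separator or not major or not minor:
        raise ValueError(f"not a type/subtype pair: {mime_type!r}")
    # MIME type names are case-insensitive, so normalize to lower case.
    return major.lower(), minor.lower()


print(split_mime_type("application/xml"))
```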
Where are MIME Types Used?
MIME types are used in exchanges of data on the Web.
For example, suppose you send some data to a Web server. The data arrives
at the Web server as just a stream of bytes. The data has no filename or
file extension (e.g., “.xml”).
How does the Web server know that the bytes represent XML data and not some
other kind of data?
Answer: when data is sent across the Web, it is sent as
the payload of an HTTP message. In the HTTP header is a field called
Content-type, and the value of this field is a MIME type, e.g.,
Content-type: application/xml
Thus, when the Web server receives the stream of bytes it
examines the Content-type header field to determine the type of data.
Going the other direction, when a Web server sends out data it assigns a MIME type to the
transmitted data in the same fashion – assigning a MIME type to the data via
the Content-type header.
When a Web browser
receives data (more precisely, receives a stream of bytes) from the Web, it looks
at the MIME type in Content-type to determine how to render the data.
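To make the Content-type lookup concrete, here is a minimal Python sketch that reads the MIME type from the header portion of a message. The raw message is invented for illustration, and the standard email parser is used only because HTTP header fields share the same "Name: value" syntax; a real Web server or browser would use its own HTTP parser.

```python
from email import message_from_bytes

# A made-up message: header fields, a blank line, then the payload bytes.
raw_message = (
    b"Content-type: application/xml\r\n"
    b"\r\n"
    b'<?xml version="1.0"?><root>Blah</root>'
)

message = message_from_bytes(raw_message)

# The label travels in the header, not inside the payload.
mime_type = message.get_content_type()
payload = message.get_payload(decode=True)

print(mime_type)
```

The payload itself is just bytes; nothing in it says "application/xml". The receiver learns the type only from the header field.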
To recap, MIME types were created for networks. MIME
types inform network applications (e.g.,
Web servers and browsers) of the type of data.
XML doesn’t “have” a MIME Type!
To say that XML “has” a MIME type of application/xml is
misleading. It suggests that somehow a MIME type is part of, or a
property of, an XML document. It is not. An XML document does not
have a MIME property. A stream of bytes may be “assigned” the
MIME type application/xml.
A MIME type is an externally applied label. It is
pure metadata. It is similar in many ways to a file type extension (e.g., “.xml”, “.txt”)
but not as persistent.
What is the Purpose of a MIME Type?
The purpose of a MIME type is to provide network
applications such as Web browsers and Web servers with information about the
type of data they have received.
How does Data get Assigned a MIME Type?
Suppose a Web server is invoked, and suppose the data that
it is to return is located in a file on its local file system. The Web
server reads in the file’s data and constructs an HTTP message. How
does it assign an appropriate MIME type to the data? (i.e., how does it fill in the Content-type
field?)
Answer: heuristics are used for determining the MIME type
of a resource. In other words, the Web server “guesses” what
the MIME type is. The heuristics used depend on the operating system
(platform).
On Windows the MIME type is guessed from the file
extension. In the Windows Registry is a mapping from file
extension to MIME type.
Examples:
- If a file ends with the
extension .txt then the MIME type is guessed to be text/plain.
- If a file ends with the
extension .doc then the MIME type is guessed to be application/msword.
- If a file ends with the
extension .xml then the MIME type is guessed to be application/xml.
- If a file ends with the
extension .zip then the MIME type is guessed to be application/zip.
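The extension-based guessing described above can be sketched as a simple table lookup. This is only an illustration in Python: the table holds just the four examples listed, the function name guess_mime_from_extension is hypothetical, and the fallback of application/octet-stream for unknown extensions is an assumption made for this sketch.

```python
import os

# Illustrative table built from the four examples above; a real system
# would have many more entries.
EXTENSION_TO_MIME = {
    ".txt": "text/plain",
    ".doc": "application/msword",
    ".xml": "application/xml",
    ".zip": "application/zip",
}


def guess_mime_from_extension(filename, default="application/octet-stream"):
    """Guess a MIME type from the file extension alone (hypothetical helper)."""
    _, extension = os.path.splitext(filename)
    # The file's contents are never inspected, only its name.
    return EXTENSION_TO_MIME.get(extension.lower(), default)


print(guess_mime_from_extension("Blah.xml"))
print(guess_mime_from_extension("Blah.txt"))
```

Python's standard mimetypes module offers a similar lookup through mimetypes.guess_type, though its answers depend on the tables available on the platform where it runs.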
On Unix and Linux systems the MIME type is not necessarily determined from
the file extension; a variety of heuristics may be used.
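One heuristic of this kind is to look at the first few bytes of the data for a known signature, which is roughly what the Unix file command does. The Python sketch below is an assumption about what such a check could look like, not a description of any particular Web server; the function name sniff_mime_type and the short signature list are made up for illustration.

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"PK\x03\x04", "application/zip"),
    # OLE2 compound file header, used by legacy Word .doc files (and by
    # other legacy Office formats, so this label is itself only a guess).
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/msword"),
    (b"<?xml", "application/xml"),
]


def sniff_mime_type(data, default="application/octet-stream"):
    """Guess a MIME type from the leading bytes (hypothetical helper)."""
    for signature, mime_type in SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return default


print(sniff_mime_type(b'<?xml version="1.0"?><root>Blah</root>'))
```

A check this simple misses XML that has no XML declaration or that is encoded in UTF-16, which is one reason such results remain guesses.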
Thus we see that MIME type provides information about the
format of data, independent of platform!
Can Data be Assigned a Wrong MIME Type?
Yes! You could create a Word document and
deliberately give the document a false extension, such as
“.xml”. If the Web server is running on a Windows machine
then the Web server will assign to the data a MIME type of application/xml, which is clearly incorrect.
Browsers use MIME Type to Decide how to Render a Resource
With the exception of Internet Explorer, browsers use MIME type to decide
how to render a resource. If the resource is retrieved from the Web then
the MIME type is found in the HTTP header. If the resource comes from a
file on the local file system then the MIME type is guessed using heuristics,
as described above.
Internet Explorer uses the MIME type found in the
HTTP header to determine how to render a resource retrieved from the Web.
But for resources that come from the local file system, Internet Explorer uses the file extension to
determine how to render the resource.
What is an XML Document?
Take this simple XML:
<?xml version="1.0"?>
<root>
Blah
</root>
and put it into Word and deliberately give the document
the incorrect extension “.xml”. Name the file “Blah.xml”.
Now make this file Web-accessible. Next, suppose someone on the Web requests
the file by issuing its URL, e.g.,
http://www.example.org/Blah.xml
The Web server will read in the data from the file, and
assign a MIME type to it. Let’s assume the Web server is running on
a Windows machine; then the Web server will use the file extension to assign a
MIME type. Here’s the MIME type it will assign to the data:
application/xml
However, the data is not XML. It is Word.
Conversely, take the same XML and put it into Notepad
and give the document the extension “.txt”. Name the file “Blah.txt”.
Make this file also Web-accessible. Next, suppose someone on the Web requests
the file by issuing its URL, e.g.,
http://www.example.org/Blah.txt
The Web server will read in the data from the file, and
assign a MIME type to it. Let’s assume the Web server is running on
a Windows machine; then the Web server will use the file extension to assign a
MIME type. Here’s the MIME type it will assign to the data:
text/plain
Yet, it is an XML document.
So, what is an XML document?
Answer: an XML document is one that optionally begins with an XML
declaration, followed by a single root element with well-formed content.
If you open the above Word document you won’t find an XML declaration or
a root element. If you open the above Notepad document you will
find an XML declaration as the first thing.
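A practical way to apply this definition is to hand the bytes to an XML parser and see whether they parse, regardless of the file name or the assigned MIME type. The Python sketch below uses the standard xml.etree.ElementTree parser; the function name is_well_formed_xml is hypothetical, and the Word bytes shown are just the opening signature of a legacy .doc file padded with zeros, used as a stand-in for a real document.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data):
    """Return True if the bytes parse as a well-formed XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Contents of the Notepad file (Blah.txt), labeled text/plain by extension.
xml_bytes = b'<?xml version="1.0"?>\n<root>\nBlah\n</root>\n'
# Stand-in for the Word file (Blah.xml), labeled application/xml by extension.
word_bytes = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 16

print(is_well_formed_xml(xml_bytes))
print(is_well_formed_xml(word_bytes))
```

The parser's verdict depends only on the bytes themselves, so it is unaffected by whichever extension or MIME type label happened to be attached.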
Further Information
Elliotte Rusty Harold has written an excellent article on
this subject:
http://www-128.ibm.com/developerworks/xml/library/x-mxd2.html
Acknowledgements
I would like to gratefully acknowledge the excellent
inputs from these people:
Mitch Amiano
Elliotte Rusty Harold
Rick Jelliffe
Amelia Lewis
Frank Manola
Rick Marshall
Dave Pawson
Bryan Rasmussen
Henri Sivonen
Nathan Young