Hi Folks,
Many thanks to all those who participated. This has been an outstanding
discussion. We have arrived at a really good collection of information.
[Rick, your idea of adding this information to Wikipedia is excellent.
Would you do the honors?]
The writeup below has two additions to the Summary #3 version:
1. In the first section I added a brief discussion on MIME types of
data that use a standard XML vocabulary (e.g., xsl, svg, xhtml).
2. I added a new section titled: Beyond MIME Types?
In addition I made several small editing fixes.
Thanks again everyone!
/Roger
A Summary of XML and Media Types (MIME)
XML Data is Assigned what MIME Type?
At this URL is a list of over 350 different MIME types:
http://www.iana.org/assignments/media-types/
In the list you will see that there are two MIME type
assignments to XML data:
application/xml
text/xml
The latter MIME type (text/xml) has been deprecated. Thus, the official MIME
type assignment to XML data is:
application/xml
Note the format for expressing MIME types – it
contains two parts, separated by a slash:
type/subtype
The MIME type list also has MIME type assignments to data that
uses a standard XML vocabulary. For example, this MIME type:
application/xslt+xml
is assigned to data that uses the XSLT vocabulary.
And this MIME type:
image/svg+xml
is assigned to data that uses the SVG vocabulary.
Note the format for expressing MIME types of data that
uses a specific XML vocabulary:
type/___+xml
where ___ is replaced by the name of an XML
vocabulary (xslt, svg, xhtml, etc.).
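To make the two formats above concrete, here is a minimal Python sketch that splits a MIME type into its type, its subtype, and the optional "+xml" style suffix. The names MediaType, parse_media_type and is_xml are made up for this illustration; they are not part of any standard library.

```python
from typing import NamedTuple, Optional


class MediaType(NamedTuple):
    """Hypothetical container for the parts of a MIME type."""
    top_level: str         # e.g. "image"
    subtype: str           # e.g. "svg+xml"
    suffix: Optional[str]  # e.g. "xml", or None when there is no "+" suffix


def parse_media_type(value: str) -> MediaType:
    """Split 'type/subtype' and pick out a trailing '+suffix', if any."""
    top, slash, sub = value.strip().lower().partition("/")
    if not slash or not top or not sub:
        raise ValueError(f"not a type/subtype pair: {value!r}")
    # rpartition returns an empty separator when there is no "+" in the subtype
    _, plus, suffix = sub.rpartition("+")
    return MediaType(top, sub, suffix if plus else None)


def is_xml(value: str) -> bool:
    """True for application/xml, text/xml, and any type/___+xml."""
    mt = parse_media_type(value)
    if mt.suffix == "xml":
        return True
    return (mt.top_level, mt.subtype) in {("application", "xml"), ("text", "xml")}
```

For example, is_xml("image/svg+xml") is true, while is_xml("text/plain") is false.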
Where are MIME Types Used?
MIME types are used in exchanges of data on the Web.
For example, suppose you send some data to a Web server. The data arrives
at the Web server as just a stream of bytes. The data has no filename or
file extension (e.g.,
“.xml”). How does the Web server know that the bytes
represent XML data and not some other kind of data?
Answer: when data is sent across the Web, it is sent as
the payload of an HTTP message. In the HTTP header is a field called Content-type,
and the value of this field is a MIME type, e.g.,
Content-type: application/xml
Thus, when the Web server receives the stream of bytes it
examines the Content-type header field to determine the type of
data.
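The lookup just described can be sketched in a few lines of Python. This is only an illustration, not how any particular Web server is implemented: the function name read_content_type and the sample request are invented, and the sketch assumes the whole HTTP message is already in memory as bytes.

```python
import http.client
import io
from typing import Tuple


def read_content_type(raw_message: bytes) -> Tuple[str, bytes]:
    """Return (MIME type, payload) for a raw HTTP message held in memory."""
    stream = io.BytesIO(raw_message)
    stream.readline()                            # skip the request line
    headers = http.client.parse_headers(stream)  # reads up to the blank line
    # Header names are matched case-insensitively, so "Content-type" works.
    mime_type = headers.get_content_type()
    payload = stream.read()                      # whatever follows the headers
    return mime_type, payload


# A made-up request, for illustration only.
raw = (
    b"POST /upload HTTP/1.1\r\n"
    b"Host: www.example.org\r\n"
    b"Content-type: application/xml\r\n"
    b"\r\n"
    b"<root>Blah</root>"
)
mime_type, payload = read_content_type(raw)
```

Note that the payload itself is never inspected; the MIME type comes entirely from the header.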
Going the other direction, when a Web server sends out data it assigns a MIME type to
the transmitted data in the same fashion: via the Content-type header.
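As a rough sketch of the sending direction, the following Python function wraps a payload in a bare-bones HTTP response and labels it with a MIME type. The function name build_http_response is hypothetical, and a real server would emit more header fields than the two shown.

```python
def build_http_response(body: bytes, mime_type: str) -> bytes:
    """Prefix the payload with a status line and a Content-Type header."""
    header_lines = [
        "HTTP/1.1 200 OK",
        f"Content-Type: {mime_type}",
        f"Content-Length: {len(body)}",
        "",  # the two empty strings produce the blank line that
        "",  # separates the headers from the payload
    ]
    return "\r\n".join(header_lines).encode("ascii") + body


response = build_http_response(b"<root>Blah</root>", "application/xml")
```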
When a Web browser
receives data (more precisely, receives a stream of bytes) from the Web, it
looks at the MIME type in Content-type to determine how to render the
data.
To recap, MIME types were created for networks. MIME
types inform network applications (e.g.,
Web servers and browsers) of the type of data.
XML doesn’t “have” a MIME Type!
To say that XML “has” a MIME type of application/xml is
misleading. It suggests that somehow a MIME type is part of, or a
property of, an XML document. It is not. An XML document does not
have a MIME property. A stream of bytes may be “assigned” the
MIME type application/xml.
A MIME type is an externally applied label. It is
pure metadata. It is similar in many ways to a file type extension (e.g., “.xml”,
“.txt”) but not as persistent.
What is the Purpose of a MIME Type?
The purpose of a MIME type is to tell network applications,
such as Web browsers and Web servers, what type of data they
have received. With that information, a Web browser or Web server
can trigger an appropriate program to handle the data.
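One simple way to picture "triggering an appropriate program" is a lookup table from MIME type to handler. The Python sketch below is an assumption about how such a dispatch might look; the handler functions are placeholders that only report what they would do.

```python
from typing import Callable, Dict


def handle_xml(data: bytes) -> str:
    return f"XML parser invoked on {len(data)} bytes"


def handle_plain_text(data: bytes) -> str:
    return f"text viewer invoked on {len(data)} bytes"


def handle_unknown(data: bytes) -> str:
    return f"offering {len(data)} bytes as a download"


# Hypothetical table; a real application would register many more types.
HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "application/xml": handle_xml,
    "text/plain": handle_plain_text,
}


def dispatch(mime_type: str, data: bytes) -> str:
    """Pick a handler from the MIME type alone, never from the bytes."""
    handler = HANDLERS.get(mime_type.strip().lower(), handle_unknown)
    return handler(data)
```

The application trusts the label: the same bytes are handled differently depending only on the MIME type that accompanies them.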
How does Data get Assigned a MIME Type?
Suppose a Web server is invoked, and suppose the data that
it is to return is located in a file on its local file system. The Web
server reads in the file’s data and constructs an HTTP message. How
does it assign an appropriate MIME type to the data? (i.e., how does it fill in the Content-type
field?)
Answer: heuristics are used for determining the MIME type
of a resource. In other words, the Web server “guesses” what
the MIME type is. The heuristics used depend on the operating system
(platform).
On Windows the MIME type is guessed from the file
extension. In the Windows Registry is a mapping from file
extension to MIME type.
Examples:
- If a file ends with the
extension .txt then the MIME type is guessed to be text/plain.
- If a file ends with the
extension .doc then the MIME type is guessed to be application/msword.
- If a file ends with the
extension .xml then the MIME type is guessed to be application/xml.
- If a file ends with the extension
.zip then the MIME type is guessed to be application/zip.
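The extension-based guessing in the examples above amounts to a table lookup. Here is a minimal Python sketch of it; the table holds only the four extensions listed above, and the fallback of application/octet-stream for unknown extensions is my assumption, not something taken from the Windows Registry.

```python
import os

# Only the four mappings from the examples above.
EXTENSION_TO_MIME = {
    ".txt": "text/plain",
    ".doc": "application/msword",
    ".xml": "application/xml",
    ".zip": "application/zip",
}


def guess_mime_type_from_name(
    filename: str, default: str = "application/octet-stream"
) -> str:
    """Guess a MIME type from the file extension alone."""
    _, extension = os.path.splitext(filename)
    return EXTENSION_TO_MIME.get(extension.lower(), default)
```

Python's standard mimetypes module offers a similar lookup through mimetypes.guess_type, with a much larger table.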
On Unix and Linux systems the MIME type is not necessarily determined
from the file extension; a variety of heuristics may be used.
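One heuristic of this kind, assumed here purely for illustration, is to look at the first few bytes of the data instead of the file name. The Python sketch below checks a handful of well-known signatures. It is far from complete: the signature for older Word files is shared by other Office formats, and XML in encodings such as UTF-16 is not recognized.

```python
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy Office container
ZIP_SIGNATURE = b"PK\x03\x04"
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_mime_type(data: bytes) -> str:
    """Guess a MIME type from the leading bytes (illustrative heuristic)."""
    if data.startswith(OLE2_SIGNATURE):
        # Assumption: treat every OLE2 container as a Word document.
        return "application/msword"
    if data.startswith(ZIP_SIGNATURE):
        return "application/zip"
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    if data.lstrip().startswith(b"<?xml"):
        return "application/xml"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"
```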
Thus we see that a MIME type provides information about the
format of data, independent of platform!
Can Data be Assigned a Wrong MIME Type?
Yes! You could create a Word document and
deliberately give the document a false extension, such as
“.xml”. If the Web server is running on a Windows machine then
the Web server will assign to the data a MIME type of application/xml, which is clearly incorrect.
Browsers use MIME Type to Decide how to Render a Resource
With the exception of Internet Explorer, browsers use the MIME type to
decide how to render a resource, wherever the resource comes from. If the
resource is retrieved from the Web then the MIME type is found in the HTTP
header. If the resource comes from a file on the local file system then the
MIME type is guessed using heuristics, as described above.
Internet Explorer uses the MIME type found in the
HTTP header to determine how to render a resource retrieved from the Web.
But for resources that come from the local file system, Internet Explorer uses the file extension to
determine how to render the resource.
Beyond MIME Types?
MIME types may be criticized for not capturing some important
metadata. For example, suppose that a Web server receives some data, and
in the Content-type
header it shows this MIME type:
Content-type: application/msword
The MIME type tells the Web server that the data is Word
data. But which version of Word produced
the data? The MIME type doesn’t say.
What is an XML Document?
Take this simple XML:
<?xml version="1.0"?>
<root>
Blah
</root>
and paste it into Word, save it as a Word document, and deliberately give the
file the incorrect extension “.xml”. Name the file “Blah.xml”.
Now make this file Web-accessible. Next, suppose someone on the Web
requests the file by issuing its URL, e.g.,
http://www.example.org/Blah.xml
The Web server will read in the data from the file, and
assign a MIME type to it. Let’s assume the Web server is running on
a Windows machine; then the Web server will use the file extension to assign a
MIME type. Here’s the MIME type it will assign to the data:
application/xml
However, the data is not XML. It is Word.
Conversely, take the same XML and put it into Notepad
and give the document the extension “.txt”. Name the file
“Blah.txt”. Make this file also Web-accessible. Next,
suppose someone on the Web requests the file by issuing its URL, e.g.,
http://www.example.org/Blah.txt
The Web server will read in the data from the file, and
assign a MIME type to it. Let’s assume the Web server is running on
a Windows machine; then the Web server will use the file extension to assign a
MIME type. Here’s the MIME type it will assign to the data:
text/plain
Yet, it is an XML document.
So, what is an XML document?
Answer: an XML document is one that optionally begins with an
XML declaration, followed by a single root element with well-formed
content. If you examine the raw bytes of the above Word file you won’t
find an XML declaration or a root element at the start; you will find
Word’s own file format. If you examine the above Notepad
file you will find an XML declaration as the first thing.
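This test can be tried out directly by handing the bytes to an XML parser and ignoring both the file extension and the MIME type. The Python sketch below does that with the standard library parser. The function name is_well_formed_xml is made up, and the "Word" bytes are a stand-in consisting of the signature that begins older .doc files followed by filler, not a real Word document.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data: bytes) -> bool:
    """True if the bytes parse as a well-formed XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Contents of Blah.txt: labeled text/plain, yet it is an XML document.
notepad_bytes = b'<?xml version="1.0"?>\n<root>\nBlah\n</root>\n'

# Stand-in for Blah.xml: labeled application/xml, yet it is not XML.
word_bytes = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 32

notepad_is_xml = is_well_formed_xml(notepad_bytes)
word_is_xml = is_well_formed_xml(word_bytes)
```

The XML declaration is optional, so b"<root>Blah</root>" also passes the check.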
Further Information
Elliotte Rusty Harold has written an excellent article on
this subject:
http://www-128.ibm.com/developerworks/xml/library/x-mxd2.html
Acknowledgements
I would like to gratefully acknowledge the excellent
input from these people:
Mitch Amiano
David Carlisle
Elliotte Rusty Harold
Bob Irving
Rick Jelliffe
Michael Kay
Amelia Lewis
Frank Manola
Rick Marshall
Dave Pawson
Bryan Rasmussen
Henri Sivonen
Nathan Young