xml-dev - [Final] Media type (MIME) of XML in MS Word? in Notepad? when compressed

To: <xml-dev@lists.xml.org>
Subject: [Final] Media type (MIME) of XML in MS Word? in Notepad? when compressed? etc
From: "Costello, Roger L." <costello@mitre.org>
Date: Thu, 15 Jun 2006 19:29:01 -0400
Thread-index: AcaQ03t1D+7iaOb1TqqGogFo378kZw==
Thread-topic: [Final] Media type (MIME) of XML in MS Word? in Notepad? when compressed? etc

Hi Folks,

Many thanks to all those who participated. This has been an outstanding discussion. We have arrived at a really good collection of information.

[Rick, your idea of adding this information to Wikipedia is excellent. Would you do the honors?]

The writeup below has two additions to the Summary #3 version:

1. In the first section I added a brief discussion on MIME types of data that use a standard XML vocabulary (e.g., xsl, svg, xhtml).

2. I added a new section titled: Beyond MIME Types?

In addition I made several small editing fixes.

Thanks again everyone! /Roger

A Summary of XML and Media Types (MIME)

XML Data is Assigned what MIME Type?

At this URL is a list of over 350 different MIME types:

http://www.iana.org/assignments/media-types/

In the list you will see that there are two MIME type assignments to XML data:

application/xml

text/xml

The latter MIME type (text/xml) has been deprecated. Thus, the official MIME type assignment to XML data is:

application/xml

Note the format for expressing MIME types – it contains two parts, separated by a slash:

type/subtype

The MIME type list also has MIME type assignments to data that uses a standard XML vocabulary. For example, this MIME type:

application/xslt+xml

is assigned to data that uses the XSLT vocabulary. And this MIME type:

image/svg+xml

is assigned to data that uses the SVG vocabulary.

Note the format for expressing MIME types of data that uses a specific XML vocabulary:

type/___+xml

where ___ is replaced by a name identifying the XML vocabulary (xslt, svg, xhtml, etc.). There are no spaces around the plus sign.
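To make the two formats concrete, here is a minimal Python sketch that splits a MIME type into its type and subtype and picks out a "+xml" suffix when one is present. The names MediaType, parse_media_type and is_xml_based are made up for illustration; they are not part of any standard library.

```python
from typing import NamedTuple, Optional


class MediaType(NamedTuple):
    type: str              # e.g. "image"
    subtype: str           # e.g. "svg+xml"
    suffix: Optional[str]  # e.g. "xml", or None when there is no "+suffix"


def parse_media_type(value: str) -> MediaType:
    """Split 'type/subtype' and pull out any '+suffix' (hypothetical helper)."""
    top, slash, sub = value.strip().lower().partition("/")
    if not slash or not top or not sub:
        raise ValueError(f"not a type/subtype pair: {value!r}")
    # The suffix is whatever follows the last '+' in the subtype.
    _, plus, suffix = sub.rpartition("+")
    return MediaType(top, sub, suffix if plus else None)


def is_xml_based(value: str) -> bool:
    """True for application/xml, text/xml, and any type/___+xml."""
    mt = parse_media_type(value)
    plain_xml = mt.type in ("application", "text") and mt.subtype == "xml"
    return plain_xml or mt.suffix == "xml"
```

For example, parse_media_type("image/svg+xml") yields the type "image", the subtype "svg+xml" and the suffix "xml", while application/msword has no suffix at all.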

Where are MIME Types Used?

MIME types are used in exchanges of data on the Web. For example, suppose you send some data to a Web server. The data arrives at the Web server as just a stream of bytes. The data has no filename or file extension (e.g., “.xml”). How does the Web server know that the bytes represent XML data and not some other kind of data?

Answer: when data is sent across the Web, it is sent as the payload of an HTTP message. In the HTTP header is a field called Content-type, and the value of this field is a MIME type, e.g.,

Content-type: application/xml

Thus, when the Web server receives the stream of bytes it examines the Content-type header field to determine the type of data.

Going the other direction, when a Web server sends out data it assigns a MIME type to the transmitted data in the same fashion – assigning a MIME type to the data via the Content-type header.

When a Web browser receives data (more precisely, receives a stream of bytes) from the Web, it looks at the MIME type in Content-type to determine how to render the data.

To recap, MIME types were created for networks. MIME types inform network applications (e.g., Web servers and browsers) of the type of data.
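As a rough illustration of the receiving side, the Python sketch below reads the Content-type field from a block of header text and chooses a handler based on it. The HANDLERS table and the function names are assumptions made for this example; a real Web server would do considerably more.

```python
from email import message_from_string


def media_type_of(header_block: str) -> str:
    """Return the MIME type named in the Content-type field."""
    # HTTP header fields use the same "Name: value" syntax as e-mail
    # headers, so the standard-library e-mail parser can read them.
    headers = message_from_string(header_block)
    # get_content_type() lower-cases the value and drops parameters such
    # as "; charset=utf-8". If the field is missing it falls back to
    # "text/plain".
    return headers.get_content_type()


# Hypothetical dispatch table: MIME type -> what to do with the bytes.
HANDLERS = {
    "application/xml": lambda body: f"parse {len(body)} bytes as XML",
    "application/msword": lambda body: f"hand {len(body)} bytes to a Word importer",
}


def handle(header_block: str, body: bytes) -> str:
    """Pick a handler purely from the label; the bytes are not inspected."""
    handler = HANDLERS.get(media_type_of(header_block))
    if handler is None:
        return "unsupported media type"
    return handler(body)
```

Note that handle() never looks inside the body. It trusts the label in the header, which is exactly how the exchange described above works.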

XML doesn’t “have” a MIME Type!

To say that XML “has” a MIME type of application/xml is misleading. It suggests that somehow a MIME type is part of, or a property of, an XML document. It is not. An XML document does not have a MIME property. A stream of bytes may be “assigned” the MIME type application/xml.

A MIME type is an externally applied label. It is pure metadata. It is similar in many ways to a file type extension (e.g., “.xml”, “.txt”) but not as persistent.

What is the Purpose of a MIME Type?

The purpose of a MIME type is to tell network applications, such as Web browsers and Web servers, what type of data they have received. With that information, a Web browser or Web server can then trigger an appropriate program to handle the data.

How does Data get Assigned a MIME Type?

Suppose a Web server is invoked, and suppose the data that it is to return is located in a file on its local file system. The Web server reads in the file’s data and constructs an HTTP message. How does it assign an appropriate MIME type to the data? (i.e., how does it fill in the Content-type field?)

Answer: heuristics are used for determining the MIME type of a resource. In other words, the Web server “guesses” what the MIME type is. The heuristics used depend on the operating system (platform).

On Windows the MIME type is guessed from the file extension. In the Windows Registry is a mapping from file extension to MIME type.

Examples:

- If a file ends with the extension .txt then the MIME type is guessed to be text/plain.

- If a file ends with the extension .doc then the MIME type is guessed to be application/msword.

- If a file ends with the extension .xml then the MIME type is guessed to be application/xml.

- If a file ends with the extension .zip then the MIME type is guessed to be application/zip.
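The extension lookup can be sketched in a few lines of Python. The table below is a hypothetical stand-in for the registry mapping and holds only the four examples above. The fallback value application/octet-stream (generic binary data) is my assumption about what to return for an unknown extension.

```python
import os

# Hypothetical stand-in for the extension-to-MIME-type mapping,
# limited to the four examples listed above.
EXTENSION_TO_MIME = {
    ".txt": "text/plain",
    ".doc": "application/msword",
    ".xml": "application/xml",
    ".zip": "application/zip",
}


def guess_from_extension(filename: str,
                         default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file name alone; the content is never read."""
    _, ext = os.path.splitext(filename)
    # Extensions are compared case-insensitively (".XML" == ".xml").
    return EXTENSION_TO_MIME.get(ext.lower(), default)
```

Python's standard library offers mimetypes.guess_type() for the same job, but its answers depend on the platform and on the mapping files installed, so the explicit table is used here to keep the example predictable.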

On Unix and Linux systems the MIME type is not determined from the file extension; it is determined by a variety of heuristics.

Thus we see that a MIME type provides information about the format of data, independent of platform!
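The discussion did not spell out which heuristics are used. One common heuristic, assumed here purely for illustration, is to inspect the first few bytes of the data for a known signature. The table in this Python sketch is deliberately tiny and the function name is hypothetical.

```python
# Leading byte signatures, checked in order (illustrative, not exhaustive).
SIGNATURES = [
    # ZIP archives begin with "PK" followed by bytes 0x03 0x04.
    (b"PK\x03\x04", "application/zip"),
    # Binary Word (.doc) files live in an OLE2 container. Other Office
    # formats share this signature, so "msword" is itself only a guess.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/msword"),
    # An XML declaration in an ASCII-compatible encoding.
    (b"<?xml", "application/xml"),
]

UTF8_BOM = b"\xef\xbb\xbf"


def guess_from_content(data: bytes,
                       default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the leading bytes, ignoring any file name."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]  # skip an optional UTF-8 byte order mark
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return default
```

A guess of this kind does not depend on the file name at all, so renaming a Word file to “.xml” would not change the result. It is still only a guess: XML without a declaration, or in a UTF-16 encoding, falls through to the default here.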

Can Data be Assigned a Wrong MIME Type?

Yes! You could create a Word document and deliberately give the document a false extension, such as “.xml”. If the Web server is running on a Windows machine then the Web server will assign to the data a MIME type of application/xml, which is clearly incorrect.

Browsers use MIME Type to Decide how to Render a Resource

With the exception of Internet Explorer, browsers use MIME type to decide how to render a resource. If the resource is retrieved from the Web then the MIME type is found in the HTTP header. If the resource comes from a file on the local file system then the MIME type is guessed using heuristics, as described above.

Internet Explorer uses the MIME type found in the HTTP header to determine how to render a resource retrieved from the Web. But for resources that come from the local file system, Internet Explorer uses the file extension to determine how to render the resource.

Beyond MIME Types?

MIME types may be criticized for not capturing some important metadata. For example, suppose that a Web server receives some data, and in the Content-type header it shows this MIME type:

Content-type: application/msword

The MIME type tells the Web server that the data is Word data. But which version of Word is the data? The MIME type doesn’t tell.

What is an XML Document?

Take this simple XML:

<?xml version="1.0"?>

<root>

Blah

</root>

and put it into Word and deliberately give the document the incorrect extension “.xml”. Name the file “Blah.xml”. Now make this file Web-accessible. Next, suppose someone on the Web requests the file by issuing its URL, e.g.,

http://www.example.org/Blah.xml

The Web server will read in the data from the file, and assign a MIME type to it. Let’s assume the Web server is running on a Windows machine; then the Web server will use the file extension to assign a MIME type. Here’s the MIME type it will assign to the data:

application/xml

However, the data is not XML. It is Word.

Conversely, take the same XML and put it into Notepad and give the document the extension “.txt”. Name the file “Blah.txt”. Make this file also Web-accessible. Next, suppose someone on the Web requests the file by issuing its URL, e.g.,

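http://www.example.org/Blah.txt

Again the Web server will use the file extension to assign a MIME type. Here’s the MIME type it will assign to the data:

text/plain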

Yet, it is an XML document.

So, what is an XML document?

Answer: an XML document is one that begins with an optional XML declaration, followed by a root element with well-formed content. If you inspect the bytes of the above Word document you won’t find an XML declaration or a root element at the start. If you open the above Notepad document you will find an XML declaration as the first thing.
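This definition can be tested directly, without reference to any file extension or MIME type. The Python sketch below simply asks an XML parser whether the bytes are well-formed. The function name is made up for this example, and the sample byte strings in the usage note stand in for the two files described above.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data: bytes) -> bool:
    """True if the bytes parse as a well-formed XML document."""
    try:
        # The XML declaration is optional; a single root element is not.
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True
```

Given the Notepad file's bytes, e.g. b'<?xml version="1.0"?>\n<root>\nBlah\n</root>\n', the function returns True even though the file was labelled text/plain. Given bytes that begin with a binary Word signature it returns False, even though the file was labelled application/xml. (ElementTree is not hardened against maliciously constructed input, so treat this as a sketch rather than something to expose to untrusted data.)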

Further Information

Elliotte Rusty Harold has written an excellent article on this subject:

http://www-128.ibm.com/developerworks/xml/library/x-mxd2.html

Acknowledgements

I would like to gratefully acknowledge the excellent input from these people:

Mitch Amiano

David Carlisle

Elliotte Rusty Harold

Bob Irving

Rick Jelliffe

Michael Kay

Amelia Lewis