xml-dev - [Summary, VERSION #3] Media type (MIME) of XML in MS Word? in Notepad? when compressed? etc

To: <xml-dev@lists.xml.org>
Subject: [Summary, VERSION #3] Media type (MIME) of XML in MS Word? in Notepad? when compressed? etc
From: "Costello, Roger L." <costello@mitre.org>
Date: Wed, 14 Jun 2006 19:31:18 -0400

Hi Folks,

This discussion is excellent! Thank you very much for your outstanding comments. Here’s version #3. Please let me know of any remaining inaccuracies. /Roger

A Summary of XML and Media Types (MIME)

XML Data is Assigned what MIME Type?

IANA maintains the registry of MIME types (about 350 of them at the time of writing) at this URL:

http://www.iana.org/assignments/media-types/

In this list you will see two general-purpose MIME types for XML data:

application/xml

text/xml

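The latter MIME type (text/xml) is discouraged: under RFC 3023, a text/* type sent without a charset parameter defaults to US-ASCII, which overrides the encoding declared inside the XML document. Thus, the recommended MIME type for XML data is: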

application/xml

Note the format of a MIME type: it has two parts, a type and a subtype, separated by a slash:

type/subtype
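As a small illustration of the type/subtype format, here is a minimal Python sketch that splits a MIME type string into its type, subtype, and any ";"-separated parameters (such as charset). The function name split_media_type is hypothetical, and the sketch assumes simple values: it does not handle quoted parameter values that themselves contain ";" or "=".

```python
def split_media_type(value: str) -> tuple[str, str, dict[str, str]]:
    """Split e.g. 'text/xml; charset="UTF-8"' into ('text', 'xml', {'charset': 'UTF-8'})."""
    main, *params = value.split(";")
    type_, sep, subtype = main.strip().partition("/")
    # A MIME type must be exactly one type and one subtype separated by a slash.
    if not sep or not type_ or not subtype or "/" in subtype:
        raise ValueError(f"not a type/subtype pair: {value!r}")
    parameters: dict[str, str] = {}
    for param in params:
        name, eq, val = param.strip().partition("=")
        if eq:
            # Parameter names are case-insensitive; strip optional surrounding quotes.
            parameters[name.strip().lower()] = val.strip().strip('"')
    # Type and subtype are case-insensitive, so normalize them to lowercase.
    return type_.lower(), subtype.lower(), parameters
```

For example, split_media_type("application/xml") gives ("application", "xml", {}), while a bare "xml" is rejected because it has no slash.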

Where are MIME Types Used?

MIME types are used in exchanges of data on the Web. For example, suppose you send some data to a Web server. The data arrives at the Web server as just a stream of bytes. The data has no filename or file extension (e.g., “.xml”). How does the Web server know that the bytes represent XML data and not some other kind of data?

Answer: when data is sent across the Web, it is sent as the payload of an HTTP message. The message’s HTTP headers include a field called Content-type, whose value is a MIME type, e.g.,

Content-type: application/xml

Thus, when the Web server receives the stream of bytes it examines the Content-type header field to determine the type of data.
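To show where that label lives, here is a minimal Python sketch of a server-side helper that pulls the MIME type out of a raw HTTP request using only the standard library's e-mail header parser (HTTP headers share the same "Name: value" syntax). The function name content_type_of and the sample request are illustrative assumptions; a real server framework would do this parsing for you.

```python
from email.parser import BytesHeaderParser
from typing import Optional


def content_type_of(raw_request: bytes) -> Optional[str]:
    """Return the MIME type from a raw HTTP request's Content-Type header, or None."""
    # Headers end at the first blank line; everything after it is the payload.
    head, _, _body = raw_request.partition(b"\r\n\r\n")
    # The first line is the request line (e.g. "POST /orders HTTP/1.1"), not a header.
    _request_line, _, header_block = head.partition(b"\r\n")
    headers = BytesHeaderParser().parsebytes(header_block)
    if headers.get("Content-Type") is None:  # header lookup is case-insensitive
        return None
    # get_content_type() returns just "type/subtype", lowercased, without parameters.
    return headers.get_content_type()
```

Note that the function never looks at the payload bytes: the server's idea of "what kind of data is this" comes entirely from the header.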

Going the other direction, when a Web server sends out data, it labels the data with a MIME type in the same way, via the Content-type header.

When a Web browser receives data (more precisely, receives a stream of bytes) from the Web, it looks at the MIME type in Content-type to determine how to render the data.
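To make the round trip concrete, here is a minimal Python sketch, using only the standard library, of a server labeling its response as application/xml and a client reading that label back. The handler name XmlHandler, the response body, the path /Blah.xml, and the throwaway server on 127.0.0.1 (on a port chosen by the operating system) are illustrative assumptions.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

XML_BODY = b'<?xml version="1.0"?>\n<root>Blah</root>\n'


class XmlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Server side: the MIME type is an explicit label attached to the bytes.
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(XML_BODY)))
        self.end_headers()
        self.wfile.write(XML_BODY)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_content_type() -> str:
    server = HTTPServer(("127.0.0.1", 0), XmlHandler)  # port 0: let the OS pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        conn = HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/Blah.xml")
        resp = conn.getresponse()
        # Client side: read the label from the header, not from the body.
        ctype = resp.headers.get_content_type()
        resp.read()
        conn.close()
        return ctype
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_content_type() returns the label the handler chose. If the handler sent "text/plain" instead, the client would see text/plain, even though the body bytes would be identical.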

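To recap, MIME types were created for exchanging data over networks (MIME stands for Multipurpose Internet Mail Extensions; it was originally designed for e-mail). MIME types inform network applications (e.g., Web servers and browsers) of the type of data.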

XML doesn’t “have” a MIME Type!

To say that XML “has” a MIME type of application/xml is misleading. It suggests that somehow a MIME type is part of, or a property of, an XML document. It is not. An XML document does not have a MIME property. A stream of bytes may be “assigned” the MIME type application/xml.

A MIME type is an externally applied label. It is pure metadata. It is similar in many ways to a file type extension (e.g., “.xml”, “.txt”) but not as persistent.

What is the Purpose of a MIME Type?

The purpose of a MIME type is to tell network applications, such as Web browsers and Web servers, the type of data they have received.

How does Data get Assigned a MIME Type?

Suppose a Web server is invoked, and suppose the data that it is to return is located in a file on its local file system. The Web server reads in the file’s data and constructs an HTTP message. How does it assign an appropriate MIME type to the data? (i.e., how does it fill in the Content-type field?)

Answer: heuristics are used to determine the MIME type of a resource. In other words, the Web server “guesses” what the MIME type is. The heuristics used depend on the server software, its configuration, and the operating system (platform).

On Windows the MIME type is typically guessed from the file extension, using a mapping from file extension to MIME type stored in the Windows Registry.

Examples:

- If a file ends with the extension .txt then the MIME type is guessed to be text/plain.

- If a file ends with the extension .doc then the MIME type is guessed to be application/msword.

- If a file ends with the extension .xml then the MIME type is guessed to be application/xml.

- If a file ends with the extension .zip then the MIME type is guessed to be application/zip.
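The extension-based guessing in these examples amounts to a table lookup. Here is a minimal Python sketch; the table EXTENSION_TO_MIME is a hypothetical stand-in that only mirrors the four examples above (real servers load a much larger table from configuration), and the application/octet-stream fallback is an assumed choice. Python's standard library has a comparable function, mimetypes.guess_type, but its answers depend on the machine it runs on, since it also reads system files such as /etc/mime.types and, on Windows, the Registry.

```python
import os

# Hypothetical table mirroring the examples above; not any server's real configuration.
EXTENSION_TO_MIME = {
    ".txt": "text/plain",
    ".doc": "application/msword",
    ".xml": "application/xml",
    ".zip": "application/zip",
}


def guess_mime_type(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension alone; the contents are never read."""
    _root, ext = os.path.splitext(filename)
    # Extensions are compared case-insensitively (".DOC" behaves like ".doc").
    return EXTENSION_TO_MIME.get(ext.lower(), default)
```

Because only the name is consulted, guess_mime_type("Blah.xml") returns application/xml no matter what bytes the file actually contains.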

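On Unix and Linux systems the picture is mixed: Web servers such as Apache also usually map file extensions to MIME types (via a mime.types configuration file), while other tools, such as the file command, use a variety of heuristics, including inspecting the file’s leading bytes.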
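One common content-based heuristic, used by tools such as the Unix file command, is to inspect the first few bytes of the data for a known signature ("magic number"). Here is a minimal Python sketch with a deliberately tiny, hypothetical rule set. Two simplifications to note: the OLE2 signature is shared by old .doc, .xls and .ppt files, so mapping it to application/msword is only a guess; and XML without an XML declaration is not recognized by this sketch.

```python
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy Word/Excel/PowerPoint container
ZIP_SIGNATURE = b"PK\x03\x04"                        # ZIP local file header (also .docx)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_mime_type(data: bytes) -> str:
    """Guess a MIME type from leading bytes; the file name is never consulted."""
    if data.startswith(OLE2_SIGNATURE):
        return "application/msword"
    if data.startswith(ZIP_SIGNATURE):
        return "application/zip"
    # Skip an optional UTF-8 byte order mark before looking for "<?xml".
    text = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    if text.startswith(b"<?xml"):
        return "application/xml"
    try:
        text.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"  # not text we recognize
    return "text/plain"
```

Unlike extension lookup, this sketch would label the Notepad file discussed later as application/xml even if it were named Blah.txt.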

Thus, however it is guessed, the MIME type conveys the format of the data in a platform-independent way.

Can Data be Assigned a Wrong MIME Type?

Yes! You could create a Word document and deliberately give the file a false extension, such as “.xml”. If the Web server is running on a Windows machine, it will assign the data a MIME type of application/xml, which is clearly incorrect.

Browsers use MIME Type to Decide how to Render a Resource

With the exception of Internet Explorer, browsers use the MIME type to decide how to render a resource. If the resource is retrieved from the Web, the MIME type is found in the HTTP header. If the resource comes from a file on the local file system, the MIME type is guessed using heuristics, as described above.

Internet Explorer uses the MIME type found in the HTTP header to determine how to render a resource retrieved from the Web. But for resources that come from the local file system, Internet Explorer uses the file extension to determine how to render the resource.
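The two sources of a MIME type described in this section can be sketched as a single decision in Python: trust the HTTP header for resources fetched from the Web, otherwise fall back to a guess from the file extension. The function name effective_mime_type, the tiny extension table, and the application/octet-stream fallback are hypothetical; this illustrates the idea and is not how any particular browser is implemented.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Hypothetical, deliberately tiny extension table.
EXTENSION_TO_MIME = {
    ".xml": "application/xml",
    ".txt": "text/plain",
    ".doc": "application/msword",
}


def effective_mime_type(url: str, header_content_type: Optional[str] = None) -> str:
    parsed = urlparse(url)
    if parsed.scheme.lower() in ("http", "https") and header_content_type:
        # Retrieved from the Web: use the label in the HTTP header, minus parameters.
        return header_content_type.split(";", 1)[0].strip().lower()
    # Local file (or no header available): guess from the file extension.
    _root, ext = os.path.splitext(parsed.path)
    return EXTENSION_TO_MIME.get(ext.lower(), "application/octet-stream")
```

For example, http://www.example.org/Blah.txt served with "text/plain; charset=us-ascii" yields text/plain, while file:///C:/docs/Blah.xml yields application/xml purely because of its name.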

What is an XML Document?

Take this simple XML:

<?xml version="1.0"?>
<root>
Blah
</root>

and type it into Word, save it as a Word document, and deliberately give the file the incorrect extension “.xml”. Name the file “Blah.xml”. Now make this file Web-accessible. Next, suppose someone on the Web requests the file by its URL, e.g.,

http://www.example.org/Blah.xml

The Web server will read in the data from the file, and assign a MIME type to it. Let’s assume the Web server is running on a Windows machine; then the Web server will use the file extension to assign a MIME type. Here’s the MIME type it will assign to the data:

application/xml

However, the data is not XML. It is a Word document, in Word’s own file format.

Conversely, take the same XML, type it into Notepad, and save it with the extension “.txt”. Name the file “Blah.txt”. Make this file Web-accessible as well. Next, suppose someone on the Web requests the file by its URL, e.g.,

http://www.example.org/Blah.txt

Again the Web server uses the file extension, so here’s the MIME type it will assign to the data:

text/plain

Yet, it is an XML document.

So, what is an XML document?

Answer: an XML document consists of an optional XML declaration followed by exactly one root element, and its content must be well-formed. If you examine the bytes of the above Word file, you won’t find an XML declaration or a root element; you will find Word’s file format. If you examine the above Notepad file, you will find the XML declaration as the first thing, followed by the root element.
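Since "being XML" is a property of the bytes rather than of the label, one practical test is simply to try parsing them. Here is a minimal Python sketch using the standard library's ElementTree; the function name is_well_formed_xml is hypothetical. It checks well-formedness only (not validity against a schema), and ElementTree is not hardened against maliciously crafted input, so treat it as an illustration rather than a gatekeeper for untrusted data.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data: bytes) -> bool:
    """True if the bytes parse as a well-formed XML document with a single root element."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True
```

The Notepad bytes from the example pass this check whatever the file is called, while the Word file fails it even when it is named Blah.xml and served as application/xml.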

Further Information

Elliotte Rusty Harold has written an excellent article on this subject:

http://www-128.ibm.com/developerworks/xml/library/x-mxd2.html

Acknowledgements

I would like to gratefully acknowledge the excellent inputs from these people:

Mitch Amiano

Elliotte Rusty Harold

Rick Jelliffe

Amelia Lewis

Frank Manola

Rick Marshall

Dave Pawson

Bryan Rasmussen

Henri Sivonen

Nathan Young
