xml-dev - Re: [xml-dev] [Summary, VERSION #3] Media type (MIME) of XML in MSWord?

To: "Costello, Roger L." <costello@mitre.org>
Subject: Re: [xml-dev] [Summary, VERSION #3] Media type (MIME) of XML in MSWord? in Notepad? when compressed? etc
From: Rick Marshall <rjm@zenucom.com>
Date: Thu, 15 Jun 2006 10:33:56 +1000
Cc: xml-dev@lists.xml.org
In-reply-to: <B8415163A689094689542C617ECA0366CCF142@IMCSRV5.MITRE.ORG>
Organization: Zenucom Pty Ltd
References: <B8415163A689094689542C617ECA0366CCF142@IMCSRV5.MITRE.ORG>
User-agent: Mozilla Thunderbird 1.0.8-1.1.fc4 (X11/20060501)

Nearly, Roger. I hope this ends up somewhere useful, like Wikipedia.....

Having spent some time following this perma-thread, it seems to me that 
if I were designing an OS from scratch I would ditch file extensions 
(as in MS, VMS, etc.) and use MIME types, mapped to OS functions, as 
part of the inode or directory structure.

Going one step further, that could really make XML useful....

Regards

Rick

Costello, Roger L. wrote:

> Hi Folks,
>
> This discussion is excellent! Thank you very much for your outstanding 
> comments. Here’s version #3. Please let me know of any remaining 
> inaccuracies. /Roger
>
> *A Summary of XML and Media Types (MIME)*
>
> *XML Data is Assigned what MIME Type?*
>
> At this URL is the IANA list of registered MIME types (350 of them at the time of writing):
>
> http://www.iana.org/assignments/media-types/
>
> In this list you will see that there are two MIME type assignments to 
> XML data:
>
> *application/xml*
>
> *text/xml*
>
> The latter MIME type (*text/xml*) is generally discouraged. Thus, the 
> recommended MIME type for XML data is:
>
> *application/xml*
>
> Note the format for expressing MIME types: it contains two parts, a 
> type and a subtype, separated by a slash:
>
> *type*/*subtype*
>
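To make the two-part format concrete, here is a small Python sketch (Python is an assumption; the thread itself contains no code). The hypothetical function `split_media_type` delegates parsing to the standard library's `email.message.Message`, which reads Content-Type values the same way MIME headers are read, including optional parameters such as `; charset=utf-8`.

```python
from email.message import Message


def split_media_type(value: str) -> tuple[str, str]:
    """Split a media type such as 'application/xml' into (type, subtype).

    email.message.Message lower-cases the result (media types are
    case-insensitive) and drops parameters such as '; charset=utf-8'.
    A value without exactly one slash is treated as invalid and reported
    as text/plain, following the MIME rules that class implements.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_maintype(), msg.get_content_subtype()


print(split_media_type("application/xml"))          # ('application', 'xml')
print(split_media_type("text/xml; charset=utf-8"))  # ('text', 'xml')
```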
> *Where are MIME Types Used?*
>
> MIME types are used in exchanges of data on the Web. For example, 
> suppose you send some data to a Web server. The data arrives at the 
> Web server as just a stream of bytes. The data has no filename or file 
> extension (/e.g.,/ “.xml”). How does the Web server know that the 
> bytes represent XML data and not some other kind of data?
>
> Answer: when data is sent across the Web, it is sent as the payload of 
> an HTTP message. In the HTTP header is a field called Content-type, 
> and the value of this field is a MIME type, /e.g.,/
>
> Content-type: *application/xml*
>
> Thus, when the Web server receives the stream of bytes it examines the 
> Content-type header field to determine the type of data.
>
> Going the other direction, when a Web server /sends out/ data it 
> labels the transmitted data in the same fashion, by putting a MIME 
> type in the Content-type header.
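A minimal Python sketch of the receiving side: the hypothetical function `declared_media_type` pulls the Content-Type field out of a raw HTTP request. Real Web servers use their own HTTP parsers; this one borrows the standard library's MIME header parser only because HTTP header fields share the same `Name: value` syntax. The request bytes are made up for illustration.

```python
from email.parser import BytesParser

# A hypothetical raw HTTP request: request line, header fields, a blank
# line, then the payload (the "stream of bytes" described above).
RAW_REQUEST = (
    b"POST /upload HTTP/1.1\r\n"
    b"Host: www.example.org\r\n"
    b"Content-Type: application/xml\r\n"
    b"\r\n"
    b'<?xml version="1.0"?><root>Blah</root>'
)


def declared_media_type(raw: bytes) -> str:
    """Return the media type declared by an HTTP message's Content-Type header."""
    # Drop the request line; the header fields that follow use the same
    # "Name: value" syntax as MIME mail headers, so the email parser reads them.
    _request_line, rest = raw.split(b"\r\n", 1)
    headers = BytesParser().parsebytes(rest, headersonly=True)
    if headers.get("Content-Type") is None:
        # No label at all: HTTP leaves this to the recipient, and treating
        # the payload as opaque bytes is the cautious choice.
        return "application/octet-stream"
    return headers.get_content_type()


print(declared_media_type(RAW_REQUEST))  # application/xml
```

Note that nothing here looks at the payload: the server simply trusts the label the sender attached.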
>
> When a /Web browser/ receives data (more precisely, receives a stream 
> of bytes) from the Web, it looks at the MIME type in Content-type to 
> determine how to render the data.
>
That's what it should do. That's not what IE does. IE "remembers" what 
it asked for (at least in a "GET") and assumes that the returned data 
conforms to the "extension" used in the GET. No extension => html, .pdf 
=> pdf file etc.

As far as I can tell it does not look at the MIME type at all. Mozilla 
etc. work as described. This is one of every web developer's nightmares.

> To recap, MIME types were created for data exchanged over networks 
> (originally e-mail, later the Web). MIME types inform network 
> applications (/e.g.,/ Web servers and browsers) of the type of data.
>
> *XML doesn’t “have” a MIME Type!*
>
> To say that XML “has” a MIME type of *application/xml* is misleading. 
> It suggests that somehow a MIME type is part of, or a property of, an 
> XML document. It is not. An XML document does not have a MIME 
> property. A stream of bytes may be “assigned” the MIME type 
> *application/xml*.
>
> A MIME type is an externally applied label. It is pure metadata. It is 
> similar in many ways to a file type extension (/e.g.,/ “.xml”, “.txt”) 
> but not as persistent.
>
> *What is the Purpose of a MIME Type?*
>
> The purpose of a MIME type is to provide network applications, such as 
> Web browsers and Web servers, with information about the type of data 
> they have received.
>
> *How does Data get Assigned a MIME Type?*
>
> Suppose a Web server receives a request, and suppose the data that it is to 
> return is located in a file on its local file system. The Web server 
> reads in the file’s data and constructs an HTTP message. How does it 
> assign an appropriate MIME type to the data? (/i.e.,/ how does it fill 
> in the Content-type field?)
>
> Answer: heuristics are used for determining the MIME type of a 
> resource. In other words, the Web server “guesses” what the MIME type 
> is. The heuristics used depend on the operating system (platform).
>
I think this is correct only for files. Don't forget that web servers have 
supported remote execution from day one. If the web server is asked to run 
a CGI script, then it is the responsibility of the CGI script to tell the 
browser what it is sending back (except in the case of IE, where the 
script must have a name conforming to the data type being returned). The 
first thing a script does is output 2 lines:

Content-type: a/b
(blank line)

That's not a guess; that's a deliberate decision by the person writing the 
script.
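The two-line rule above can be sketched as a Python CGI script. The helper name `cgi_response` and its default media type are illustrative assumptions; the point is that the script, not the server, decides the Content-type value.

```python
import sys


def cgi_response(body: str, media_type: str = "application/xml") -> str:
    """Build CGI output: a Content-type line, a blank line, then the body.

    The script author chooses media_type deliberately; the web server uses
    this header for the HTTP response instead of guessing from a file name.
    """
    return f"Content-type: {media_type}\n\n{body}"


if __name__ == "__main__":
    sys.stdout.write(cgi_response('<?xml version="1.0"?>\n<root>Blah</root>\n'))
```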

> On Windows the MIME type is guessed from the file extension. The 
> Windows Registry contains a mapping from file extension to MIME type.
>
> Examples:
>
> - If a file ends with the extension .txt then the MIME type is guessed 
> to be *text/plain*.
>
> - If a file ends with the extension .doc then the MIME type is guessed 
> to be *application/msword*.
>
> - If a file ends with the extension .xml then the MIME type is guessed 
> to be *application/xml*.
>
> - If a file ends with the extension .zip then the MIME type is guessed 
> to be *application/zip*.
>
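The Windows behaviour described above amounts to a table lookup on the file name. The Python sketch below uses a hypothetical hard-coded dictionary in place of the Registry, filled with only the four examples above. Python's real `mimetypes.guess_type` does the same kind of lookup with a much larger table (and on Windows also consults the Registry). The file's bytes are never read, which is exactly why a mislabelled file gets a wrong MIME type.

```python
from pathlib import PurePath

# Hypothetical stand-in for the Registry's extension -> MIME type mapping,
# populated only with the examples listed above.
EXTENSION_TO_MIME = {
    ".txt": "text/plain",
    ".doc": "application/msword",
    ".xml": "application/xml",
    ".zip": "application/zip",
}


def guess_from_extension(filename: str) -> str:
    """Guess a MIME type from the file name alone; the content is never read."""
    suffix = PurePath(filename).suffix.lower()
    # Unknown or missing extension: fall back to opaque binary data.
    return EXTENSION_TO_MIME.get(suffix, "application/octet-stream")


print(guess_from_extension("Blah.xml"))  # application/xml
```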
> On Unix and Linux systems the MIME type is not determined based on 
> file extension; instead, it is determined by a variety of heuristics.
>
Unix and Linux systems don't have a concept of a file extension; it's 
only a convention, hence the heuristics. I know I'm being pedantic here, 
but it is a critical concept.
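Without extensions to rely on, a guesser can look at the content itself, as the Unix `file` command does with its database of "magic" byte signatures. Here is a deliberately tiny Python sketch of that idea; the three signatures and the function name `sniff` are assumptions chosen to match the examples in this thread, not a complete or authoritative list.

```python
# A tiny, hypothetical signature table; the Unix `file` command relies on a
# far larger "magic" database and many more rules than this.
SIGNATURES = [
    (b"PK\x03\x04", "application/zip"),
    # OLE2 compound-document header. Legacy binary Word files start with
    # it, but so do other Office formats, so this mapping is only a guess.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/msword"),
    # XML declaration; only ASCII-compatible encodings are recognised here.
    (b"<?xml", "application/xml"),
]


def sniff(data: bytes) -> str:
    """Guess a media type from the leading bytes, ignoring any file name."""
    # Skip a UTF-8 byte-order mark, which may precede the XML declaration.
    head = data.removeprefix(b"\xef\xbb\xbf")
    for magic, media_type in SIGNATURES:
        if head.startswith(magic):
            return media_type
    return "application/octet-stream"


print(sniff(b'<?xml version="1.0"?><root>Blah</root>'))  # application/xml
```

With this approach a binary Word file named "Blah.xml" would sniff as application/msword (or as application/zip if saved in the newer ZIP-based .docx format), whatever its name says.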

> Thus we see that a MIME type provides information about the format of 
> data, independent of platform!
>
> *Can Data be Assigned a Wrong MIME Type?*
>
> Yes! You could create a *Word* document and deliberately give the 
> document a false extension, such as “.xml”. If the Web server is 
> running on a Windows machine then the Web server will assign to the 
> data a MIME type of *application/xml*, which is clearly incorrect.
>
> *Browsers use MIME Type to Decide how to Render a Resource*
>
> With the exception of *Internet Explorer*, browsers use MIME type to 
> decide how to render a resource. If the resource is retrieved from the 
> Web then the MIME type is found in the HTTP header. If the resource 
> comes from a file on the local file system then the MIME type is 
> guessed using heuristics, as described above.
>
> *Internet Explorer* uses the MIME type found in the HTTP header to 
> determine how to render a resource retrieved from the Web. But for 
> resources that come from the local file system, *Internet Explorer* 
> uses the file extension to determine how to render the resource.
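The decision described in the last two paragraphs can be sketched in Python as follows. `choose_media_type`, `how_to_render`, `guess_by_extension` and the renderer labels are all hypothetical; no browser exposes such an API, and (as Rick notes above) Internet Explorer's actual behaviour may differ from this description.

```python
from typing import Callable, Optional

# Hypothetical mapping from media type to a rendering action.
RENDERERS = {
    "text/html": "render as HTML",
    "application/xml": "show XML tree",
    "text/xml": "show XML tree",
    "text/plain": "show plain text",
}


def guess_by_extension(filename: str) -> str:
    """Hypothetical name-based heuristic standing in for a real lookup."""
    return "text/plain" if filename.lower().endswith(".txt") else "application/octet-stream"


def choose_media_type(from_web: bool, content_type: Optional[str],
                      filename: str, guess: Callable[[str], str]) -> str:
    """Pick the media type to render with.

    Resources fetched over HTTP use the Content-Type header; local files
    (or responses with no header) fall back to a name-based guess.
    """
    if from_web and content_type:
        # Keep only "type/subtype", dropping parameters such as charset.
        return content_type.split(";", 1)[0].strip().lower()
    return guess(filename)


def how_to_render(media_type: str) -> str:
    """Map a media type to an action; unknown types are offered for download."""
    return RENDERERS.get(media_type, "offer to save to disk")


web = choose_media_type(True, "application/xml; charset=utf-8", "Blah.txt", guess_by_extension)
local = choose_media_type(False, None, "Blah.txt", guess_by_extension)
print(how_to_render(web))    # show XML tree
print(how_to_render(local))  # show plain text
```

The same bytes can thus be rendered differently depending on where they came from, because only the source of the label changes.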
>
> *What is an XML Document?*
>
> Take this simple XML:
>
> <?xml version="1.0"?>
> <root>
>    Blah
> </root>
>
> and paste it into *Word*, save it in *Word*’s own format, and 
> deliberately give the file the incorrect extension “.xml” by naming it 
> “Blah.xml”. Now make this file Web-accessible. Next, suppose someone on 
> the Web requests the file by issuing its URL, /e.g.,/
>
> http://www.example.org/Blah.xml
>
> The Web server will read in the data from the file, and assign a MIME 
> type to it. Let’s assume the Web server is running on a Windows 
> machine; then the Web server will use the file extension to assign a 
> MIME type. Here’s the MIME type it will assign to the data:
>
> *application/xml*
>
> However, the data is not XML. It is a *Word* document.
>
> Conversely, take the same XML and put it into *Notepad* and give the 
> document the extension “.txt”. Name the file “Blah.txt”. Make this 
> file also Web-accessible. Next, suppose someone on the Web requests 
> the file by issuing its URL, /e.g.,/
>
> http://www.example.org/Blah.txt
>
> The Web server will read in the data from the file, and assign a MIME 
> type to it. Let’s assume the Web server is running on a Windows 
> machine; then the Web server will use the file extension to assign a 
> MIME type. Here’s the MIME type it will assign to the data:
>
> *text/plain*
>
> Yet, it is an XML document.
>
> So, what is an XML document?
>
> Answer: an XML document is a sequence of characters that may begin with 
> an XML declaration, contains exactly one root element, and is 
> well-formed throughout. If you look at the raw bytes of the above *Word* 
> document you won’t find an XML declaration or a root element. If you 
> open the above *Notepad* document you will find an XML declaration as 
> the first thing.
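Whether data is XML can therefore be tested on the bytes themselves, regardless of extension or MIME label. A minimal Python sketch, assuming the standard library's `xml.etree.ElementTree` parser is an acceptable checker (it is non-validating, so it checks well-formedness only); the function name `is_well_formed_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if the bytes parse as a well-formed XML document.

    Only the content is examined; file extensions and MIME labels play no part.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


notepad_bytes = b'<?xml version="1.0"?>\n<root>\n   Blah\n</root>\n'
print(is_well_formed_xml(notepad_bytes))  # True
print(is_well_formed_xml(b"<root>Blah"))  # False (root element never closed)
```

Applied to the two files above, the Notepad file "Blah.txt" passes and the binary Word file "Blah.xml" fails, the opposite of what their extensions suggest.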
>
> *Further Information*
>
> Elliotte Rusty Harold has written an excellent article on this subject:
>
> http://www-128.ibm.com/developerworks/xml/library/x-mxd2.html
>
> *Acknowledgements*
>
> I would like to gratefully acknowledge the excellent inputs from these 
> people:
>
> Mitch Amiano
>
> Elliotte Rusty Harold
>
> Rick Jelliffe
>
> Amelia Lewis
>
> Frank Manola
>
> Rick Marshall
>
> Dave Pawson
>
> Bryan Rasmussen
>
> Henri Sivonen
>
> Nathan Young

Follow-Ups:
- RE: [xml-dev] [Summary, VERSION #3] Media type (MIME) of XML in MS Word? in Notepad? when compressed? etc
  - From: "Michael Kay" <mike@saxonica.com>

References:
- [Summary, VERSION #3] Media type (MIME) of XML in MS Word? in Notepad? when compressed? etc
  - From: "Costello, Roger L." <costello@mitre.org>
