
   Re: [xml-dev] [Summary, VERSION #2] Media type (MIME) of XML in MSWord?


Hi Roger

This is messy in the real world and I applaud your attempt to make sense 
of it - but I despair of you being successful.

Note that this is very system dependent and anyone trying to make sense 
of it either goes gray or loses their hair. I have made some more notes 
for you....

Costello, Roger L. wrote:

> Hi Folks,
>
> Many, many thanks for clearing up my mistakes. I have revised the 
> summary based upon your comments. Please let me know of any remaining 
> errors. /Roger
>
> *A Summary of XML and Media Types (MIME)*
>
> *What is XML’s MIME Type?*
>
> At this URL is a list of the 350 different MIME types:
>
> http://www.iana.org/assignments/media-types/
>
> In this list you will see two different MIME types for XML:
>
> *application/xml*
>
> *text/xml*
>
> The latter MIME type (*text/xml*) has been deprecated. Thus, the 
> official MIME type for XML is:
>
> *application/xml*
>
> Note the format for expressing MIME types – it contains two parts, 
> separated by a slash:
>
> *type*/*subtype*
>
> *What is the Purpose of a MIME Type?*
>
> Suppose a browser, Web server, or other application is presented with 
> a resource (document). The purpose of a MIME type is to give 
> information to the browser, Web server, or other application about the 
> format of the data contained within the resource.
>
> *Where Does a MIME Type Come From?*
>
> MIME types are metadata. A MIME type is not stored within a resource. 
> It is not stored as a property of a resource. Heuristics are used for 
> determining the MIME type of a resource. In other words, a system 
> “guesses” what the MIME type is.
>
In the case of browsers, they are told what the MIME type is by the HTTP 
header. However, one popular browser chooses to ignore the MIME type in 
the HTTP header and instead uses the extension of the document being 
retrieved, which is really silly when the "document" is a CGI script.

So, e.g., if I have a CGI script that is going to return HTML, the 
extension is unimportant (for some reason - maybe it's the default), but 
if I want to return an XSL document then the CGI GET must be to a script 
ending in ".cgi" - go figure.
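
A minimal sketch, in Python, of a CGI-style script that declares its own type. The function name `cgi_response` and the sample stylesheet are hypothetical, chosen only for illustration. The point it shows is that the Content-Type header is written by the script itself and does not depend on what the script file is called.

```python
import sys

def cgi_response(body: str, media_type: str) -> str:
    """Build a CGI response: one header line, a blank line, then the body."""
    return f"Content-Type: {media_type}\n\n{body}"

# Illustrative stylesheet only; any XSL document would do here.
XSL = (
    '<?xml version="1.0"?>\n'
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>\n'
)

if __name__ == "__main__":
    # The script could be named report.cgi, report.py or anything else;
    # a client that honours the header is told application/xml regardless.
    sys.stdout.write(cgi_response(XSL, "application/xml"))
```

A client that ignores the header and looks at the script's extension instead would reach a different conclusion from the same response.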

Generally in email and browsers the client application is told the MIME 
type - it does not guess - but as I said, one large company tends to 
ignore this. To be fair, I think they have a security reason as well, and 
it does address the issue of what to do if the MIME type differs from 
the type implied by the extension.
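
The two policies can be put side by side in a short Python sketch. The function `choose_type` and its `trust_header` flag are assumptions made for illustration, not a description of how any particular browser decides.

```python
from typing import Optional, Tuple

def choose_type(declared: Optional[str], implied: Optional[str],
                trust_header: bool = True) -> Tuple[Optional[str], bool]:
    """Return (type to use, True if the two sources disagree).

    declared: bare type/subtype from the header, or None if absent.
    implied:  bare type/subtype suggested by the extension, or None.
    """
    conflict = (
        declared is not None
        and implied is not None
        and declared.lower() != implied.lower()
    )
    if trust_header:
        chosen = declared or implied   # header wins, extension is the fallback
    else:
        chosen = implied or declared   # extension wins, header is the fallback
    return chosen, conflict
```

For example, `choose_type("text/html", "application/xml")` keeps the declared `text/html` and reports a conflict, while the same call with `trust_header=False` picks `application/xml`.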

To see why this is so you have to understand some history. In Unix (and 
now Linux) the dot extension is arbitrary. The file name is the whole 
name, including the dot, and by convention some dot extensions mean 
things - like .c, .o, etc. Most importantly, executable files were 
indicated by an executable bit in the i-node. In early microcomputer 
OSes (file managers?) the dot was not stored and a simpler mechanism was 
used to determine what to do with a file: executable binary files ended 
in exe, batch files in bat, etc. The file name was a name (fred) plus a 
file type (the bit after the dot) (exe). The dot was confusingly added 
when displaying the name, but not by all early OSes (or even early DOS 
for that matter).

Now, here's the tricky bit - when MIME types came along they were 
necessary for accurate determination of file types in typeless systems 
like Unix and the web, but superfluous to typed file systems like the 
Microsoft products, hence the conflict. I suspect MS can't fix this 
without a fundamental change to their systems, and for Unix/Linux - well, 
the system was designed to suit them so why should they change?

So the importance of MIME types is a function of your universe :)

> On Windows the MIME type is guessed by using the file extension. In 
> the Windows Registry is a mapping from file extension to MIME type.
>
> Examples:
>
> - If a file ends with the extension .txt then the MIME type of the 
> file is guessed to be *text/plain*.
>
> - If a file ends with the extension .doc then the MIME type of the 
> file is guessed to be *application/msword*.
>
> - If a file ends with the extension .xml then the MIME type of the 
> file is guessed to be *application/xml*.
>
> - If a file ends with the extension .zip then the MIME type of the 
> file is guessed to be *application/zip*.
>
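The lookup described above can be sketched in a few lines of Python. The table `EXTENSION_TO_TYPE` is a hypothetical stand-in for the registry mapping, filled only with the four examples from the list; `guess_from_extension` is likewise an illustrative name.

```python
import os
from typing import Optional

# Hypothetical stand-in for the registry mapping; entries are the
# four examples given in the summary, nothing more.
EXTENSION_TO_TYPE = {
    ".txt": "text/plain",
    ".doc": "application/msword",
    ".xml": "application/xml",
    ".zip": "application/zip",
}

def guess_from_extension(filename: str) -> Optional[str]:
    """Guess a MIME type from the extension alone; the content is never read."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_TO_TYPE.get(ext.lower())
```

Because only the name is inspected, `guess_from_extension("letter.xml")` answers `application/xml` whatever the file actually contains, and an unlisted extension yields `None`.
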
> *Can a MIME Type be Wrong?*
>
> Yes! You could create a *Word* document and deliberately give the 
> document a false extension, such as “.xml”. Windows will then guess 
> that the MIME type for the document is *application/xml*, which is 
> clearly incorrect.
>
> *Why is MIME Type Important?*
>
> It would appear that MIME types are redundant. After all a file 
> extension can tell you what kind of data a file contains, right? True. 
> However, when you send a file across the Web, the file loses its file 
> extension: only the contents are sent, not the filename (or file 
> extension). This is where MIME becomes important. When data is sent 
> across the Web, it is sent as the payload of an HTTP message. In the 
> HTTP header is a field called Content-type, and the value of this 
> field is a MIME type, e.g.,
>
> Content-type: *application/xml*
>
> Thus, when a Web server receives data it examines the Content-type 
> header field to determine the type of data that is in the payload.
>
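A minimal sketch of the receiving side, assuming the message is already available as text. Python's standard `email` package parses MIME-style headers, which HTTP headers resemble; it is used here as a convenience and is not a full HTTP parser. The function name `read_message` and the sample message are illustrative.

```python
from email import message_from_string
from typing import Optional, Tuple

def read_message(raw: str) -> Tuple[str, Optional[str], str]:
    """Split a header-plus-payload message into (media type, charset, payload)."""
    msg = message_from_string(raw)
    media_type = msg.get_content_type()   # type/subtype, lowercased, parameters dropped
    charset = msg.get_param("charset")    # None when no charset parameter is present
    return media_type, charset, msg.get_payload()

# Illustrative message: one header, a blank line, then the payload.
RAW = (
    "Content-Type: application/xml; charset=utf-8\n"
    "\n"
    "<root>Blah</root>"
)

media_type, charset, payload = read_message(RAW)
```

Here the receiver learns the type from the header alone; no filename or extension is involved.
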
> *What is an XML Document?*
>
> Take this simple XML:
>
> *<?xml version="1.0"?>*
>
> *<root>*
>
> * Blah*
>
> *</root>*
>
> and put it into *Word* and give the document the extension “.xml”. As 
> we’ve seen above the MIME type is:
>
> *application/xml*
>
> However, it is not an XML document. It is a *Word* document.
>
> Conversely, take the same XML and put it into *Notepad* and give the 
> document the extension “.txt”. As we’ve seen the MIME type is:
>
> *text/plain*
>
> Yet, it is an XML document.
>
> So, what is an XML document? Answer: an XML document is one that has 
> an XML declaration as the first thing in the file. If you open the 
> above *Word* document you won’t find an XML declaration as the first 
> thing. If you open the above *Notepad* document you will find an XML 
> declaration as the first thing.
>
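The test stated above can be sketched as a check on the first bytes of a file. This implements only the rule as the summary gives it, so a file with no XML declaration is reported as not XML here. The function name is illustrative, only an optional UTF-8 byte order mark is handled (UTF-16 files are not), and `OLE2_MAGIC` is the eight-byte signature that opens legacy binary Word (.doc) files.

```python
import codecs

# Signature at the start of legacy binary Word (.doc) files (OLE2 compound file).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def starts_with_xml_declaration(data: bytes) -> bool:
    """True if the bytes begin with '<?xml' followed by whitespace.

    An optional UTF-8 byte order mark is skipped first. Requiring the
    whitespace avoids matching other targets such as '<?xml-stylesheet'.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data[:5] == b"<?xml" and data[5:6] in (b" ", b"\t", b"\r", b"\n")
```

Applied to the two files in the example, the Notepad file saved as ".txt" passes the check, while the Word file saved as ".xml" fails it because its first bytes are the binary signature rather than the declaration.
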
> *Further Information*
>
> Elliotte Rusty Harold has written an excellent article on this subject:
>
> http://www-128.ibm.com/developerworks/xml/library/x-mxd2.html
>
> *Acknowledgements*
>
> I would like to gratefully acknowledge the excellent inputs from these 
> people:
>
> Mitch Amiano
>
> Rick Jelliffe
>
> Amelia Lewis
>
> Rick Marshall
>
> Dave Pawson
>
> Bryan Rasmussen
>
> Henri Sivonen
>
> Nathan Young
>

Rick Marshall





 
