
   Re: [xml-dev] [Summary, VERSION #2] Media type (MIME) of XML in MSWord?


"xsl" referred to should of course be "xls" for excel...

Rick Marshall wrote:

> Hi Roger
>
> This is messy in the real world and I applaud your attempt to make 
> sense of it - but I despair of you being successful.
>
> Note that this is very system dependent and anyone trying to make 
> sense of it either goes gray or loses their hair. I have made some 
> more notes for you....
>
> Costello, Roger L. wrote:
>
>> Hi Folks,
>>
>> Many, many thanks for clearing up my mistakes. I have revised the 
>> summary based upon your comments. Please let me know of any remaining 
>> errors. /Roger
>>
>> *A Summary of XML and Media Types (MIME)*
>>
>> *What is XML’s MIME Type?*
>>
>> At this URL is a list of the 350 different MIME types:
>>
>> http://www.iana.org/assignments/media-types/
>>
>> In this list you will see two different MIME types for XML:
>>
>> *application/xml*
>>
>> * text/xml*
>>
>> The latter MIME type (*text/xml*) has been deprecated. Thus, the 
>> official MIME type for XML is:
>>
>> *application/xml*
>>
>> Note the format for expressing MIME types – it contains two parts, 
>> separated by a slash:
>>
>> *type*/*subtype*
>>
>> *What is the Purpose of a MIME Type?*
>>
>> Suppose a browser, Web server, or other application is presented with 
>> a resource (document). The purpose of a MIME type is to give 
>> information to the browser, Web server, or other application about 
>> the format of the data contained within the resource.
>>
>> *Where Does a MIME Type Come From?*
>>
>> MIME types are metadata. A MIME type is not stored within a resource. 
>> It is not stored as a property of a resource. Heuristics are used for 
>> determining the MIME type of a resource. In other words, a system 
>> “guesses” what the MIME type is.
>>
> In the case of browsers, they are told what the MIME type is by the 
> HTTP header. However, one popular browser chooses to ignore the MIME 
> type in the HTTP header and instead uses the extension of the document 
> being retrieved, which is really silly when the "document" is a CGI 
> script.
>
> So, e.g., if I have a CGI script that is going to return HTML, the 
> extension is unimportant (for some reason - maybe it's the default), 
> but if I want to return an xsl document then the CGI GET must be to a 
> script ending in ".cgi" - go figure.
>
> Generally, in email and browsers the client application is told the 
> MIME type - it does not guess, but as I said, one large company tends 
> to ignore this. To be fair, I think they have a security reason as 
> well, and it does address the issue of what to do if the MIME type 
> differs from the type implied by the extension.
>
> To see why this is so you have to understand some history. In Unix 
> (and now Linux) the dot extension is arbitrary. The file name is the 
> whole name, including the dot, and by convention some dot extensions 
> mean things - like .c, .o, etc. Most importantly, executable files were 
> indicated by an executable bit in the i-node. In early microcomputer 
> OSes (file managers?) the dot was not stored, and a simpler mechanism 
> was used to determine what to do with a file: executable binary files 
> ended in exe, batch files in bat, etc. The file name was a name (fred) 
> plus a file type (the bit after the dot) (exe). The dot was 
> confusingly added when displaying the name, but not by all early OSes 
> (or even early DOS, for that matter).
>
> Now, here's the tricky bit - when MIME types came along they were 
> necessary for accurate determination of file types in typeless systems 
> like Unix and the web, but superfluous to typed file systems like the 
> Microsoft products, and hence the conflict. I suspect MS can't fix 
> this without a fundamental change to their systems, and for Unix/Linux 
> - well, the system was designed to suit them, so why should they change?
>
> So the importance of MIME types is a function of your universe :)
>
>> On Windows the MIME type is guessed by using the file extension. In 
>> the Windows Registry is a mapping from file extension to MIME type.
>>
>> Examples:
>>
>> - If a file ends with the extension .txt then the MIME type of the 
>> file is guessed to be *text/plain*.
>>
>> - If a file ends with the extension .doc then the MIME type of the 
>> file is guessed to be *application/msword*.
>>
>> - If a file ends with the extension .xml then the MIME type of the 
>> file is guessed to be *application/xml*.
>>
>> - If a file ends with the extension .zip then the MIME type of the 
>> file is guessed to be *application/zip*.
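[Editor's sketch] The extension-based guess described above can be illustrated roughly as follows. This is a minimal sketch and not how Windows implements the lookup: the EXTENSION_TO_MIME table is a hypothetical stand-in for the Registry mapping, it contains only the four extensions named in the summary, and the "application/octet-stream" fallback is an assumption added for illustration.

```python
import os.path

# Hypothetical stand-in for the Registry mapping; only the four
# extensions mentioned in the summary are listed here.
EXTENSION_TO_MIME = {
    ".txt": "text/plain",
    ".doc": "application/msword",
    ".xml": "application/xml",
    ".zip": "application/zip",
}


def guess_mime_type(filename, default="application/octet-stream"):
    """Guess a MIME type from the file extension alone.

    The file contents are never inspected, so the result is only as
    trustworthy as the extension. The default value is an assumption.
    """
    _, extension = os.path.splitext(filename)
    return EXTENSION_TO_MIME.get(extension.lower(), default)


if __name__ == "__main__":
    for name in ("notes.txt", "report.doc", "data.xml", "bundle.zip"):
        print(name, "->", guess_mime_type(name))
```

Because only the name is consulted, renaming a file changes the guess without changing a single byte of its contents.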
>>
>> *Can a MIME Type be Wrong?*
>>
>> Yes! You could create a *Word* document and deliberately give the 
>> document a false extension, such as “.xml”. Windows will then guess 
>> that the MIME type for the document is *application/xml*, which is 
>> clearly incorrect.
>>
>> *Why is MIME Type Important?*
>>
>> It would appear that MIME types are redundant. After all, a file 
>> extension can tell you what kind of data a file contains, right? 
>> True. However, when you send a file across the Web, the file loses its 
>> file extension: only the contents are sent, not the filename (or file 
>> extension). This is where MIME becomes important. When data is sent 
>> across the Web, it is sent as the payload of an HTTP message. In the 
>> HTTP header is a field called Content-type, and the value of this 
>> field is a MIME type, /e.g.,/
>>
>> Content-type: *application/xml*
>>
>> Thus, when a Web server receives data it examines the Content-type 
>> header field to determine the type of data that is in the payload.
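[Editor's sketch] Reading the MIME type out of a Content-type header might look roughly like this, using the header parser from Python's standard email package. The raw header text is a made-up example, and the charset parameter is an assumption that does not appear in the summary.

```python
from email.parser import HeaderParser

# Made-up header block for illustration; a real HTTP message would
# carry other header fields as well.
raw_headers = "Content-Type: application/xml; charset=utf-8\n\n"

message = HeaderParser().parsestr(raw_headers)

mime_type = message.get_content_type()      # the full type/subtype value
main_type = message.get_content_maintype()  # the part before the slash
subtype = message.get_content_subtype()     # the part after the slash
charset = message.get_content_charset()     # optional parameter, if present

if __name__ == "__main__":
    print(mime_type, main_type, subtype, charset)
```

Header field names are case-insensitive, so "Content-type" and "Content-Type" name the same field.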
>>
>> *What is an XML Document?*
>>
>> Take this simple XML:
>>
>> *<?xml version="1.0"?>*
>>
>> *<root>*
>>
>> * Blah*
>>
>> *</root>*
>>
>> and put it into *Word* and give the document the extension “.xml”. As 
>> we’ve seen above the MIME type is:
>>
>> *application/xml*
>>
>> However, it is not an XML document. It is a *Word* document.
>>
>> Conversely, take the same XML and put it into *Notepad* and give the 
>> document the extension “.txt”. As we’ve seen the MIME type is:
>>
>> *text/plain*
>>
>> Yet, it is an XML document.
>>
>> So, what is an XML document? Answer: an XML document is one that has 
>> an XML declaration as the first thing in the file. If you open the 
>> above *Word* document you won’t find an XML declaration as the first 
>> thing. If you open the above *Notepad* document you will find an XML 
>> declaration as the first thing.
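[Editor's sketch] The rule stated above (look for an XML declaration as the first thing in the file) can be sketched as a check on the leading bytes. This follows the summary's rule only: XML 1.0 itself makes the declaration optional, and this sketch handles only ASCII-compatible encodings such as UTF-8, not UTF-16. The function name and the sample byte strings are hypothetical; the second sample begins with the signature bytes of the OLE compound file container used by legacy .doc files.

```python
import codecs


def starts_with_xml_declaration(data: bytes) -> bool:
    """Return True if the bytes begin with an XML declaration.

    A UTF-8 byte order mark, if present, is skipped first. Requiring
    whitespace after "<?xml" avoids matching processing instructions
    such as "<?xml-stylesheet ...?>".
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.startswith((b"<?xml ", b"<?xml\t", b"<?xml\r", b"<?xml\n"))


# The XML from the summary, as a plain-text editor would save it.
notepad_bytes = b'<?xml version="1.0"?>\n<root>\n Blah\n</root>\n'

# Leading bytes of a legacy binary Word file (truncated for illustration).
word_bytes = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 16

if __name__ == "__main__":
    print(starts_with_xml_declaration(notepad_bytes))
    print(starts_with_xml_declaration(word_bytes))
```

Unlike an extension lookup, this check looks at the contents, so renaming the file does not change the answer.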
>>
>> *Further Information*
>>
>> Elliotte Rusty Harold has written an excellent article on this subject:
>>
>> http://www-128.ibm.com/developerworks/xml/library/x-mxd2.html
>>
>> *Acknowledgements*
>>
>> I would like to gratefully acknowledge the excellent inputs from 
>> these people:
>>
>> Mitch Amiano
>>
>> Rick Jelliffe
>>
>> Amelia Lewis
>>
>> Rick Marshall
>>
>> Dave Pawson
>>
>> Bryan Rasmussen
>>
>> Henri Sivonen
>>
>> Nathan Young
>>
>>  
begin:vcard
fn:Rick  Marshall
n:Marshall;Rick 
email;internet:rjm@zenucom.com
tel;cell:+61 411 287 530
x-mozilla-html:TRUE
version:2.1
end:vcard





 
