On 2006-06-12 18:47:33 -0400 "Costello, Roger L." <costello@mitre.org> wrote:
> The Editor used to Create the XML Determines its MIME Type
Gah! No! No, no, no!
> Interestingly, you may have a document which contains XML and yet its
> MIME type may not be application/xml.
>
> For example, take this simple XML:
>
> <?xml version="1.0"?>
>
> <root>
> Blah
> </root>
>
> and put it into Word (save it as a .doc file). The MIME type is:
>
> application/msword
True as far as it goes, but that's because it's *not XML!*
Try this experiment.
Type the above in Word.
Save as .doc (default).
Open a DOS box (or whatever they call it these days) and say "type
NameOfDocument.doc".
Does it *look* like XML? No. It violates the rules for XML, namely that
the XML declaration, if there is one, *must* be the first thing encountered
in the data.
Do this with any random proprietary-format tool you care to; same result.
The fact that you are "quoting" an XML document inside some other document
format does *not* make that format somehow magically become XML. It's still
what it is.
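The experiment can also be run without a DOS box. What follows is a minimal sketch in Python, assuming the standard-library parser is an acceptable judge of what "is XML". The function name and the sample bytes are made up for illustration. The fake .doc is only the 8-byte signature that opens a Word binary file, some padding, and the typed text; a real .doc carries far more structure.

```python
import xml.etree.ElementTree as ET

# The text typed into the editor, as quoted in the thread.
TYPED = b'<?xml version="1.0"?>\n\n<root>\nBlah\n</root>\n'

# Stand-in for a saved .doc (an assumption, not a real Word file):
# the OLE2 compound-file signature, padding, then the typed text.
FAKE_DOC = bytes.fromhex("d0cf11e0a1b11ae1") + b"\x00" * 504 + TYPED


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if the bytes parse as an XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


print(is_well_formed_xml(TYPED))     # the Notepad case
print(is_well_formed_xml(FAKE_DOC))  # the Word case
```

The typed text sits inside FAKE_DOC byte for byte, yet the file as a whole does not parse, which is the point: quoting XML inside another format does not make that format XML.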
Put that XML document into a cell in an Excel spreadsheet. Do you *really*
expect the .xls that you saved to be "XML"?
Here's something fun. Type that stuff into OpenOffice.org's word processor
and save in default format. Is the result XML? Well, no. It's a
compressed (.zip) directory. Unzip it, and what's inside? Hey, there's
XML! Only ... no, it isn't the XML you *typed*. All that has been quoted
(escaped) into CDATA. But it *is* XML, only it's a different document type.
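A toy version of that package, sketched in Python. Assumptions: the element names `document` and `p` are invented for illustration and the archive holds a single member, whereas a real OpenOffice.org file holds several members and uses its own vocabulary. Only the member name `content.xml` is borrowed from the real format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TYPED = '<?xml version="1.0"?>\n<root>\nBlah\n</root>'


def build_package(typed_text: str) -> bytes:
    """Build a toy zip package with one XML member holding the escaped text."""
    # Hypothetical wrapper vocabulary; the typed markup is escaped so it
    # becomes ordinary character data inside <p>.
    content = "<document><p>" + escape(typed_text) + "</p></document>"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", content)
    return buf.getvalue()


package = build_package(TYPED)

# The saved file is a zip archive, not XML: it starts with "PK".
starts_with_pk = package[:2] == b"PK"

# The member inside *is* XML, but of a different document type.
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    inner = ET.fromstring(zf.read("content.xml"))

root_tag = inner.tag               # the wrapper's root, not <root>
recovered = inner.find("p").text   # the typed markup, as plain text
```

After parsing, `recovered` is the typed text again, but the parser never saw a `<root>` element; it saw a string.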
> Conversely, if you put the same XML into Notepad, the MIME type is:
>
> application/xml
Bloody not if you accept the "suggestion" of Notepad that it ought to have a
".txt" extension. Then it's text/plain.
> Why is that? Why is it that if you put XML into one editor (Word) you
> get a MIME type that is specific to the editor, whereas if you put XML
> into another editor (Notepad) you get a MIME type that is independent
> of the editor?
Well, because you don't?
The "MIME type" of a document is not stored in a document. A variety of
heuristics may be applied to dynamically determine the MIME type; this was
true of document formats even before MIME (see file(1)). The commonest
heuristic is the "extension", the bit that comes after the last dot in a
filename, typically 1-4 characters (in the DOS world, always three
characters). In that heuristic, .doc maps to application/msword (even if
it's *actually* an Excel spreadsheet), and .txt maps to text/plain (even if
it's *really* a pkzip-compressed encrypted security analysis in a
proprietary format) and .xml maps (as a rule) to application/xml.
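The extension heuristic is small enough to write out. A minimal sketch in Python; the table holds only the three mappings named above, and the function name and the fallback type are assumptions for illustration. (Python's standard library ships a `mimetypes` module that does the same job with a much larger table.)

```python
import os

# Only the mappings mentioned in the text; real tables are far longer.
EXTENSION_TO_MIME = {
    ".doc": "application/msword",
    ".txt": "text/plain",
    ".xml": "application/xml",
}


def guess_mime_type(filename: str,
                    default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the bit after the last dot, nothing else."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_TO_MIME.get(ext.lower(), default)


print(guess_mime_type("NameOfDocument.doc"))
print(guess_mime_type("NameOfDocument.txt"))
print(guess_mime_type("NameOfDocument.xml"))
```

Note that the function never opens the file. Rename the same bytes and the answer changes, which is why the result says nothing about what the data actually is.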
What happens to a Windows application if the MIME type doesn't match the
extension?
Damn all. Windows doesn't care about MIME types.
What happens to a Windows application if the data format doesn't match the
extension?
Crash. Hopefully, the application just refuses to read it, but it could
crash, and given the general level of protection in the system, it could
bring the system down.
What happens to a BeOS application if the MIME type doesn't match the
extension or data format?
BeOS used MIME types in the file system, and preferred to trust them rather
than extensions. A decent application should have degraded gracefully. In
worst case, see above ("bring the system down").
Critically: a MIME type is *metadata*, it is a label placed on the data, it
is not inherent in the data. Data does not "have" a MIME type, it is
*assigned* a MIME type (or not, if it isn't relevant, as for most
applications running on Windows). Windows cares about "file types"
(extensions), not MIME types. Web servers typically care about MIME types
(although HTTP isn't a MIME-compliant protocol, but that's a different
rant). Browsers, consequently, usually care about MIME types.
I can write the above document in Word (ewwww, and wash my hands after), and
save it as a .doc, and then instruct my webserver to deliver it as
application/xml regardless of the extension, and a browser that receives it
... will choke, because it *isn't XML*. The webserver, not being Word,
can't strip the cruft; the web browser, not being Word, gets confused when
handed application/xml that doesn't start with an XML declaration.
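The receiving end of that exchange can be sketched too. This is a hypothetical client in Python, not how any particular browser works: `receive` and its messages are invented for illustration, and the non-XML body is just a Word binary signature glued in front of the XML text.

```python
import xml.etree.ElementTree as ET


def receive(content_type: str, body: bytes) -> str:
    """Handle a response the way a label-trusting client would."""
    if content_type == "application/xml":
        # The client believes the label and hands the body to an XML parser.
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            return "choked: labelled application/xml, but the body is not XML"
        return "parsed XML with root element <" + root.tag + ">"
    # Any other label: the body is never examined as XML at all.
    return "passed along as " + content_type


XML_BODY = b'<?xml version="1.0"?>\n<root>\nBlah\n</root>\n'
NOT_XML_BODY = bytes.fromhex("d0cf11e0a1b11ae1") + XML_BODY

print(receive("application/xml", XML_BODY))
print(receive("application/xml", NOT_XML_BODY))
print(receive("text/plain", XML_BODY))
```

The third call is the Notepad-with-.txt case: perfectly good XML, labelled text/plain, and treated as text/plain.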
> The answer is this: when the XML is put into Word, the Word application
> wraps the XML with a bunch of Word-specific stuff (the wrapper stuff is
> not visible).
Oh, *yes* it is! Unless, of course, you happen to be using one application,
namely MS Word.
> Conversely, Notepad does not wrap the XML with anything. The document
> is pure XML, it can be fed directly into an XML parser, and thus it has
> a MIME type of application/xml.
No it isn't. It's whatever MIME type you assign to it. If you call it
text/plain, it's text/plain.
Amy!
(in a ranting mood ... but the summary was misleading, I'm sorry)
--
Amelia A. Lewis amyzing {at} talsever.com
There's someone in my head, but it's not me.
-- Pink Floyd