Re: [xml-dev] Text/xml with omitted charset parameter

* MURATA Makoto wrote:

(I am not addressing you personally...)

>>An improvement would have been to say: only the XML declaration, or the
>>text declarations in external entities, matter; other information is
>>ignored unless set explicitly; conforming processors must not assume
>>anything; the rules are defined in XML 1.0; and content providers should
>>never use text/xml for XML documents. XML improves the situation only on
>>your local hard drive; XML in MIME environments (read: XML on the Web)
>>makes it worse, and that's very disappointing.
>No matter what we write in specifications, people will use text/xml
>(even if we had not registered it).

"People" got the impression somewhere that they can use text/xml for XML
content. That somewhere is probably the XML 1.0 Recommendation. If XML
1.0 had provided only an application/xml media type and only
application/xml had been registered, no one would use text/xml. People
don't use text/wml for WML documents, they use the registered
text/vnd.wap.wml media type. Why? No one ever gave them the impression
that they could use text/wml for WML documents.

>For example, at W3C, I have argued that SOAP must use application/xml
>rather than text/xml, but my comments have not been accepted yet.

To play advocatus diaboli once again: text/xml is appropriate for SOAP,
since it is possible to get the general idea of a SOAP message without
any additional software, at least as easily as for any other XML
document.

>What we can do is to correctly document text/xml and application/xml, and 
>I believe RFC 3023 has done the job.

RFC 3023 lacks a definition of the "casual user" it refers to when
discussing whether to use text/xml or application/xml. MIME doesn't
define him either. So who is supposed to be a casual user of an XML
document? According to RFC 3023, text/xml documents must be readable by
him. I can think of a text/plain document that has <x><![CDATA[ at the
very beginning and ]]></x> at the very end, with no other XML involved.
I consider this readable by casual users. If you add a certain level of
complexity, say XHTML elements, character references, dozens of
attributes, etc., it won't be readable by casual users; think of a
scenario in which a web browser ends up rendering the XHTML document as
text/plain. The majority of web browser users won't understand all those
funny characters.
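
To make the CDATA case concrete, here is a rough sketch in Python. The
helper name wrap_as_xml and the sample text are made up for
illustration; the only assumption is that the plain text never contains
the sequence "]]>", which would end the CDATA section early.

```python
import xml.etree.ElementTree as ET


def wrap_as_xml(plain_text):
    """Wrap plain text in <x><![CDATA[ ... ]]></x> (hypothetical helper)."""
    # "]]>" would terminate the CDATA section, so this sketch refuses it.
    if "]]>" in plain_text:
        raise ValueError("text contains ']]>' and cannot sit in one CDATA section")
    return "<x><![CDATA[" + plain_text + "]]></x>"


# Sample text with characters that would need escaping outside CDATA.
note = "Dear reader,\nthis says 1 < 2 & nothing more.\n"
document = wrap_as_xml(note)

# An XML parser hands back exactly the text a casual user would read.
recovered = ET.fromstring(document).text
print(recovered == note)
```

Apart from the first and last line, such a document is byte for byte
the plain text, which is why I consider it readable without any XML
software.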

Wait, it's readable if they are used to the Latin alphabet; is that the
criterion for text/xml vs. application/xml? This would be contrary to
RFC 2046, which wants text/* types to be straightforward enough for a
reader to get the general idea of those characters. Well, I can even get
the general idea of various binary file formats, but I will probably
fail to get the general idea of a text/plain document containing text
written in a language I don't understand. So where is that "general
idea" thing defined? It isn't. Interpretation ranges from "Hey, that's
*data*" to "Aahh, this is an excerpt from Immanuel Kant's definition of
'Aufklärung', ...". So we have an RFC 2046 that lacks a definition of
'general idea' and an RFC 3023 that lacks a definition of 'readable'
and 'casual user'. First, this is inconsistent; second, no, I don't know
a good definition for these terms either. 'Readable' and 'general idea'
make me think of aliens who have never used symbols for communication
and interaction; they will certainly not meet these criteria.

Sure, this is almost a theoretical discussion, but the conclusion to
draw is simple: text/* does not work for anything but text/plain. If
data contains something that makes it different from text/plain, it
requires an additional interpretation layer, hence it should be
application/*. Otherwise you have to say that text/* is for everything
expressed in terms of characters, which renders application/xml useless.

I currently have to decide what to change in my internet-draft
registering JavaScript etc., and I am faced with the same problem. Do I
prefer application/...script or text/...script? I can get the general
idea of every script expressed in these languages ("hey, it does some
scripting"); scripts are human-writable, thus human-readable, another
criterion satisfied. Humans can even understand what a script does
without executing it, hence all RFC 2046 criteria for a text/* type are
satisfied. But wait, application/* types are meant to contain
information to be processed by an application, and hey, scripts are. But
HTML, XML, CSS, RTF, SGML, rfc822-headers, WML, and almost all other
text/* types also contain information that is meant to be processed by
some application. Well, so what? text/* and application/* for all types?
Choose the one you like? Why is HTML text/* but XHTML application/*?
Possibly marketing? Will the successor of XHTML, let's call it YHTML,
be text/* again?

I don't have answers, and I fear no one has; hence MIME is misconceived
in this regard, so I strongly disagree here:

>We will eventually learn (1) when to use text/* and when to use application/*, 

We cannot learn; we can just be wise and always use application/* for
I18N reasons until MIME is dead. Tim Berners-Lee recently suggested that
URIs should be used for Content-Types. This was almost ignored, since he
said it in a different context, and I was about to disagree; and no, I
never liked the concept of URIs that aren't URIs, or rather resource
locators that don't locate resources, as in XML namespaces. But Tim is
right: MIME doesn't work with its current media types, and it especially
doesn't work for XML. We do not need better retrofitting into broken
systems, we need better systems. Go and develop them.
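
The I18N reason for preferring application/* is the charset default.
Under RFC 3023, text/xml with the charset parameter omitted must be
treated as us-ascii, whereas application/xml without a charset falls
back to the XML 1.0 rules (byte order mark, then the encoding
declaration, else UTF-8). A rough sketch of that decision follows; the
function name effective_charset is made up, and the sniffing is
deliberately simplified (it ignores UTF-16 documents without a BOM,
EBCDIC, and RFC 2231 encoded parameters).

```python
import re
from email.message import Message

# Simplified match for the encoding pseudo-attribute of an XML declaration.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_charset(content_type, body):
    """Return the charset a receiver should assume (illustrative sketch)."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    # An explicit charset parameter is authoritative for both media types.
    if isinstance(charset, str) and charset:
        return charset.lower()
    # text/xml without charset: the MIME default for text/* applies.
    if msg.get_content_type() == "text/xml":
        return "us-ascii"
    # Otherwise fall back to XML 1.0 style detection on the bytes.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    match = _ENCODING_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


doc = '<?xml version="1.0" encoding="iso-8859-1"?><p>Aufkl\u00e4rung</p>'
body = doc.encode("iso-8859-1")
print(effective_charset("text/xml", body))
print(effective_charset("application/xml", body))
```

The same bytes get two different answers depending only on the label:
served as text/xml, the encoding declaration inside the document is
overridden and the "ä" is no longer legal.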

(please don't follow up to the list, unless it's related to xml-dev)

Thanks for reading.
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/