   text/xml vs. application/xml

  • From: David Megginson <ak117@freenet.carleton.ca>
  • To: xml-dev Mailing List <xml-dev@ic.ac.uk>
  • Date: Sat, 20 Dec 1997 13:29:28 -0500

MURATA Makoto writes:

 > >  http://www.microstar.com/XML/donne.xml
 > >  http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml

 > As a co-editor of an (upcoming) RFC for text/xml and
 > application/xml, I think that I should point out the correct
 > procedure for encoding determination.  (I have not checked these
 > two Web sites, and Ælfred.)

Thank you very much for the information.  Currently, both of these web
servers return "application/octet-stream" as the MIME type for *.xml
and *.dtd files: in this case, is it correct for an XML parser to fall
back on other character-encoding detection techniques, as Ælfred does?

 > For those XML documents transmitted by the HTTP protocol, XML parsers 
 > should use the charset parameter of the media type text/xml  (BTW, 
 > the default of this parameter is 8859-1).  XML parsers should ignore
 > the encoding declaration within XML documents transmitted by HTTP.  
 > More about this, see the XML PR and the HTTP/1.1

I have two important queries:

1) Are you certain that ignoring the encoding declaration is
   conforming behaviour?  It seems to me that it would make more sense
   to report an error if the charset parameter and the encoding
   declaration differ (especially since the PR requires any document
   without a BOM or encoding declaration to be in UTF-8).

2) Why pick a default encoding that conforming XML parsers are not
   required to support?  Ælfred does accept encoding="ISO-8859-1", but
   some other parsers do not.  It seems to me that either the RFC or
   the PR needs to be amended.
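
To make the first query concrete, here is a minimal Python sketch of the check I have in mind: compare the charset parameter from the Content-Type header with the encoding declaration in the document, and flag a disagreement instead of silently ignoring one of them. The function names are hypothetical, and the regular expression only covers ASCII-compatible encodings; it is an illustration, not what any RFC or the PR prescribes.

```python
import codecs
import re
from email.message import Message

# Encoding declaration at the very start of an ASCII-compatible byte stream.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def charset_from_content_type(header):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header
    return msg.get_content_charset()


def declared_encoding(body):
    """Return the lowercased encoding named in the XML declaration, or None."""
    m = _DECL.match(body)
    return m.group(1).decode("ascii").lower() if m else None


def _canonical(name):
    # Map aliases (e.g. "latin1", "ISO-8859-1") to one name where Python knows them.
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def encodings_agree(content_type, body):
    """True/False if both sources name an encoding; None if either is silent."""
    charset = charset_from_content_type(content_type)
    declared = declared_encoding(body)
    if charset is None or declared is None:
        return None
    return _canonical(charset) == _canonical(declared)
```

A parser following the suggestion above would report an error when `encodings_agree` returns False, and would need some other rule (a default, or autodetection) when it returns None.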

I can also anticipate a different problem: few private people (as
opposed to companies or organisations) have any control at all over
what their HTTP servers send out.  

Imagine an exchange student at a big American university, who wants to
publish a UTF-8 or UCS-2 Arabic XML text in her personal web space.
She will have a very hard time even finding out who is in charge of
the university's HTTP server (if she knows what an HTTP server is),
and she will probably have graduated before the university's
administration gets around to letting the webmaster look into
reporting the correct encoding for her document.

In the end, it looks like application/xml is a _much_ better choice
than text/xml -- with Ælfred, I have found that I can do a very good
job autodetecting character encoding, and I imagine that other parser
writers will find the same.
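
For readers who have not tried it, autodetection from the first few bytes can be sketched roughly as follows. This is a simplified Python illustration, not Ælfred's actual code: the function name is hypothetical, UCS-4 and EBCDIC are left out, and it assumes the UTF-8 default for documents with neither a byte order mark nor an encoding declaration.

```python
import re

# Encoding declaration at the start of an ASCII-compatible byte stream.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data):
    """Guess the character encoding of an XML entity from its leading bytes."""
    # 1. A byte order mark settles the question.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # 2. No BOM: "<?" encoded as 16-bit units reveals the byte order.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # 3. ASCII-compatible start: the declaration itself can now be read.
    m = _DECL.match(data)
    if m:
        return m.group(1).decode("ascii").lower()
    # 4. Neither BOM nor declaration: assume UTF-8.
    return "utf-8"
```

For example, `sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')` yields `"iso-8859-1"` without any help from the HTTP server.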

All the best,


David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com


