- From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
- To: xml-dev@ic.ac.uk
- Date: Wed, 07 Apr 1999 14:39:30 +0900
Chris,
> It's good to see a concrete proposal. On the other hand, relying on a
> complex convention of filename suffixes is problematic:
I understand your concern. However, Uchida-san's proposal is not an attempt
to use a convention instead of the charset parameter. It is intended
to help provide the correct charset parameter. I agree that there are some
side effects which some people might oppose.
> An alternative method for achieving the same result is to use a filter
> (this can be done in Apache and in Jigsaw) which automatically emits the
> correct charset parameter based on reading the encoding declaration in
> the XML instance. This can easily cache its results, and need not
> result in processing overhead on each request.
I strongly agree. This is the best approach. I sincerely hope that such
an attempt will happen at W3C.
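As a rough illustration of the filter idea quoted above, here is a minimal Python sketch that reads the encoding declaration from the first bytes of an XML instance and builds the Content-Type value. The function names are hypothetical, not part of Apache or Jigsaw. The sketch assumes an ASCII-compatible encoding unless a UTF-16 byte order mark is present, and assumes UTF-8 when there is no declaration at all.

```python
import re

# Either quote character, as allowed around pseudo-attribute values.
_Q = rb"[\"']"

# XML declaration at the very start of the entity, with an encoding pseudo-attribute.
_ENCODING_RE = re.compile(
    rb"^<\?xml\s+version\s*=\s*" + _Q + rb"[^\"']*" + _Q
    + rb"\s+encoding\s*=\s*" + _Q + rb"([A-Za-z][A-Za-z0-9._-]*)" + _Q
)


def declared_encoding(head: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]  # skip a UTF-8 byte order mark
    match = _ENCODING_RE.match(head)
    return match.group(1).decode("ascii") if match else None


def content_type_for(head: bytes, media_type: str = "text/xml") -> str:
    """Build a Content-Type value whose charset agrees with the document."""
    if head.startswith((b"\xff\xfe", b"\xfe\xff")):
        charset = "utf-16"  # byte order mark, so the declaration is not ASCII-readable
    else:
        # Assumption: no declaration and no BOM means UTF-8.
        charset = declared_encoding(head) or "utf-8"
    return f"{media_type}; charset={charset}"
```

A server-side filter could call content_type_for once when the document is stored and cache the result, so that nothing is parsed per request.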
> > At *IETF*, the default of the charset parameter for text/HTML *is* 8859-1.
>
> Yes, which is different to the default for text/* - this demonstrates
> that it is possible to give a more specific rule for a particular
> registration.
Actually, in the case of HTTP MIME, the default of the charset parameter of
text/* is always ISO-8859-1. In the case of real MIME, the default of
the charset parameter of text/* is always US-ASCII. text/html is not an exception.
text/xml is an exception, since its default is always US-ASCII, in HTTP as well
as in real MIME. This was recommended by the IESG.
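The defaults described above can be written down as a small lookup. This is only a sketch of the rules as stated in this message; the function name and the "http" / "mime" transport labels are hypothetical.

```python
def default_charset(media_type: str, transport: str) -> str:
    """Default charset for a text/* type when the charset parameter is omitted.

    transport is "http" for HTTP MIME or "mime" for real (mail) MIME.
    """
    media_type = media_type.lower()
    if transport not in ("http", "mime"):
        raise ValueError(f"unknown transport: {transport!r}")
    if not media_type.startswith("text/"):
        raise ValueError("this sketch only covers text/* types")
    if media_type == "text/xml":
        # The exception: US-ASCII regardless of transport.
        return "us-ascii"
    # Every other text/* type, text/html included, follows the transport default.
    return "iso-8859-1" if transport == "http" else "us-ascii"
```

For example, default_charset("text/html", "http") gives "iso-8859-1", while default_charset("text/xml", "http") gives "us-ascii".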
> > It is going to be very difficult or
> > impossible, since HTTP and MIME people will disagree.
>
> I think you mean, HTTP and Mail(SMTP/IMAP/POP). MIME is used by both
> email and HTTP.
HTTP MIME is not quite the same as real MIME. There are many differences
between the two.
> > There has been a lot of discussion about this issue. None of your arguments
> > are new to me. In fact, my original opinion was not so different from yours but
> > I have changed my mind during the discussion. For more about this, see the archive
> > of the XML SIG (around April and May of 1998).
>
> OK, I will check this out. I cannot of course discuss such material in
> this forum, however. Perhaps you could post your technical reasons for
> the change of direction here?
text/xml has to be consistent with HTTP and MIME. Autodetection
or the use of META tags as the default of the charset parameter has been
extensively discussed by HTTP people and MIME people. They strongly dissent.
> But, if it is not present,
> then the XML Rec says exactly what should happen;
Appendix F is non-normative. RFC 2376 supersedes it, as intended by the
XML WG. XML 1.0 clearly says:
"Rules for the relative priority of the internal label and the MIME-type
label in an external header, for example, should be part of the RFC document
defining the text/xml and application/xml MIME types. ... in particular,
when the MIME types text/xml and application/xml are defined, the recommendations
of the relevant RFC will supersede these rules."
By the way, now that RFC 2376 is published, XML 1.0 will be revised.
> careful wording which
> this RFC nullifies. Problems arise if an XML file is saved from the Web
> to a local filesystem, perhaps for further editing; the MIME charset
> information is lost. It could perhaps be stored in some way - but, there
> is already a standard way - the XML encoding declaration.
Since it is a standard way, RFC 2376 recommends that recipient programs
rewrite encoding declarations.
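A minimal Python sketch of what such a rewrite could look like when a recipient saves a document locally. It treats the charset parameter as authoritative, converts the body to UTF-8, and makes the encoding declaration say so. The function name and the choice of UTF-8 as the storage encoding are assumptions for illustration, not something RFC 2376 prescribes.

```python
import re

# Group 1: "<?xml version=...", group 2: an optional encoding pseudo-attribute.
_DECL_RE = re.compile(
    r"^(<\?xml\s+version\s*=\s*[\"'][^\"']*[\"'])"
    r"(\s+encoding\s*=\s*[\"'][^\"']*[\"'])?"
)


def rewrite_for_saving(body: bytes, charset: str) -> bytes:
    """Decode body using the charset parameter and return UTF-8 bytes
    whose encoding declaration matches the stored bytes."""
    text = body.decode(charset)
    if text.startswith("\ufeff"):
        text = text[1:]  # drop a byte order mark carried over from the wire
    match = _DECL_RE.match(text)
    if match:
        # Replace (or add) the encoding pseudo-attribute, keep the rest intact.
        text = match.group(1) + ' encoding="UTF-8"' + text[match.end():]
    else:
        # Assumption: a document without a declaration is XML 1.0.
        text = '<?xml version="1.0" encoding="UTF-8"?>\n' + text
    return text.encode("utf-8")
```

With this, a document that a proxy converted to ISO-8859-1 while leaving encoding="Shift_JIS" in place would still be saved with a declaration that matches its bytes.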
> And if the charset parameter is present, then it should say the same
> thing as the encoding declaration.
This disallows code conversion by proxy servers. One could argue
that proxy servers should rewrite encoding declarations. However,
documents should not be rewritten for security reasons. Moreover,
if we require different code conversion for different subtypes of text,
there is not much hope for interoperability, especially because
fallback to text/plain is required.
> The best way to ensure this is to
> treat the XML encoding declaration as the primary metadata resource and
> to programmatically derive the charset parameter from this; greater
If it is done when the document is stored in the WWW server, that is
superb.
> However, I will point out that it is the consensus of the XML 1.0
> Recommendation that I am respecting - and that the RFC does not, by
> altering the meaning of the default encoding. It could have been
> harmonised with the XML REC; it was not.
RFC 2376 IS the consensus (it was not unanimous, though). It is based
on really extensive discussion at the XML SIG and XML WG. My mail
folder named text/xml has 687 e-mails ;-( Larry Masinter (the HTTP WG
chair) and Martin Duerst (the I18N IG chair) were heavily involved. On
the other hand, the appendix in XML 1.0 is merely informative and was meant
to be replaced by the XML media type RFC.
Cheers,
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp