
Re: [xml-dev] Text/xml with omitted charset parameter



Elliotte Rusty Harold wrote:

>At 5:27 PM +0900 10/28/01, MURATA Makoto wrote:
>
>>No, there are absolutely no chances for such a change.  Such changes 
>>have been tried and failed.  MIME people will never agree to change 
>>the default.
>>
>>So many e-mail programs use the charset parameter to display MIME entities 
>>labelled as text/*.  If the charset parameter is absent, such programs will 
>>assume that the MIME entity is us-ascii.  This change would invalidate such 
>>programs.
>
>So is it the case, then, that the default for everything in the text/*
>tree must be ASCII or 8859-1? It's not possible for the subtype
>text/xml to provide a different default than the type text?

Yes, if you use text/* for MIME, the default MUST be US-ASCII.  
If you use text/* for HTTP, the default MUST be ISO-8859-1.  It 
is not possible for a subtype of text to provide a different 
default.  There is absolutely no chance of that at the IETF.
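To make the rule above concrete, here is a small sketch (mine, not part of any standard library; the function name and parameters are illustrative) of how a conformant receiver must resolve the charset of a text/* entity:

```python
# Sketch: resolving the effective charset of a text/* MIME entity.
# The default depends only on the transport, never on the subtype,
# so text/xml gets no special treatment.

def effective_charset(content_type_params, transport):
    """Return the charset a conformant receiver must assume."""
    charset = content_type_params.get("charset")
    if charset is not None:
        return charset.lower()
    # No charset parameter: fall back to the transport's default.
    if transport == "http":
        return "iso-8859-1"   # HTTP default for text/*
    return "us-ascii"         # MIME (e.g. e-mail) default for text/*

print(effective_charset({}, "http"))                    # iso-8859-1
print(effective_charset({}, "smtp"))                    # us-ascii
print(effective_charset({"charset": "UTF-8"}, "http"))  # utf-8
```

Note that the XML encoding declaration never enters this computation; that is exactly the point of the rule.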

Put another way, this issue is not specific to XML, but applies to
other textual formats including plain text.  A real *solution* should
work for all such formats.  The charset parameter is the only mechanism
that works for all such formats, and it also allows transcoding.
Although the charset parameter has not worked very well so far, are
there any other solutions for plain text?

Rob Lugt wrote:

>My question is, what is the rationale for this 'standard'?  Does it actually
>make any sense for an xml processor to conform?

The rationale is not to destroy existing MIME applications which have
successfully handled all subtypes of the top-level media type "text".

>My thinking goes like this.  If the xml entity contains only us-ascii
>then it makes no difference if the xml processor treats it like UTF-8.
>On the other hand, if the xml entity contains UTF-8 characters, then a
>"non conformant" processor will read it correctly, whereas a
>"conformant" one must reject it.  It appears in this case that
>non-conformance has only positive side-effects.  Perhaps I'm missing
>something?

If you consider XML processors only, such "non-conformant" processors
will usually meet users' expectations.  However, existing MIME applications,
which have correctly worked for text/*, will fail.  When transcoding
proxy servers (e.g., DeleGate) convert text/* MIME entities without rewriting 
encoding declarations, "non-conformant" processors will also fail.
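The proxy scenario can be demonstrated in a few lines.  This is my own illustration, not taken from DeleGate: a proxy transcodes the bytes to ISO-8859-1 and updates the charset parameter, but leaves the stale encoding declaration inside the document.

```python
# A UTF-8 document whose XML declaration says UTF-8.
doc = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'

# The transcoding proxy converts the bytes to ISO-8859-1 and sets
# charset=iso-8859-1 in the Content-Type header, but does NOT rewrite
# the encoding declaration inside the entity.
transcoded = doc.encode("iso-8859-1")

# A conformant processor believes the charset parameter: correct text.
assert transcoded.decode("iso-8859-1") == doc

# A "non-conformant" processor believes encoding="UTF-8": decode error,
# because the 0xE9 byte for 'é' is not valid UTF-8 here.
try:
    transcoded.decode("utf-8")
except UnicodeDecodeError:
    print("declaration-trusting processor fails")
```

So ignoring the charset parameter is not a harmless relaxation; it breaks exactly the intermediaries that the parameter was designed for.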

If you do not trust the charset parameter, please use application/xml 
without providing the charset parameter.  
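With application/xml and no charset parameter, the encoding is determined from inside the entity itself: the BOM, or the XML encoding declaration, with UTF-8 as the fallback.  A rough sketch of that inspection (my simplification; a real processor handles UTF-16 BOMs, single quotes, and more):

```python
import re

def xml_declared_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"                       # UTF-8 BOM
    # Look for encoding="..." in the XML declaration.
    m = re.match(rb'<\?xml[^>]*encoding="([A-Za-z0-9._-]+)"', data)
    if m:
        return m.group(1).decode("ascii").lower()
    return "utf-8"                           # XML's own default

print(xml_declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><r/>'))
# iso-8859-1
```

This only works because nothing at the transport level overrides it, which is precisely what application/xml without a charset parameter buys you.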

Ian Graham wrote:

> The core questions, I think, are:
>   * how 'deeply' MIME types (or some future variant of the MIME typing
>     mechanism) should poke into the data they are 'typing'? 
>   * And, if they poke deeply, what type of/how much information should be
>     revealed?
> 
> An alternative would be to ask the question: "How can one structure XML so
> that a 'shallow' look into an XML 'part' can determine which 'types' are
> inside it?" Namespace declarations would do this to some degree, but at
> the expense of forcing a dispatcher to parse the XML.

I believe that MIME is rather archaic, and we need a new solution for labelling, 
dispatching, and negotiation, especially when you consider multi-namespace
XML documents.

Cheers,

Makoto