Re: [xml-dev] Text/xml with omitted charset parameter

> > So is it the case, then, that the default for everything in the text/*
> > tree must be ASCII or 8859-1? It's not possible for the subtype text/xml
> > to provide a different default than the type text?
> The whole point of registering a format under text/* is that a dumb
> interpreter can treat it as plain text, expecting CR/LF line terminators
> too infrequently, if it doesn't know what to do with it.  Furthermore,
> unless the charset parameter says otherwise it can treat it as ASCII,
> the rock-bottom Internet interoperability default.
> Dumb interpreters rarely, if ever, should treat XML as plain text
> (displaying it as-is, for example).

My understanding so far...

-  text/xml is the most widely used HTTP content type for XML
-  if the charset parameter is absent, the official default charset for
   text/xml is us-ascii
-  therefore the charset parameter MUST be specified if the content is UTF-8
-  many HTTP servers don't conform to this
-  most XML processors (ours included) don't conform to this
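
To make the rule in the list above concrete, here is a minimal sketch of how a strictly conformant processor might pick the charset from the HTTP Content-Type header. The function name `effective_charset` is hypothetical, and the us-ascii fallback simply encodes the default as I have described it above; it is not taken from any particular processor.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type_header: str) -> Optional[str]:
    """Return the charset a strictly conformant reader would assume.

    Hypothetical helper: the us-ascii fallback for text/xml is the
    default described in this post, hard-coded here as an assumption.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header

    # An explicit charset parameter always wins.
    charset = msg.get_param("charset")
    if charset is not None:
        return str(charset).lower()

    # No charset parameter: text/xml falls back to us-ascii.
    if msg.get_content_type() == "text/xml":
        return "us-ascii"

    # Other media types are outside the scope of this sketch.
    return None
```

With this sketch, a bare `text/xml` header yields us-ascii, while `text/xml; charset=UTF-8` yields utf-8.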

My question is, what is the rationale for this 'standard'?  Does it actually
make any sense for an xml processor to conform?

My thinking goes like this.  If the XML entity contains only us-ascii
characters, then it makes no difference if the XML processor treats it as
UTF-8.  On the other hand, if the XML entity contains non-ASCII characters
encoded as UTF-8, then a "non-conformant" processor will read it correctly,
whereas a "conformant" one must reject it.  It appears that in this case
non-conformance has only positive side-effects.  Perhaps I'm missing
something?
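
The reasoning above can be checked with a small sketch. The sample documents and the helper `try_decode` are made up for illustration; the only fact relied on is that every us-ascii byte sequence decodes to the same text under UTF-8.

```python
from typing import Optional

# Two hypothetical entities: one pure us-ascii, one with a non-ASCII
# character (e-acute) encoded as UTF-8.
ascii_doc = b"<?xml version='1.0'?><greeting>hello</greeting>"
utf8_doc = "<?xml version='1.0'?><greeting>caf\u00e9</greeting>".encode("utf-8")


def try_decode(data: bytes, charset: str) -> Optional[str]:
    """Decode data with the given charset, or return None on failure."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# Case 1: us-ascii content reads identically under either assumption.
same_for_ascii = try_decode(ascii_doc, "us-ascii") == try_decode(ascii_doc, "utf-8")

# Case 2: UTF-8 content is readable only by the reader that assumes UTF-8.
strict_result = try_decode(utf8_doc, "us-ascii")   # the "conformant" reader
lenient_result = try_decode(utf8_doc, "utf-8")     # the "non-conformant" reader
```

Here `same_for_ascii` is true, `strict_result` is None (the strict reader rejects the entity), and `lenient_result` holds the decoded text. This only illustrates the two cases I describe; it says nothing about whether there are other cases I have overlooked.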


Rob Lugt
ElCel Technology