- From: MURATA Makoto <firstname.lastname@example.org>
- To: Chris Lilley <email@example.com>
- Date: Wed, 24 Mar 1999 11:17:03 +0900
Chris Lilley wrote:
> > Unfortunately this is a side effect of the rules for the media type
> > "text/*", which says that the default value of "charset" is always US-ASCII.
> The default rules if no other rule is in place for a specific Media
> type. The registration for text/xml can override this behaviour if it
> wishes to.
HTTP/1.1 (RFC2068 and the latest "Draft Standard") quite clearly says:
The "charset" parameter is used with some media types to define the
character set (section 3.4) of the data. When no explicit charset
parameter is provided by the sender, media subtypes of the "text"
type are defined to have a default charset value of "ISO-8859-1" when
received via HTTP. Data in character sets other than "ISO-8859-1" or
its subsets MUST be labeled with an appropriate charset value.
Here, the default is 8859-1 ;-(
The latest I-D for RFC2376 also said that the default is 8859-1 when the XML
document is being transmitted by HTTP. However, the IESG requested US-ASCII
as the default:
> > IESG discussed the document today that defines the text/xml media type.
> > We note that it continues the practice of text/plain where the default
> > charset is iso-8859-1 if transported over HTTP, but us-ascii if
> > transported over SMTP.
> > This inconsistency was a result of a wide deployment of HTTP
> > implementations that did not properly follow the MIME spec.
> > Having one media type which is used inconsistently between HTTP
> > and SMTP is bad enough, but we don't want to continue this practice
> > for new media types. Inconsistencies between HTTP and SMTP
> > usage make it more difficult to gateway between HTTP and email,
> > or to use HTTP to access email contents.
> > We suggest to have the charset parameter default to US-ASCII regardless
> > of transport, and strongly recommend that the parameter always be
> > supplied by senders. (If the sender is unsure whether the charset
> > is US-ASCII or ISO-8859-1, it can safely label it as ISO-8859-1,
> > since the former is a subset of the latter).
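To make the defaulting rules discussed above concrete, here is a minimal sketch in Python. The function name `default_charset` and the `transport` argument are my own illustrative choices, not part of any specification; the rules encoded are only those stated in this thread (an explicit charset wins, text/xml defaults to US-ASCII regardless of transport, other text/* types default to ISO-8859-1 over HTTP and to US-ASCII otherwise).

```python
from email.message import Message
from typing import Optional


def default_charset(content_type: str, transport: str = "http") -> Optional[str]:
    """Return the charset a receiver would assume for a Content-Type value.

    Hypothetical helper: it encodes only the rules quoted in this thread.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins (returned in lower case).
    explicit = msg.get_content_charset()
    if explicit is not None:
        return explicit

    if msg.get_content_maintype() != "text":
        return None  # the thread only discusses text/* defaults

    # text/xml: US-ASCII regardless of transport, as the IESG requested.
    if msg.get_content_subtype() == "xml":
        return "us-ascii"

    # Other text/* types: ISO-8859-1 over HTTP, US-ASCII elsewhere (e.g. SMTP).
    return "iso-8859-1" if transport == "http" else "us-ascii"
```

For example, `default_charset("text/xml")` gives `"us-ascii"`, while `default_charset("text/plain", "http")` gives `"iso-8859-1"`, which is exactly the inconsistency the IESG did not want to extend to new media types.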
Chris Lilley wrote:
> So, in consequence: example files such as the Chinese XML examples at
> http://xml.ascc.net/xml/test/index.html (where each example is available
> in UTF-8, Big5 and GB2312, all correctly labelled in the XML encoding
> declaration) are now sets of invalid XML files which are required to
> produce a critical error because of the invalid byte sequences in what
> is now described as a US-ASCII file?
Yes. Conformant XML parsers must report a fatal error. This is great since
non-conformant data can always be detected.
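As a rough illustration of why such data is detectable, the sketch below checks whether a byte stream is valid under the charset the receiver assumes. The function name `first_invalid_byte` and the sample document are assumptions made for this example; a real XML parser would raise its own fatal error rather than return an offset.

```python
from typing import Optional


def first_invalid_byte(data: bytes, charset: str = "us-ascii") -> Optional[int]:
    """Return the offset of the first byte not valid in `charset`, or None."""
    try:
        data.decode(charset)
    except UnicodeDecodeError as err:
        return err.start
    return None


# A Big5 document sent as text/xml with no charset parameter is read as
# US-ASCII, so its first multi-byte character is an invalid byte sequence.
big5_doc = '<?xml version="1.0" encoding="Big5"?><doc>中文</doc>'.encode("big5")
ascii_doc = b'<?xml version="1.0"?><doc>plain</doc>'

print(first_invalid_byte(big5_doc))          # an offset: fatal under US-ASCII
print(first_invalid_byte(big5_doc, "big5"))  # None once labelled correctly
print(first_invalid_byte(ascii_doc))         # None: pure ASCII is fine
```

Every byte above 0x7F is invalid in US-ASCII, so mislabelled data cannot slip through unnoticed; that is the property being praised here.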
Examples of conformant XML documents are available at:
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
xml-dev: A list for W3C XML Developers. To post, mailto:firstname.lastname@example.org
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:email@example.com)