Re: Possible changes for XML 2nd Edition

  • From: John Cowan <cowan@locke.ccil.org>
  • To: eldarm@microsoft.com (Eldar Musayev)
  • Date: Wed, 24 May 100 21:21:56 -0400 (EDT)

Eldar Musayev scripsit:

> People outside may just want to slip a few lines into a text without
> bothering themselves with an encoding header. Would you like to add
> charset information to every XML document you create?

You *must* do so, unless the document is in UTF-8 or UTF-16.  US-ASCII,
which is a subset of UTF-8, will also work, but ISO 8859-1, or KOI8-R,
or EUC-JP, is illegal without an encoding declaration or the equivalent
charset parameter in a MIME header.
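To make the rule concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module (which wraps the expat parser). The choice of Python and the sample text are illustrative assumptions, not part of the original discussion. The same ISO 8859-1 bytes are rejected without an encoding declaration and accepted with one.

```python
import xml.etree.ElementTree as ET

# "café" with é stored as the single ISO 8859-1 byte 0xE9 (illustrative sample).
latin1_body = b"<p>caf\xe9</p>"

# No declaration and no BOM: the parser must assume UTF-8.
# 0xE9 followed by "<" is not a valid UTF-8 sequence, so parsing fails.
try:
    ET.fromstring(latin1_body)
except ET.ParseError as err:
    print("rejected:", err)

# With an encoding declaration, the very same bytes are well-formed.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body
print(ET.fromstring(declared).text)  # café
```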

> Because what you are proposing strips the whole world, except a few
> purely English-speaking countries, of the convenience of a default charset.

The only default charsets are UTF-8 and UTF-16.
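A simplified sketch of how a parser picks the default when a document carries neither an encoding declaration nor a MIME charset. It assumes the XML 1.0 rule that UTF-16 entities must begin with a byte order mark, and it deliberately ignores the fuller autodetection in Appendix F of the spec that applies once a declaration is present. The function name default_encoding is hypothetical.

```python
def default_encoding(data: bytes) -> str:
    """Choose the encoding for an XML entity with no encoding declaration
    and no external (e.g. MIME) charset information."""
    # A UTF-16 entity is required to start with a byte order mark (BE or LE).
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    # Everything else, with or without a UTF-8 BOM, must be UTF-8.
    return "utf-8"

print(default_encoding(b"\xff\xfe<\x00a\x00/\x00>\x00"))  # utf-16
print(default_encoding(b"\xef\xbb\xbf<a/>"))                # utf-8
print(default_encoding(b"<a/>"))                            # utf-8
```

Note that there is no branch that returns a locale or "platform" charset: anything other than UTF-8 or UTF-16 has to be announced explicitly.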

> In short, non-valid characters are errors, but they should not be fatal.

We are not talking about invalid characters (such as U+0001), which are
already fatal errors.  We are talking about invalid encodings.
An FF byte in a UTF-8 document means the document is nonsense; there
is no telling what it means.
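The distinction between an invalid character and an invalid encoding can be sketched in Python as follows. This is an illustrative assumption, not any particular parser's code: the hypothetical classify function first decodes the bytes as strict UTF-8, then checks each code point against the Char production of XML 1.0.

```python
def is_xml10_char(cp: int) -> bool:
    # XML 1.0 production [2] Char: the code points allowed in a document.
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def classify(data: bytes) -> str:
    try:
        # Strict decoding: a byte such as 0xFF can never occur in UTF-8.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid encoding"
    # The bytes decoded cleanly, but a code point may still be forbidden.
    if any(not is_xml10_char(ord(ch)) for ch in text):
        return "invalid character"
    return "ok"

print(classify(b"<a>\xff</a>"))          # invalid encoding
print(classify(b"<a>\x01</a>"))          # invalid character
print(classify(b"<a>caf\xc3\xa9</a>"))   # ok
```

In the first case there is no character to report at all, only undecodable bytes, which is why guessing at a recovery is not meaningful.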

John Cowan                                   cowan@ccil.org
	Yes, I know the message date is bogus.  I can't help it.
		--me, on far too many occasions

This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/



Copyright 2001 XML.org. This site is hosted by OASIS