Unrecognized encodings
- From: Richard Tobin <richard@cogsci.ed.ac.uk>
- To: xml-dev@lists.xml.org
- Date: Mon, 11 Jun 2001 23:24:44 +0100 (BST)
I don't see anything in the text quoted from the spec that says that
parsers should not accept "UTF8".
> In an encoding declaration, the values "UTF-8", "UTF-16", [...]
> should be used for the various encodings and transformations of
> Unicode / ISO/IEC 10646 [...]
This clearly applies to the documents, not parsers.
> It is recommended that character encodings registered (as charsets)
> with the Internet Assigned Numbers Authority [IANA-CHARSETS], other
> than those just listed, be referred to using their registered names;
Again, this applies to documents.
> other encodings should use names starting with an "x-" prefix. XML
> processors should match character encoding names in a case-insensitive
> way and should either interpret an IANA-registered name as the
> encoding registered at IANA for that name or treat it as unknown [...]
This applies to parsers, but it seems to say only that parsers shouldn't
interpret an IANA-registered name as anything but the corresponding
encoding (e.g. they shouldn't interpret "UTF-16" as EBCDIC). If they
don't interpret it as the right thing, they should treat it as
unknown. It doesn't say anything about names that *aren't*
IANA-registered.
That is, the "it" in the last quoted line refers to "an IANA-registered
name".
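
To make that reading concrete, here is a rough sketch in Python of a
name lookup that follows it. Everything in it is hypothetical: the
function name, the two tables, and the "lenient" switch are mine, the
registered table is only a tiny subset of the IANA list, and it assumes
"UTF8" is not a registered name.

```python
# Hypothetical subset of IANA-registered charset names (lowercased),
# mapped to the Python codec that implements each one.
IANA_REGISTERED = {
    "utf-8": "utf-8",
    "utf-16": "utf-16",
    "iso-8859-1": "iso-8859-1",
    "us-ascii": "ascii",
}

# Hypothetical unregistered spellings a parser might choose to accept.
# The quoted spec text does not say how these are to be handled.
UNREGISTERED_ALIASES = {
    "utf8": "utf-8",
}


def resolve_encoding(declared, lenient=True):
    """Return a codec name, or None if the name is treated as unknown."""
    key = declared.lower()  # names are matched case-insensitively
    if key in IANA_REGISTERED:
        # A registered name means exactly the registered encoding.
        return IANA_REGISTERED[key]
    if lenient and key in UNREGISTERED_ALIASES:
        # Not registered, so the "should" about registered names
        # does not apply; accepting it is the parser's own choice.
        return UNREGISTERED_ALIASES[key]
    return None  # treated as unknown
```

With lenient=True, resolve_encoding("UTF8") gives "utf-8"; with
lenient=False it gives None. On the reading above, either behaviour is
consistent with the quoted text.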
> My question is, must the XML parser developer honor these "shoulds" as if
> they were "musts" and produce a fatal error rather than accepting "UTF8"?
So I don't think there's even a "should" applying to how the parser
interprets "UTF8", let alone a "must".
-- Richard