OASIS Mailing List Archives

Unrecognized encodings



I don't see anything in the text quoted from the spec that says that
parsers should not accept "UTF8".

> In an encoding declaration, the values "UTF-8", "UTF-16", [...]
> should be used for the various encodings and transformations of
> Unicode / ISO/IEC 10646 [...]

This clearly applies to the documents, not parsers.

> It is recommended that character encodings registered (as charsets) 
> with the Internet Assigned Numbers Authority [IANA-CHARSETS], other 
> than those just listed, be referred to using their registered names; 

Again, this applies to documents.

> other encodings should use names starting with an "x-" prefix. XML 
> processors should match character encoding names in a case-insensitive 
> way and should either interpret an IANA-registered name as the 
> encoding registered at IANA for that name or treat it as unknown [...]

This applies to parsers, but seems to be saying that parsers shouldn't
interpret an IANA-registered name as anything but the corresponding
encoding (e.g. they shouldn't interpret "UTF-16" as EBCDIC).  If they
don't interpret it as the right thing, they should treat it as
unknown.  It doesn't say anything about names that *aren't*
IANA-registered.

That is, the "it" in the last quoted line refers to "an IANA-registered
name".

> My question is, must the XML parser developer honor these "shoulds" as if
> they were "musts" and produce a fatal error rather than accepting "UTF8"?

So I don't think there's even a "should" applying to how the parser
interprets "UTF8", let alone a "must".
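
To make that reading concrete, here is a rough Python sketch of how a
parser might resolve a declared encoding name.  Everything in it is
made up for illustration: the two tables are tiny hypothetical subsets,
and resolve_encoding() and its "lenient" flag are not taken from the
spec or from any real parser.

```python
# Hypothetical subset of IANA-registered charset names, lower-cased,
# mapped to the Python codec that implements the registered encoding.
IANA_REGISTERED = {
    "utf-8": "utf-8",
    "utf-16": "utf-16",
    "iso-8859-1": "iso-8859-1",
    "us-ascii": "ascii",
}

# Names that are NOT IANA-registered but that a parser may choose to
# accept; on the reading above, the quoted text is silent about these.
LENIENT_ALIASES = {
    "utf8": "utf-8",
}


def resolve_encoding(declared, lenient=True):
    """Return a codec name for the declared encoding, or None if unknown."""
    key = declared.lower()  # names are matched case-insensitively

    # An IANA-registered name is interpreted only as the encoding
    # registered for it, never as something else.
    if key in IANA_REGISTERED:
        return IANA_REGISTERED[key]

    # Unregistered names: accepting them is the parser's own choice.
    if lenient and key in LENIENT_ALIASES:
        return LENIENT_ALIASES[key]

    # Anything else is treated as unknown.
    return None
```

With lenient=True, "UTF8" resolves to the same codec as "UTF-8"; with
lenient=False it is treated as unknown.  Either behaviour seems
consistent with the quoted text.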

-- Richard