Firstly, I have to admit that the ElCel
validator does not accept UTF8 as an alias for UTF-8. In my earlier post I
stated that it accepts some encoding aliases. In fact it doesn't
currently accept any aliases, only the IANA names. I should look at the
code before making such assertions!
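To make the distinction concrete, here is a minimal sketch in Python. It is not the validator's actual C++ code. The whitelist `STRICT_NAMES` and the function names `accept_strict` and `accept_lenient` are hypothetical, chosen only for illustration. The lenient variant relies on Python's codec registry, which resolves aliases such as UTF8.

```python
import codecs

# Hypothetical whitelist of IANA names; an assumption for this sketch,
# not the validator's real list.
STRICT_NAMES = {"UTF-8", "UTF-16", "ISO-8859-1", "US-ASCII"}


def accept_strict(declared: str) -> bool:
    """Accept only the listed IANA names, compared case-insensitively."""
    return declared.upper() in STRICT_NAMES


def accept_lenient(declared: str) -> bool:
    """Accept any name or alias that Python's codec registry can resolve."""
    try:
        codecs.lookup(declared)
    except LookupError:
        return False
    return True
```

Under these assumptions, `accept_strict("UTF8")` rejects the label while `accept_lenient("UTF8")` accepts it, which is the difference under discussion.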
I was curious why I had this false
memory. Looking back over our decisions, I see that we did consider
accepting aliases, mainly because Java's InputStreamReader works this way and we
modelled some of our C++ I/O classes on Java. However, we decided that the
XML 1.0 Recommendation advises being strict, so that is what we implemented. Tim
Bray's comments have raised some doubt that this is the best
Our general philosophy when writing the XML
Validator was to be as strict as possible. After all, one task of the XML
validator is to give as much assurance as possible that documents passing
through successfully are guaranteed not to be rejected by another conforming
processor down the line. However,
we do accept ISO-8859-1 and US-ASCII encodings, which other processors are not
guaranteed to accept, so that partially diminishes our validity
----- Original Message -----
Sent: 11 June 2001 22:42
Subject: RE: Unrecognized encodings (was
Re: XML 1.0 Conformance Test Results)
Tim Bray wrote:
>is the word "should". In any case, I'd write software to accept
>UTF8, but I'd complain at anyone who sent me data so labeled.
Perhaps a bit hard to argue with a
veteran such as Tim Bray, but
from what I know of the history of SGML and
XML, I wonder: when designing XML, was not one of
the main issues to make
features than SGML?
made a clear choice for the standard support of
the Unicode/UCS character set.
Shouldn't the (most commonly used?) Unicode
encodings "UTF-8" and
"UTF-16" and their labeling
be treated as one of the cornerstones for XML (parsers)?
Personally I like it when something
heavily (i.e. fatal
error). It contributes
clarity and stability. For XML parser writers
as well as for users who switch between
now this, now that brand of XML
For such issues,
flexibility leads to less security IMHO.
Furthermore, I don't quite see the
writing flexible software (by one's own hand, I presume)
while at the same time
b) complaining when a not so accurate encoding
Perhaps this is perceived as
a bit more personal?