Firstly, I have to admit that the ElCel
validator does not accept UTF8 as an alias for UTF-8. In my earlier post I
stated that it accepts some encoding aliases. In fact it doesn't
currently accept any aliases, only the IANA names. I should look at the
code before making such assertions!
I was curious why I had this false
memory. On looking back over our decisions, I see that we did consider
accepting aliases, mainly because Java's InputStreamReader works this way and we
modelled some of our C++ io classes on Java. However, we decided that the
XML 1.0 rec recommends being strict, so that is what we implemented. Tim
Bray's comments have raised some doubt that this is the best
approach.
Our general philosophy when writing the XML
Validator was to be as strict as possible. After all, one task of the XML
validator is to give as much assurance as possible that documents passing
through successfully will not be rejected by another conforming
processor down the line. However, we do accept the ISO-8859-1 and US-ASCII
encodings, which other processors are not guaranteed to accept, so that
partially diminishes our validity guarantee.
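To make the two policies concrete, here is a minimal Python sketch. It is not the validator's actual code: the function name `resolve_encoding`, the set of accepted names and the alias table are all assumptions made for illustration. Strict mode rejects anything that is not an accepted name; lenient mode follows Tim's suggestion of accepting the alias but complaining about it.

```python
# Hypothetical sketch, not the ElCel validator's implementation.
# Accepted names (assumed list, based on the encodings mentioned above).
STRICT_NAMES = {"UTF-8", "UTF-16", "ISO-8859-1", "US-ASCII"}

# Illustrative alias table (assumption); a real one would be larger.
ALIASES = {"UTF8": "UTF-8"}


def resolve_encoding(label, strict=True):
    """Return (canonical_name, warning_or_None); raise ValueError if rejected."""
    # Compare case-insensitively, as XML 1.0 says processors should.
    name = label.strip().upper()
    if name in STRICT_NAMES:
        return name, None
    if not strict and name in ALIASES:
        canonical = ALIASES[name]
        warning = f"non-standard encoding label {label!r}; treating as {canonical}"
        return canonical, warning
    # Strict mode, or an unknown label: the equivalent of a fatal error.
    raise ValueError(f"unsupported encoding label: {label!r}")
```

With `strict=True`, the label `UTF8` raises an error; with `strict=False`, it resolves to `UTF-8` and a warning string is returned alongside it.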
Regards
Rob Lugt
----- Original Message -----
Sent: 11 June 2001 22:42
Subject: RE: Unrecognized encodings (was
Re: XML 1.0 Conformance Test Results)
Tim Bray wrote:
>is the word "should". In any case, I'd write software to accept
>UTF8, but I'd complain at anyone who sent me data so labeled.
> -Tim
It is perhaps a bit hard to argue with a veteran such as Tim Bray, but
from what I know of the history of SGML and XML, I wonder: when designing
XML, was not one of the main aims to make something with fewer optional
features than SGML? XML has made a clear choice for standard support of
the Unicode/UCS character set. Shouldn't the (most commonly used?) Unicode
encodings "UTF-8" and "UTF-16" and their labeling be treated as one of the
cornerstones for XML (parsers)?
Personally, I like it when something complains heavily (i.e. with a fatal
error). It contributes to clarity and stability, for XML parser writers as
well as for users who switch between one brand of XML parser and another.
For such issues, flexibility leads to less security, IMHO.
Furthermore, I don't quite see the difference between: a) writing flexible
software (by one's own hand, I presume) while at the same time
b) complaining when a not-so-accurate encoding label is received.
Perhaps this is perceived as a bit more personal?
Regards, Eric Vermetten