Joe English wrote:
>>Any XML parser MUST reject it, otherwise it doesn't deserve the name.
>
> True, but the aggregator as a whole might still accept it --
> possibly by noticing that the encoding is mislabelled and
> munging the data into proper UTF-8 before passing it to the
> parser.
So how do you do it reliably? If you encounter a single 0xA4 in a data
stream with no encoding declaration, what Unicode character do you map
it to?
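
To make the ambiguity concrete, here is a minimal Python sketch. The helper names (decode_candidates, is_valid_utf8) and the choice of candidate encodings are assumptions made for illustration; they are not taken from any aggregator. The byte 0xA4 is not valid UTF-8 on its own, and two common single-byte encodings map it to different characters:

```python
# Illustrative sketch only: helper names and candidate encodings are assumptions.
raw = b"\xa4"  # a lone 0xA4 byte, no encoding declaration available


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_candidates(data: bytes, encodings=("iso-8859-1", "iso-8859-15")):
    """Decode the same bytes under each candidate encoding."""
    return {enc: data.decode(enc) for enc in encodings}


candidates = decode_candidates(raw)

if __name__ == "__main__":
    print("valid UTF-8:", is_valid_utf8(raw))
    for enc, text in candidates.items():
        # ISO-8859-1 gives U+00A4 (currency sign), ISO-8859-15 gives U+20AC (euro sign)
        print(f"{enc}: U+{ord(text):04X}")
```

Both decodings succeed without error, so nothing in the byte stream itself says which one the author meant.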
> In fact any aggregator that doesn't do something like this
> is doomed to fail -- *nobody's* feed has the encoding labelled
> properly. (Well, maybe not "nobody", but certainly not very many.)
That may be true for RSS (although recent stats taken by Mark show that
only one out of 34 (?) had that problem). We should try hard to avoid
that situation for a new format, otherwise there's no point in replacing
RSS anyway.
--
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760