John Cowan wrote:
>> Just to put some emphasis on what John Cowan already said, I'm afraid
>>of the cost of normalizing on-the-fly; the algorithms I could find
>>in the Unicode annexes were just scary ...
> ICU is, as always, the gold standard for this kind of thing. It has
> both normalizing and normalization-checking algorithms.
Well yeah, except that the ICU implementation is several times the size
of your average XML processor, and its APIs are way harder to learn than,
for example, SAX.
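For a sense of what a normalization check involves at its simplest, here is a minimal sketch using Python's standard `unicodedata` module (not ICU, and not any XML processor's API). It assumes plain Unicode NFC is the target; XML 1.1's "fully normalized" notion adds constraints beyond NFC, which this sketch does not cover. The helper names `check_nfc` and `to_nfc` are illustrative only.

```python
import unicodedata

def check_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normalization Form C."""
    # unicodedata.is_normalized (Python 3.8+) tries a quick-check pass first
    # and only falls back to a full normalization when the result is unclear.
    return unicodedata.is_normalized("NFC", text)

def to_nfc(text: str) -> str:
    """Return the NFC (composed) form of text."""
    return unicodedata.normalize("NFC", text)

composed = "caf\u00e9"     # 'é' as one precomposed code point
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

print(check_nfc(composed))             # True
print(check_nfc(decomposed))           # False
print(to_nfc(decomposed) == composed)  # True
```

Checking is the cheaper half: a processor can report "not normalized" without rewriting the text, which is the distinction between the normalizing and normalization-checking algorithms mentioned above.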
I really understand the desire to clean up the normalization picture,
but I think the cost is high and the nondeterministic behavior specified
by 1.1 is a problem. We got a tremendous amount of static over the fact
that processors may nondeterministically be validating or
non-validating; now that matrix has four slots (validating or not,
normalization-checking or not).
Having said that, looking back, the fact that XML is Guaranteed Not To
Choke on properly i18nized text (which makes it easy for engineers to Do
The Right Thing and really hard to ignore i18n) has turned out to
be a huge selling point for XML, to an extent that none of us could have
predicted. So this may turn out to be cost-effective.
Why doesn't 1.1 just say "Processors must validate at user option"?
I.e. this is behavior that can be turned off. Then you know what's
going to happen.

-Tim
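The "at user option" proposal amounts to a processor setting whose default is off, so what happens depends only on that setting and not on which processor you picked. A minimal sketch under stated assumptions: `parse_text`, `ParseResult`, and `check_normalization` are hypothetical names, not part of any real XML API, and plain NFC stands in for XML 1.1's full-normalization rules.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParseResult:
    text: str
    # None: the check was not requested, so nothing is claimed either way.
    normalized: Optional[bool]

def parse_text(text: str, check_normalization: bool = False) -> ParseResult:
    """Hypothetical processor entry point: the normalization check runs only
    when the user turns it on, so the behavior is determined by the flag."""
    if check_normalization:
        return ParseResult(text, unicodedata.is_normalized("NFC", text))
    return ParseResult(text, None)

print(parse_text("cafe\u0301").normalized)                            # None
print(parse_text("cafe\u0301", check_normalization=True).normalized)  # False
```

Reporting `None` rather than `True` when the check is off keeps "not checked" distinct from "checked and fine", which is the point of making the behavior predictable.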