> The point is that normalization is expensive, and it may be
> too expensive to do at all in small systems. Therefore, the
> W3C's choice (expressed in the Character Model) is to have
> senders normalize, and receivers check for normalization. In
> this way documents are normalized once at creation (or
> publication) time, rather than every time a document is
> received; this conserves net-wide cycles, since checking is
> cheaper than normalizing.
While this policy makes sense, its translation into rules for software
components is unfortunately full of absurdities. The fact that the
Character Model [1] bans text processing software from doing
normalization [2] means that senders are going to have a tough job
meeting the requirement to normalize the text, because they won't be
able to find any text processing software that does the job for them.
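To make the sender/receiver split in the quoted policy concrete, here is a rough sketch in Python. It assumes plain Unicode NFC as a stand-in for the Character Model's notion of normalization (which may impose more than NFC alone), and the function names sender_prepare and receiver_accepts are made up for illustration; they come from no specification.

```python
import unicodedata

# Assumption: NFC stands in for the Character Model's normalization form.
FORM = "NFC"


def sender_prepare(text: str) -> str:
    """Hypothetical sender step: normalize once, at creation or publication."""
    return unicodedata.normalize(FORM, text)


def receiver_accepts(text: str) -> bool:
    """Hypothetical receiver step: only check, never rewrite the text.

    unicodedata.is_normalized requires Python 3.8 or later.
    """
    return unicodedata.is_normalized(FORM, text)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT is not in NFC;
# the precomposed U+00E9 is.
decomposed = "e\u0301"
composed = sender_prepare(decomposed)
```

Under this split the receiver would reject the decomposed string rather than repair it, which is why the sender needs some tool that is actually allowed to call the normalizing step.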
[1] http://www.w3.org/TR/charmod/
[2] Section 4.4: "A text processing component .... must not normalize
suspect text".
Michael Kay