OASIS Mailing List Archives





   RE: [xml-dev] Unicode normalization in XML 1.1


> The point is that normalization is expensive, and it may be 
> too expensive to do at all in small systems.  Therefore, the 
> W3C's choice (expressed in the Character Model) is to have 
> senders normalize, and receivers check for normalization.  In 
> this way documents are normalized once at creation (or 
> publication) time, rather than every time a document is 
> received; this conserves net-wide cycles, since checking is 
> cheaper than normalizing.
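
To make the sender/receiver split above concrete, here is a minimal sketch in Python. It assumes NFC as the normalization form, which is a simplification (the Character Model's own definition of normalized text may impose more than plain NFC), and the function names `sender_prepare` and `receiver_accepts` are hypothetical, chosen only for illustration. It relies on the standard library's `unicodedata.normalize` and `unicodedata.is_normalized` (the latter requires Python 3.8 or later).

```python
import unicodedata

# Assumption: NFC stands in for the normalization form the policy requires.
FORM = "NFC"


def sender_prepare(text: str) -> str:
    """Sender side: normalize once, at creation or publication time."""
    return unicodedata.normalize(FORM, text)


def receiver_accepts(text: str) -> bool:
    """Receiver side: only check for normalization, never normalize."""
    return unicodedata.is_normalized(FORM, text)


# 'e' followed by COMBINING ACUTE ACCENT: a decomposed, non-NFC sequence.
decomposed = "e\u0301"

# The sender pays the normalization cost once.
composed = sender_prepare(decomposed)

# The receiver runs the check on whatever arrives.
print(receiver_accepts(decomposed))
print(receiver_accepts(composed))
```

The receiver never rewrites the text; it only reports whether the check passed, which is the cheaper operation the quoted paragraph has in mind.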

While this policy makes sense, its translation into rules for software
components is unfortunately full of absurdities. The fact that the
Character Model [1] bans text processing software from doing
normalization [2] means that senders are going to have a tough job
meeting the requirement to normalize the text, because they won't be
able to find any text processing software that does the job for them.

[1] http://www.w3.org/TR/charmod/

[2] Section 4.4: "A text processing component .... must not normalize
suspect text".

Michael Kay



Copyright 2001 XML.org. This site is hosted by OASIS