> No, not at all! XML 1.1 says that parsers should *check* normalization,
> not that they should *perform* it. So a parser that sees an e followed
> by a combining acute should report the lack of normalization to the
> calling application.
> This is a most important distinction. XML *generators* should generate
> normalized output; XML *accepters* should check normalization.
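To make the "check, don't perform" distinction concrete, here is a minimal Python sketch using the standard `unicodedata` module (`is_normalized` needs Python 3.8+). The function name `report_normalization` is hypothetical. Testing for NFC is only an approximation: XML 1.1's "fully normalized" form is stricter, because it also restricts where composing characters may appear at the start of constructs.

```python
import unicodedata
from typing import Optional


def report_normalization(text: str) -> Optional[str]:
    """Check (never modify) whether text is in NFC; return a report or None.

    Assumption: NFC stands in for XML 1.1's stricter "fully normalized" form.
    """
    if unicodedata.is_normalized("NFC", text):
        return None
    return "input is not in Unicode Normalization Form C"


precomposed = "caf\u00e9"  # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT U+0301

print(report_normalization(precomposed))  # None
print(report_normalization(decomposed))   # prints the "not in NFC" report
```

The input string is returned to the caller untouched. Only the report differs, which is the accepter's job as described above.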
I don't understand the need for normalization checking.
The spec says this:
The purpose of this section is to strongly encourage XML processors to ensure that the creators of
XML documents have properly normalized them, so that XML applications can make tests such as
identity comparisons of strings without having to worry about the different possible "spellings" of
strings which Unicode allows.
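The "different possible spellings" can be seen in a short Python sketch. The three strings below are all canonically equivalent ways of writing Å, yet a plain identity comparison treats every pair as different:

```python
from itertools import combinations

# Three canonically equivalent spellings of "Å":
spellings = [
    "\u00c5",   # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
    "\u212b",   # U+212B ANGSTROM SIGN
    "A\u030a",  # U+0041 followed by U+030A COMBINING RING ABOVE
]

# == compares code point sequences, so no pair counts as identical.
print([a == b for a, b in combinations(spellings, 2)])  # [False, False, False]
print([len(s) for s in spellings])                       # [1, 1, 2]
```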
If Unicode allows strings to have different spellings, then this is a generic
problem for all applications that process Unicode strings. So why add the extra
complexity of normalization checking to an XML processor, so that an application
which would normally handle Unicode strings in a standard way can suddenly
handle them differently, just because the XML processor has already taken care
of part of the work?
I am sure there will be (or already are) generic libraries for that kind of
Unicode processing. To me this looks like a lack of proper
"separation of concerns": an XML processor should not concern
itself with the issue of normalization.
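As an illustration of such a generic library, Python's standard `unicodedata` module lets an application normalize strings itself before comparing them, with no help from the XML processor. The helper name `canonically_equal` is hypothetical. It normalizes both sides to NFC, so canonically equivalent spellings compare equal:

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    # Normalize both operands to NFC so that canonically equivalent
    # spellings become identical code point sequences before comparing.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(canonically_equal("\u212b", "A\u030a"))  # True: ANGSTROM SIGN vs A + ring
print(canonically_equal("\u00c5", "A"))        # False: different characters
print(unicodedata.normalize("NFC", "A\u030a") == "\u00c5")  # True
```

The trade-off is that every comparison pays for normalization. A guarantee that the input was already checked as normalized is what would let an application skip that step.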
It may, however, make sense for generating canonical XML.