At 7:09 AM -0400 10/16/02, John Cowan wrote:
>> Unicode character normalization should be performed on XML documents,
>> unless you don't feel like it, in which case you can ignore it. This almost
>> makes sense. Basically it says that parsers may change an e followed by a
>> combining accent acute into the single character é if they want to or the
>> client asks for it. The details are quite complicated, but at least it's
>> optional.
>
>No, not at all! XML 1.1 says that parsers should *check* normalization,
>not that they should *perform* it. So a parser that sees an e followed
>by a combining acute should report the lack of normalization to the
>calling application.
>
No, I still think there's an issue here, though maybe I don't have my
finger on it yet. Even if the document isn't transformed into
normalized form, the processor might still validate against the
normalized form. Maybe the correct behavior just needs to be spelled
out better.
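To make the check-versus-perform distinction concrete, here is a minimal Python sketch. It assumes that Unicode Normalization Form C (NFC) is the form at issue, and the helper names `check_nfc` and `perform_nfc` are made up for illustration; they are not taken from the XML 1.1 draft or from any parser API.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT (U+0301), versus the
# precomposed character U+00E9. They render alike but differ as data.
decomposed = "e\u0301"
composed = "\u00e9"

def check_nfc(text):
    """Checking: report whether text is already in NFC. Never changes it."""
    return unicodedata.normalize("NFC", text) == text

def perform_nfc(text):
    """Performing: return the NFC form, possibly changing the text."""
    return unicodedata.normalize("NFC", text)

if __name__ == "__main__":
    print(check_nfc(decomposed))                # the check flags this input
    print(perform_nfc(decomposed) == composed)  # performing rewrites it
```

A processor that only checks hands `decomposed` to the application unchanged. The open question is whether anything downstream, such as validation, then compares against `composed` instead.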
This is another one of those annoying errors that aren't exactly
well-formedness errors, but aren't exactly validity errors or
warnings either. As written, it falls into the grey area of XML
error reporting, and that has caused problems before. The exact
behavior of a parser that encounters non-normalized text should be
locked down, probably as a warning rather than an error of any kind.
That is, parsers should be required to continue processing correctly
after encountering non-normalized text.
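As a rough sketch of the warn-and-continue behavior I have in mind, again assuming NFC is the relevant form: the function `scan_character_data` and its message format are hypothetical, and a real parser would report through whatever warning channel its API provides.

```python
import unicodedata

def scan_character_data(chunks):
    """Hypothetical helper: deliver every chunk unchanged and collect
    one warning for each chunk that is not in NFC."""
    delivered = []
    warnings = []
    for index, chunk in enumerate(chunks):
        if unicodedata.normalize("NFC", chunk) != chunk:
            # Report the problem, but do not stop and do not alter the text.
            warnings.append(f"chunk {index}: text is not in NFC")
        delivered.append(chunk)
    return delivered, warnings

if __name__ == "__main__":
    text, problems = scan_character_data(["caf\u00e9", "cafe\u0301", "plain"])
    for problem in problems:
        print("warning:", problem)
```

The point of the design is that the application always receives exactly the text that was in the document, plus a report it is free to ignore.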
Of course this is really the wrong solution to the problem. The right
solution is to kill XML 1.1 completely.
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| XML in a Nutshell, 2nd Edition (O'Reilly, 2002) |
| http://www.cafeconleche.org/books/xian2/ |
| http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.cafeconleche.org/ |
+----------------------------------+---------------------------------+