At 11:49 AM 6/20/2003 +0700, James Clark wrote:
>It's worse than this. If your infoset contains a carriage return, you
>have to output it as a numeric character reference, otherwise line-end
>normalization will turn it into a line-feed. Similarly, if attribute
>values in the infoset contain line-feeds or tabs, they need to be output
>as numeric character references, otherwise attribute value normalization
>will turn them into spaces.
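The escaping Clark describes can be illustrated with a minimal Python sketch. It assumes a hypothetical serializer that writes a single element with one attribute. The names `escape_text`, `escape_attr`, `serialize`, `naive_serialize` and `round_trip` are illustrative and do not come from any library. The standard library's expat-based `xml.etree.ElementTree` is used only to show what a conforming parser reports after re-parsing the output.

```python
import xml.etree.ElementTree as ET

def escape_text(s: str) -> str:
    """Escape character data (hypothetical helper)."""
    # '&' goes first so the references added below are not re-escaped.
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # A literal CR would become LF under line-end normalization,
    # so write it as a numeric character reference instead.
    return s.replace("\r", "&#xD;")

def escape_attr(s: str) -> str:
    """Escape a value for a double-quoted attribute (hypothetical helper)."""
    s = escape_text(s).replace('"', "&quot;")
    # Literal TAB and LF (CR is handled above) would become spaces under
    # attribute-value normalization; character references survive it.
    return s.replace("\t", "&#x9;").replace("\n", "&#xA;")

def serialize(text: str, attr: str) -> str:
    # Hypothetical one-element document, for illustration only.
    return f'<doc a="{escape_attr(attr)}">{escape_text(text)}</doc>'

def naive_serialize(text: str, attr: str) -> str:
    # Escapes markup characters but writes whitespace literally.
    t = text.replace("&", "&amp;").replace("<", "&lt;")
    a = attr.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
    return f'<doc a="{a}">{t}</doc>'

def round_trip(xml_text: str) -> tuple:
    # Re-parse and return the text content and attribute value as reported.
    root = ET.fromstring(xml_text)
    return root.text, root.get("a")

text, attr = "line1\rline2", "x\ty\nz"
print(round_trip(serialize(text, attr)))        # original values preserved
print(round_trip(naive_serialize(text, attr)))  # CR -> LF, TAB/LF -> spaces
```

With the escaped output the parser hands back the original values; with the naive output the CR comes back as a line feed and the attribute's tab and line feed come back as spaces, which is the silent change a serializer has to guard against every time it re-serializes.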
The more I've looked at whitespace normalization by XML processors, the
more it seems to be a convenience for one group of users that produces
strange and largely unavoidable inconveniences for other classes of
users. The complexity seems to grow especially rapidly when multiple
parse/manipulate/re-serialize cycles occur.
(Then there were parsers which called themselves "XML applications", with
their own expectations for whitespace processing, but I haven't looked into
MSXML whitespace handling in a while.)
I now have a processor (Ripper) that lets me do my own normalization (or
not), but in general this seems like an area that deserves more
consideration.