   whitespace normalization (was Re: [xml-dev] Create XML)


At 11:49 AM 6/20/2003 +0700, James Clark wrote:
>It's worse than this.  If your infoset contains a carriage return, you 
>have to output it as a numeric character reference, otherwise line-end 
>normalization will turn it into a line-feed. Similarly, if attribute 
>values in the infoset contain line-feeds or tabs, they need to be output 
>as numeric character references, otherwise attribute value normalization 
>will turn them into spaces.

The more I've looked at whitespace normalization by XML processors, the 
more it seems to be a convenience for one group of users which produces 
strange and largely unavoidable inconveniences for other classes of 
users.  The complexity seems to grow especially rapidly if multiple 
parse/manipulate/re-serialize cycles occur.
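
As a rough illustration of the behavior James describes, here is a minimal 
sketch in Python using the standard library's xml.etree.ElementTree (backed 
by expat).  The helper names (escape_text, escape_attr, naive_roundtrip, 
safe_roundtrip) and the sample element <e a="..."> are hypothetical, chosen 
only for this example; other parsers should normalize the same way under 
XML 1.0, but I have only sketched it against this one.

```python
import xml.etree.ElementTree as ET


def escape_text(s):
    """Escape character data; a CR must become a character reference,
    or line-end normalization will turn it into a line-feed."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return s.replace("\r", "&#13;")


def escape_attr(s):
    """Escape an attribute value; tabs and line-feeds must also become
    character references, or attribute value normalization will turn
    them into spaces."""
    s = escape_text(s).replace('"', "&quot;")
    return s.replace("\t", "&#9;").replace("\n", "&#10;")


def naive_roundtrip(text, attr):
    """Write the whitespace literally, then parse it back.
    (Assumes the inputs contain no markup characters.)"""
    doc = '<e a="%s">%s</e>' % (attr, text)
    root = ET.fromstring(doc.encode("utf-8"))
    return root.text, root.get("a")


def safe_roundtrip(text, attr):
    """Write the whitespace as numeric character references, then parse."""
    doc = '<e a="%s">%s</e>' % (escape_attr(attr), escape_text(text))
    root = ET.fromstring(doc.encode("utf-8"))
    return root.text, root.get("a")


if __name__ == "__main__":
    text, attr = "one\r\ntwo\rthree", "x\ty\nz"
    # Literal whitespace is normalized away by the parser.
    print(naive_roundtrip(text, attr))
    # Character references survive the parse unchanged.
    print(safe_roundtrip(text, attr))
```

With literal whitespace the parser hands back line-feeds in place of the 
carriage returns and spaces in place of the tab and line-feed in the 
attribute; with the character references the original values come back 
intact.  Every serializer in a parse/re-serialize chain has to repeat this 
escaping, or the whitespace is lost at the next parse.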

(Then there were parsers which called themselves "XML applications", with 
their own expectations for whitespace processing, but I haven't looked into 
MSXML whitespace handling in a while.)

I now have a processor (Ripper) that lets me do my own normalization (or 
not), but this seems generally like a field where more consideration might 
be a good idea.

