XML.org: OASIS Mailing List Archives

 


Re: [xml-dev] What to escape when serializing XML

On Jan 2, 2007, at 17:11, Pete Cordell wrote:

> In terms of end-of-line encoding, the approach seems to be to  
> output what is convenient (CR, LF, or CRLF) and have the receiving  
> application sort out the situation.

More to the point, the LF character in element content can be  
serialized as CR, LF or CRLF. Of course, LF is the most natural  
serialization.

In order to avoid data loss, LF, CR and tab need to be escaped as
numeric character references in attribute values. Otherwise the parser
normalizes them to spaces. This matters, for example, when
round-tripping multiline values in XHTML <input type='hidden'/>.
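A minimal sketch of the attribute case, using Python's standard
xml.etree.ElementTree parser to stand in for "the parser". The helper
name escape_attr and the sample value are made up for illustration;
this is not taken from any particular serializer.

```python
import xml.etree.ElementTree as ET

def escape_attr(value: str) -> str:
    """Hypothetical helper: escape a string for a double-quoted attribute."""
    # Markup-significant characters first, so the '&' of the references
    # added below is not escaped a second time.
    out = (value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace('"', "&quot;"))
    # Whitespace that attribute-value normalization would turn into a space.
    for ch, ref in (("\t", "&#x9;"), ("\n", "&#xA;"), ("\r", "&#xD;")):
        out = out.replace(ch, ref)
    return out

value = "line one\nline two\ttabbed\rend"

# Literal whitespace in the attribute: the parser hands back spaces.
naive = '<input type="hidden" value="%s"/>' % value
naive_result = ET.fromstring(naive).get("value")

# Whitespace written as character references: the value survives intact.
safe = '<input type="hidden" value="%s"/>' % escape_attr(value)
safe_result = ET.fromstring(safe).get("value")
```

Here naive_result has lost the line break, the tab and the CR, while
safe_result compares equal to the original value.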

> Conceptually, the receiving XML processor should normalize the
> end-of-line markers to 0x0A and then the application converts that to
> whichever of CR, LF, or CRLF is appropriate.

For this reason, in order to avoid data loss, a CR that is part of the
data needs to be escaped as an NCR (&#xD;) so that it survives a
serialize-and-parse round trip. A literal CR would come back as LF.
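The same point for element content, again as a rough sketch with
Python's xml.etree.ElementTree. The helper name escape_text and the
sample string are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def escape_text(text: str) -> str:
    """Hypothetical helper: escape a string for use as element content."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                # A literal CR would be normalized to LF on parsing.
                .replace("\r", "&#xD;"))

text = "a\r\nb\rc\nd"

# Literal CR and CRLF: end-of-line normalization collapses both to LF.
naive_text = ET.fromstring("<p>%s</p>" % text).text

# CR written as a character reference: it is not treated as a line end.
safe_text = ET.fromstring("<p>%s</p>" % escape_text(text)).text
```

After parsing, naive_text contains only LF line ends, whereas safe_text
still has the CR characters of the original string. LF itself needs no
escaping in element content.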

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/





Copyright 1993-2007 XML.org. This site is hosted by OASIS