- To: xml-dev@lists.xml.org
- Subject: How does XML handle differing line break characters on different platforms? Read this to find out.
- From: "Roger L. Costello" <costello@mitre.org>
- Date: Mon, 08 Sep 2003 11:09:12 -0400
- Organization: The MITRE Corporation
[This is an expanded version of the summary that I sent out a few
minutes ago. I thought that some people might find this a bit less
terse and more understandable (plus I elaborated on some things).]
Hi Folks,
Different systems use different line break characters: Windows uses
\r\n, Unix uses \n, and older Macs use \r. How do you handle this in
XML? Read on and find out.
Consider this XML document: (line break characters are explicitly shown)
<?xml version="1.0"?> \r\n
<Test> \r\n
<para xml:space="preserve">This is a \r\n
simple paragraph. What \r\n
do you think of it?</para> \r\n
</Test> \r\n
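When an XML parser reads in this document it "normalizes" ALL line
breaks: every \r\n pair, and every \r that is not followed by \n, is
translated to a single \n before the document is handed to the
application. Thus, after normalization the XML document looks like this: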
<?xml version="1.0"?> \n
<Test> \n
<para xml:space="preserve">This is a \n
simple paragraph. What \n
do you think of it?</para> \n
</Test> \n
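If you want to see this for yourself, here is a minimal Python sketch. It
assumes the standard library's xml.etree.ElementTree module (which sits on
top of the expat parser); the helper name para_text is made up for this
illustration, and the trailing spaces from the listing above are left out.

```python
import xml.etree.ElementTree as ET

# The sample document as raw bytes, using Windows-style \r\n line breaks.
raw = (
    b'<?xml version="1.0"?>\r\n'
    b'<Test>\r\n'
    b'<para xml:space="preserve">This is a\r\n'
    b'simple paragraph. What\r\n'
    b'do you think of it?</para>\r\n'
    b'</Test>\r\n'
)


def para_text(document: bytes) -> str:
    """Parse the document and return the text content of <para>.

    para_text is a hypothetical helper, not part of any library.
    """
    root = ET.fromstring(document)
    return root.find("para").text


text = para_text(raw)
print(repr(text))  # the \r characters are gone; only \n remains

# The same document with old-Mac-style lone \r line breaks
# normalizes to exactly the same text.
mac_text = para_text(raw.replace(b"\r\n", b"\r"))
print(mac_text == text)
```

The application never sees which line break convention the file used on
disk; it only ever sees \n.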
Things to note:
1. All line breaks have been normalized to \n.
Consequence: you don't have to be concerned about different platforms
using different line break characters since all XML documents will have
their line break characters normalized to \n regardless of the
platform. (So, if you're writing an XML Schema regular expression you
can simply use \n to indicate a line break, regardless of the platform.)
2. The xml:space="preserve" attribute has no impact on line break
normalization.
3. Suppose that you want a line break character in your XML document,
other than \n. For example, suppose that you want \r in your XML
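document. By default, it would get normalized to \n. To prevent this,
write it as a character reference: &#xD; (or, in decimal, &#13;).
Character references are resolved after line break normalization, so
the \r survives.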
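Points 2 and 3 can be checked the same way. This is again only a sketch,
assuming Python's standard xml.etree.ElementTree module and assuming
&#xD; as the character reference for a carriage return; the variable
names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Point 2: xml:space="preserve" does not switch off line break
# normalization; the literal \r\n in the input still arrives as \n.
preserved = ET.fromstring(b'<para xml:space="preserve">one\r\ntwo</para>')
print(repr(preserved.text))

# Point 3: a carriage return written as the character reference &#xD;
# is not a literal line break in the input, so it is not normalized.
escaped = ET.fromstring(b'<para>one&#xD;\ntwo</para>')
print(repr(escaped.text))
```

In the first case the parser hands back 'one\ntwo'; in the second it hands
back 'one\r\ntwo', because the \r was spelled as a character reference
rather than typed as a raw character.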
/Roger