
  • To: xml-dev@lists.xml.org
  • Subject: Need clarification of Section 2.11 End-of-Line Handling in XML Specification
  • From: "Roger L. Costello" <costello@mitre.org>
  • Date: Fri, 05 Sep 2003 08:46:54 -0400
  • Organization: The MITRE Corporation

Hi Folks,

Here is section 2.11 End-of-Line Handling of the XML specification:

      http://www.w3.org/TR/REC-xml#sec-line-ends

As I see it, there are two ways to interpret this section:

1. Line breaks in ***external parsed entities*** are normalized to a
newline (#xA) character.  Let's take an example:

Suppose that this XML fragment is in a file, para.txt (I show the
line-break characters explicitly as \r\n, i.e., carriage return #xD
followed by line feed #xA):

<para>This is a \r\n
simple paragraph. What \r\n
do you think of it?</para> \r\n

Here is an XML document that contains an external entity declaration
(again, I show the line-break characters explicitly):

<?xml version="1.0"?> \r\n
<!DOCTYPE Test [ \r\n
<!ENTITY p SYSTEM "para.txt"> \r\n
]> \r\n
<Test> \r\n
     &p; \r\n
</Test> \r\n

My reading of this statement in the XML specification ...

"... the XML processor must behave as if it normalized all line breaks
in ***external parsed entities*** (including the document entity) on
input, before parsing, by translating both the two-character sequence
#xD #xA and any #xD that is not followed by #xA to a single #xA
character."

... is that when an XML processor encounters an ***external entity
reference***, it resolves it by inlining the referenced text (e.g.,
para.txt) and normalizing all line breaks in that text to a newline
character.  Thus, after the entity reference is resolved, the XML
document looks like this:

<?xml version="1.0"?> \r\n
<!DOCTYPE Test [ \r\n
<!ENTITY p SYSTEM "para.txt"> \r\n
]> \r\n
<Test> \r\n
     <para>This is a \n
     simple paragraph. What \n
     do you think of it?</para> \n
</Test> \r\n

Notice that the only part affected by the "normalization" is the text
from the entity.

Okay, that's the first way that this section of the XML specification
could be interpreted.

2. The second way of interpreting this section of the XML specification
is that an XML processor normalizes ***all*** line breaks to the
newline character.  Thus, for the above example we would end up with
this:

<?xml version="1.0"?> \n
<!DOCTYPE Test [ \n
<!ENTITY p SYSTEM "para.txt"> \n
]> \n
<Test> \n
     <para>This is a \n
     simple paragraph. What \n
     do you think of it?</para> \n
</Test> \n

Notice that all line breaks are the new line character.
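The two readings can be contrasted in a minimal Python sketch.
normalize_line_breaks applies the translation quoted from section 2.11;
expand_interpretation_1 and expand_interpretation_2 are hypothetical
helpers, and PARA_TXT and DOCUMENT are the example files above written
as Python strings, with each line break as the CR LF sequence \r\n
(#xD #xA). Entity resolution is modeled as plain substitution of "&p;",
a simplification for illustration, not how an XML processor actually
resolves entities.

```python
# Hypothetical models of the two readings of section 2.11 (illustration only).
# Entity expansion is modeled as plain string substitution, which is a
# simplification: a real XML processor does not resolve entities this way.

PARA_TXT = (
    "<para>This is a \r\n"
    "simple paragraph. What \r\n"
    "do you think of it?</para> \r\n"
)

DOCUMENT = (
    '<?xml version="1.0"?> \r\n'
    "<!DOCTYPE Test [ \r\n"
    '<!ENTITY p SYSTEM "para.txt"> \r\n'
    "]> \r\n"
    "<Test> \r\n"
    "     &p; \r\n"
    "</Test> \r\n"
)


def normalize_line_breaks(text: str) -> str:
    """Translate #xD #xA, and any #xD not followed by #xA, to a single #xA."""
    # Replace the two-character sequence first so its #xD is not
    # treated as a lone #xD and turned into a second #xA.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def expand_interpretation_1(document: str, entity_text: str) -> str:
    """Interpretation 1: only the external entity's text is normalized."""
    return document.replace("&p;", normalize_line_breaks(entity_text))


def expand_interpretation_2(document: str, entity_text: str) -> str:
    """Interpretation 2: the document entity is normalized as well."""
    return normalize_line_breaks(document).replace(
        "&p;", normalize_line_breaks(entity_text)
    )


if __name__ == "__main__":
    print(repr(expand_interpretation_1(DOCUMENT, PARA_TXT)))
    print(repr(expand_interpretation_2(DOCUMENT, PARA_TXT)))
```

In this model the two results differ only in the document's own line
breaks: interpretation 1 leaves their #xD characters in place, while
interpretation 2 leaves no #xD anywhere.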

Okay, so which interpretation is correct?  /Roger