Re: [xml-dev] Line ending normalization
- From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
- To: xml-dev@lists.xml.org
- Date: Mon, 04 May 2009 15:13:33 -0400
At 2009-05-04 12:14 -0400, Bob Kline wrote:
>I'm having a hard time finding the language in the 1.0 spec [1]
>which would make it clear whether the line ending normalization
>which XML processors must perform (more precisely, "must behave as
>if it normalized all line breaks ...") happens before or after the
>replacement of character entities.
A line-end sequence consists only of naked (literal) characters, not
of parsed numeric character references.
>In other words, for the following document:
>
><a>x&#xD;&#xA;y</a>
>
>is the value returned by the XML parser for the text content of
>element a "x\r\ny" or "x\ny"?
"x\r\ny", because that is what the character references put into the
element; the element contains no literal line-end sequences to normalize.
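One way to check this against a single implementation is a small sketch
like the one below. It assumes Python's standard xml.etree.ElementTree
(which is backed by expat) and assumes the example document spelled CR
and LF as the numeric character references &#xD; and &#xA;. The variable
names are only illustrative.

```python
import xml.etree.ElementTree as ET

# CR and LF written as numeric character references: these are not a
# line-end sequence, so they are not subject to normalization.
with_refs = ET.fromstring(b"<a>x&#xD;&#xA;y</a>").text

# CR and LF present as literal (naked) bytes in the entity: this is a
# line-end sequence, so the processor normalizes it to a single LF.
with_literals = ET.fromstring(b"<a>x\r\ny</a>").text

print(repr(with_refs))
print(repr(with_literals))
```

With expat, the first value should be reported as 'x\r\ny' and the
second as 'x\ny'. Other parsers would need to be checked separately.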
>Could someone point to the language which would address this timing
>question?
Here:
http://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends
XML parsed entities are often stored in computer files which,
for editing convenience, are organized into lines. These lines
are typically separated by some combination of the characters
CARRIAGE RETURN (#xD) and LINE FEED (#xA).
To simplify the tasks of applications, the XML processor MUST
behave as if it normalized all line breaks in external parsed
entities (including the document entity) on input, before
parsing, by translating both the two-character sequence #xD #xA
and any #xD that is not followed by #xA to a single #xA character.
Note that the "#xA" and "#xD" bits of text are *not* parsed numeric
character references; they are only prose notation for the characters.
The notation is an unambiguous way of referring to the characters, but
it is the naked characters themselves that are being referred to.
Note the bit "before parsing" ... so the naked characters get
replaced by a naked #xA and *then* the parsed numeric character
references of your example would be parsed as content.
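The ordering can be sketched as two separate steps. This is only a toy
model of the "behave as if" wording, not how any real parser is
implemented; the function names are hypothetical, and the reference
expansion handles numeric character references only.

```python
import re


def normalize_line_breaks(raw: str) -> str:
    # Step 1, before parsing: each CR LF pair and each lone CR
    # becomes a single LF.
    return re.sub(r"\r\n?", "\n", raw)


def expand_char_refs(text: str) -> str:
    # Step 2, during parsing: numeric character references
    # (hexadecimal or decimal) are replaced by the characters they name.
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)

    return re.sub(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);", replace, text)


def content_seen_by_application(raw: str) -> str:
    # Normalization runs first, so characters produced by references
    # in step 2 are never touched by step 1.
    return expand_char_refs(normalize_line_breaks(raw))
```

Because step 1 has already finished when the references are expanded,
content_seen_by_application("x&#xD;&#xA;y") keeps both characters, while
a literal CR LF pair in the input collapses to one LF.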
>And do the major XML parser implementations handle this issue consistently?
I haven't tripped over a problem with this with various
implementations ... have you recognized inconsistent
behaviour? Certainly the specification seems unambiguous.
I hope this helps.
. . . . . . . . . . Ken
--
XQuery/XSLT/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson: http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview: http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07 http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers: http://www.CraneSoftwrights.com/legal