> I am trying to understand the implications of what the
> XML 1.0 spec says about End-of-Line Handling and would
> appreciate some clarification from more experienced
> It would appear that given this section, it is never possible
> to get unaccompanied carriage return "characters" in the
> stream of information provided by an XML parser, be it
> SAX or DOM, unless I encode these as character references
> in the input file to the parser. Is this correct?
> On a related note, assuming simple ascii files, if I now
> encode the carriage return as a character reference, and
> round trip the file through an XSLT identity transform,
> will the output file be identical or will the carriage
> return now be represented as a single <CR> byte?
A very good question, and I don't think the spec gives a very clear answer.
Probably when serializing a text or attribute node containing a CR
character, the serializer should output "&#xD;", because that is the only
way of meeting the requirement that the sequence parse(serialize(X)) should
give a tree identical to X. But Saxon today doesn't do that; it outputs a CR
directly, which will turn into NL on re-parsing.
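
To make the two behaviours concrete, here is a minimal sketch in Python, using the expat-based parser in the standard library rather than Saxon. The helper names parse_text and serialize are hypothetical, invented for this illustration, and serialize is a toy stand-in for a real serializer that assumes plain ASCII text. It shows that a literal CR is normalized to a newline on parsing, that a CR written as a character reference survives, and that only the escaping serializer satisfies parse(serialize(X)) == X.

```python
import xml.etree.ElementTree as ET


def parse_text(doc: bytes) -> str:
    """Return the root element's text content as the parser reports it."""
    return ET.fromstring(doc).text


def serialize(text: str, escape_cr: bool) -> bytes:
    """Toy serializer (hypothetical): wrap ASCII text in a <doc> element.

    With escape_cr=True a CR is written as the character reference &#xD;.
    With escape_cr=False the CR is written as a raw byte, which is the
    behaviour described above for Saxon at the time of writing.
    """
    out = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if escape_cr:
        out = out.replace("\r", "&#xD;")
    return ("<doc>" + out + "</doc>").encode("ascii")


# End-of-line handling on input: literal CR and CR LF both become LF.
literal_cr = parse_text(b"<doc>a\rb</doc>")
literal_crlf = parse_text(b"<doc>a\r\nb</doc>")

# A CR supplied as a character reference is not normalized.
referenced_cr = parse_text(b"<doc>a&#xD;b</doc>")

# Round trip of a tree whose text node contains a CR.
original = "a\rb"
kept = parse_text(serialize(original, escape_cr=True))
lost = parse_text(serialize(original, escape_cr=False))

print(repr(literal_cr), repr(literal_crlf), repr(referenced_cr))
print(kept == original, lost == original)
```

So the answer to the first question is yes: without a character reference the application never sees a bare CR. Whether the reference survives an identity transform depends on which of the two serializer behaviours the processor implements.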