
 


 

   RE: [xml-dev] carriage return handling in XML parsers


> I am trying to understand the implications of what the
> XML 1.0 spec says about End-of-Line Handling and would
> appreciate some clarification from more experienced
> shoulders.
>
> It would appear that given this section, it is never possible
> to get unaccompanied carriage return "characters" in the
> stream of information provided by an XML parser, be it
> SAX or DOM, unless I encode these as character references
> in the input file to the parser. Is this correct?

Yes.
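
A minimal sketch of that behaviour, assuming Python's standard xml.etree.ElementTree (backed by expat) as the parser; the helper name parsed_text is only for illustration. A literal CR or CR LF in the input should arrive as a single LF, while a CR written as a character reference should survive:

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_bytes: bytes) -> str:
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml_bytes).text


# Literal line ends in the input are normalized before the application sees them.
literal_cr = parsed_text(b"<a>x\ry</a>")
literal_crlf = parsed_text(b"<a>x\r\ny</a>")

# A character reference is expanded after normalization, so the CR is kept.
char_ref = parsed_text(b"<a>x&#13;y</a>")

print(repr(literal_cr), repr(literal_crlf), repr(char_ref))
```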
>
> On a related note, assuming simple ASCII files, if I now
> encode the carriage return as a character reference, and
> round trip the file through an XSLT identity transform,
> will the output file be identical or will the carriage
> return now be represented as a single <CR> byte?
>
A very good question, and I don't think the spec gives a very clear answer.
Probably when serializing a text or attribute node containing a CR
character, the serializer should output "&#x0d;", because that is the only
way of meeting the requirement that the sequence parse(serialize(X)) should
give a tree identical to X. But Saxon today doesn't do that; it outputs a CR
directly, which the parser will normalize to a line feed (#xA) on re-parsing.
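
The round-trip point can be sketched as follows. This is not Saxon's serializer, only a hypothetical hand-rolled one (serialize_text, with an escape_cr flag invented for the illustration), using Python's xml.sax.saxutils.escape and xml.etree.ElementTree to compare the two strategies:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def serialize_text(tag: str, text: str, escape_cr: bool) -> bytes:
    """Serialize a single element; optionally write CR as a character reference.

    Hypothetical helper for illustration, not any real processor's API.
    """
    extra = {"\r": "&#xD;"} if escape_cr else {}
    # escape() handles &, <, > first, then applies the extra replacements.
    return f"<{tag}>{escape(text, extra)}</{tag}>".encode("utf-8")


original = "line1\rline2"

# CR written as a raw byte: the parser normalizes it on the way back in.
naive = ET.fromstring(serialize_text("a", original, escape_cr=False)).text

# CR written as &#xD;: parse(serialize(X)) gives back the original text.
safe = ET.fromstring(serialize_text("a", original, escape_cr=True)).text

print(repr(naive), repr(safe))
```

Only the escaped form satisfies the parse(serialize(X)) requirement for text containing a CR.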

Michael Kay
Software AG
home: Michael.H.Kay@ntlworld.com
work: Michael.Kay@softwareag.com





 


Copyright 2001 XML.org. This site is hosted by OASIS