>> It is unnatural to allow #85 as white space in XML as (currently at least)
>> it isn't as far as I know an end of line character in any ascii/unicode
>> based system.
>> So it is completely unlike the situation with #10 and #13.
>
>Ummm, the Unicode consortium has supplied an entire technical report
>http://www.unicode.org/unicode/reports/tr13/ on this fascinating subject.
>The first sentence says "Newlines are represented on different platforms by
>carriage return (CR), line feed (LF), CRLF, or next line (NEL)." That
>implies to me that #85 is completely IDENTICAL to the situation with #10 and
>#13 in Unicode.
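To illustrate the four newline representations TR13 lists, a parser that accepted all of them could fold each one into a single LF before doing anything else. The following is a minimal Python sketch under that assumption. The names `NEWLINE`, `normalize_newlines` and `split_lines` are hypothetical and do not come from any XML parser.

```python
import re

# The four newline representations named in TR13's first sentence:
# CRLF, CR, LF and NEL (U+0085). CRLF is listed first so it is matched
# as one newline instead of a CR followed by an LF.
NEWLINE = re.compile(r"\r\n|\r|\n|\x85")


def normalize_newlines(text: str) -> str:
    """Map every CRLF, CR, LF or NEL newline to a single LF."""
    return NEWLINE.sub("\n", text)


def split_lines(text: str) -> list[str]:
    """Split text on any of the four newline representations."""
    return normalize_newlines(text).split("\n")


if __name__ == "__main__":
    sample = "unix\nmac\rdos\r\nmainframe\x85end"
    print(split_lines(sample))
```

Python's own `str.splitlines()` already treats U+0085 as a line boundary, but it also splits on several other characters (form feed, U+2028 and others). That is why this sketch uses an explicit pattern limited to the four sequences TR13 names.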
Pardon my naive question, but how come Unicode, which can handle different
character representations depending on the encoding used, does not have a
SINGLE newline code point that would map onto 0x0D0A (CRLF) on some platforms,
0x0D (CR) or 0x0A (LF) on others, 0x85 (NEL) on mainframes, etc.?
If such a character existed, the XML spec could simply list it as possible
whitespace, letting the parser handle the various end-of-line markers based on
the 'encoding' parameter in the <?xml header... A bit like the
<?xml encoding="ascii-with-nel" proposed by David.
The character encoding would therefore tell the parser how the Unicode newline
code point is encoded as an end-of-line marker in the document.
But it's a fantasy, since the concept of character encoding does not include
"end-of-line" encoding, does it?
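The idea above can be sketched roughly as follows, assuming a parser that looks up the declared encoding name and lets it decide which end-of-line markers count. `ascii-with-nel` is only the label floated in this thread, not a registered encoding. `EOL_BY_ENCODING` and `decode_document` are hypothetical names used for illustration.

```python
import re

# Hypothetical table: the end-of-line markers a parser would honour for a
# declared encoding. "ascii-with-nel" is only the label proposed in this
# thread; it is not a registered encoding name. Only these two labels are
# supported by the sketch.
EOL_BY_ENCODING = {
    "ascii": ("\r\n", "\r", "\n"),
    "ascii-with-nel": ("\r\n", "\r", "\n", "\x85"),
}


def decode_document(data: bytes, encoding: str) -> str:
    """Decode raw bytes, then fold the encoding's end-of-line markers to LF."""
    if encoding == "ascii-with-nel":
        # Assumption: plain ASCII plus the single byte 0x85 standing for NEL.
        for byte in data:
            if byte > 0x7F and byte != 0x85:
                raise ValueError(f"byte {byte:#x} not allowed in {encoding}")
        text = data.decode("latin-1")  # latin-1 maps byte 0x85 to U+0085
    else:
        # For plain "ascii", a 0x85 byte is simply a decoding error.
        text = data.decode(encoding)

    # Longest marker first, so CRLF counts as one newline rather than two.
    markers = sorted(EOL_BY_ENCODING[encoding], key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in markers))
    return pattern.sub("\n", text)


if __name__ == "__main__":
    print(repr(decode_document(b"a\r\nb\x85c", "ascii-with-nel")))
    try:
        decode_document(b"a\x85c", "ascii")
    except UnicodeDecodeError as err:
        print("ascii rejects NEL:", err.reason)
```

In this sketch, the end-of-line knowledge lives in the encoding table rather than in the character set. A real parser would need such an entry for every encoding it accepts.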
Regards,
Nicolas Lehuen
Responsable R&D / Head of R&D
UBICCO, the Multi-Access Software Vendor
http://www.ubicco.com/
mailto:nicolas.lehuen@ubicco.com
Phone : +33 155 040 321
Fax : +33 155 040 304
Mobile: +33 661 907 640