OASIS Mailing List Archives

   Re: [xml-dev] Why would MS want to make XML break on UNIX, Perl, Python


Rick Jelliffe wrote:

> As I understand it, a file opened in text mode through stdio may have embedded ^D (UNIX) 
> or ^Z (PC) converted to EOF by the standard library routines that read/write from/to
> stdio and present them to the application. This is independent of terminal signals,
> such as sending ^D to a shell.

On UNIX systems, no.  The stdio library on UNIX makes no distinction
between text mode and binary mode.  What handles ^D as an EOF indication
is the tty line discipline module in the kernel, not the stdio library.
If the tty is in canonical mode (line-at-a-time reading mode) and the
user types the EOF character at the beginning of a line, the read()
system call returns 0 without error, and the stdio library sets the EOF
flag on the FILE structure attached to the tty.  After that, stdio
functions such as fgetc() return EOF until the EOF flag is cleared.  You
can change the EOF character with the stty utility, for example
"stty eof '^X'".  Please note that the EOF character is a property of
the tty and of nothing else.  If the tty is in raw mode
(character-at-a-time reading mode), any character, including ^D and NUL,
can be read.  See stty(1), termio(7) and termios(7) for more detail.
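
To make the "property of the tty" point concrete, here is a minimal
sketch, assuming a POSIX system with <termios.h>.  The helper names
tty_eof_char() and tty_is_canonical() are hypothetical, chosen only for
this illustration.  They ask the kernel for the same settings that stty
shows, and they fail for anything that is not a terminal.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: returns the EOF character currently configured
 * for the tty behind fd (the value "stty eof" changes), or -1 if fd is
 * not a terminal.  Regular files and pipes have no such setting.
 * On some systems the VEOF slot is shared with VMIN, so the value is
 * only meaningful while the tty is in canonical mode. */
int tty_eof_char(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;                      /* not a tty (or another error) */
    return (unsigned char)t.c_cc[VEOF];
}

/* Hypothetical helper: returns 1 if the tty behind fd is in canonical
 * (line-at-a-time) mode, 0 if it is not, and -1 if fd is not a tty. */
int tty_is_canonical(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;
    return (t.c_lflag & ICANON) ? 1 : 0;
}
```

Calling tty_eof_char(STDIN_FILENO) from an interactive shell would
normally report the code for ^D unless stty has changed it; with stdin
redirected from a file or a pipe it returns -1, because the EOF
character simply does not exist there.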

On DOS systems, however, the stdio library is responsible for ^Z handling.
If a file is opened in text mode and a ^Z is found in the file,
stdio library functions such as fgetc() return EOF.  Please note
that ^Z handling at the end of a file is a backward-compatibility
behavior; text files are not required to contain ^Z at the end.
A text file without ^Z at the end is perfectly legal on DOS.

Historically, a ^Z end-of-file indication was required on CP/M
systems, where file size is managed in multiples of the sector size
(usually 128 bytes).  DOS systems, where file size is managed as a number
of bytes, do not require such an EOF byte, but treat ^Z as EOF for CP/M
compatibility.  UNIX systems have no such thing as an "EOF byte" in text
files.
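
The absence of an "EOF byte" can be checked with a small experiment.
The following is a sketch in standard C; the function names
write_bytes() and count_chars_text_mode() are hypothetical, made up for
this illustration.  It writes arbitrary bytes to a file and then counts
how many characters fgetc() delivers when the file is reopened in text
mode.

```c
#include <stdio.h>

/* Hypothetical helper: writes len raw bytes to path in binary mode.
 * Returns 0 on success, -1 on failure. */
int write_bytes(const char *path, const unsigned char *data, size_t len)
{
    FILE *f = fopen(path, "wb");
    size_t written;

    if (f == NULL)
        return -1;
    written = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}

/* Hypothetical helper: opens path in text mode ("r", no "b") and counts
 * the characters fgetc() returns before it reports EOF.
 * Returns -1 if the file cannot be opened. */
long count_chars_text_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    long n = 0;

    if (f == NULL)
        return -1;
    while (fgetc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

On a UNIX system, a file holding the six bytes 'a', ^D (0x04), 'b',
^Z (0x1A), 'c', '\n' yields a count of 6: neither control character
ends the read.  With a DOS C library, going by the behavior described
above, the text-mode count would be expected to stop short at the ^Z.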

The DOS system calls for terminal input also interpret ^Z as the EOF
character.  There is no equivalent of stty on DOS, so you can't change
the EOF character.  You can read a ^Z character if you use a BIOS call
for keyboard input.

Please note that "text mode" was introduced into the C language to
accommodate systems whose line-ending convention differs from that of
UNIX.  On DOS, CRLF is converted to LF (\n) in text mode.  You can read
text files in "binary mode", in which case the CRLF-to-LF mapping and
^Z-as-EOF handling are disabled.
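
The two text-mode translations described above can be modeled in a few
lines.  This is only a sketch of the idea, not the code of any actual
DOS C runtime; the function name dos_text_mode_filter() and the choice
to pass a lone CR through unchanged are assumptions made for this
illustration.

```c
#include <stddef.h>

#define CTRL_Z 0x1A   /* the ^Z byte */

/* Hypothetical model of DOS text-mode input translation:
 *   - the CR of each CRLF pair is dropped, so the pair becomes LF;
 *   - the first ^Z ends the data, and nothing after it is delivered.
 * A CR not followed by LF is copied unchanged (an assumption).
 * out must have room for at least len bytes.
 * Returns the number of bytes stored in out. */
size_t dos_text_mode_filter(const unsigned char *in, size_t len,
                            unsigned char *out)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (in[i] == CTRL_Z)
            break;                      /* ^Z treated as end of file */
        if (in[i] == '\r' && i + 1 < len && in[i + 1] == '\n')
            continue;                   /* skip CR; LF is copied next */
        out[n++] = in[i];
    }
    return n;
}
```

Given the input "a\r\nb" followed by ^Z and 'c', the filter stores the
three bytes "a\nb" and stops.  Reading the same bytes in "binary mode"
corresponds to skipping the filter entirely, so all six bytes arrive
untouched.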

NUMATA Toshinori
XML Application Technology Development Dept., PROJECT-A XML,
Phone: +81-45-476-4637 (x4673)	Fax: +81-45-476-4734



Copyright 2001 XML.org. This site is hosted by OASIS