RE: [xml-dev] Binary versus Text

>Yet Notepad could read these "Text Files" just fine.

Of course it can, because there is no such problem.

-----------

The problem is not with UTF-8.

The problem is with the Windows "text mode" option of the POSIX-emulation file-open functions, such as fopen() and _open().

This mode tries to help people write "POSIX-compatible" code by filtering out Windows-isms and turning them into Unix-isms.

One nice thing it does is convert CR/LF to LF ... (oh so HARD ...)

One *horrid* thing it does is assume you're running on something like a DOS 1.0 filesystem, where Control-Z was actually used as EOF.

So when a read in this "text mode" on Windows encounters a Control-Z, it reports end of file (getc() and friends return EOF, which is -1 in practice).
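A minimal sketch of how to see this for yourself, assuming a C compiler and the Windows CRT. The helper names write_sample() and count_chars() are made up for illustration. The sample file holds seven bytes: "abc", a Control-Z (0x1A), then "def".

```c
#include <stdio.h>

/* Hypothetical helper: write "abc", Control-Z (0x1A), "def" as 7 raw bytes. */
int write_sample(const char *path)
{
    static const unsigned char sample[] = { 'a', 'b', 'c', 0x1A, 'd', 'e', 'f' };
    FILE *f = fopen(path, "wb");          /* binary mode: bytes go out untouched */
    if (f == NULL)
        return -1;
    size_t written = fwrite(sample, 1, sizeof sample, f);
    if (fclose(f) != 0 || written != sizeof sample)
        return -1;
    return 0;
}

/* Hypothetical helper: count the characters fgetc() delivers before EOF.
 * mode is passed straight to fopen(), e.g. "r" (text) or "rb" (binary). */
long count_chars(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    long n = 0;
    while (fgetc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

Calling count_chars(path, "rb") should report all 7 bytes on any platform. With the Windows CRT, count_chars(path, "r") should stop at the Control-Z and report only the 3 bytes before it. On Unix-like systems "r" and "rb" behave the same, so both calls see the whole file.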

When the filesystem was upgraded some decades ago (I don't know exactly when), there was by then a convention of literally writing a Control-Z into text files to mark the end of file.

There followed a transitional period in which the filesystem itself didn't care about the Control-Z, but many text-oriented programs looked for it and assumed it was EOF.

This behavior made it into the Windows CRT library for POSIX compatibility, and it still exists today ...

And so yes, if you try to read a UTF-8 encoded "Text File" using the Windows "text mode" of the POSIX-emulation calls, you will not be able to read a Control-Z or any characters after it.
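One way around it, sketched here as an assumption rather than anything the CRT prescribes: open the file in binary mode ("rb" with fopen()) so nothing is filtered, and do the one useful translation yourself. The helper name collapse_crlf() is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: collapse every CR LF pair in buf[0..len) to a single
 * LF, in place, and return the new length. Lone CR bytes and Control-Z
 * (0x1A) bytes are kept exactly as they are. */
size_t collapse_crlf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;             /* drop the CR; the LF is copied next pass */
        buf[out++] = buf[i];
    }
    return out;
}
```

Run over a buffer filled by fread() from a file opened with "rb", this gives the CR/LF to LF conversion without the Control-Z handling, so a Control-Z in the data is just another byte.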

-David

----------------------------------------

David A. Lee

dlee@calldei.com

http://www.xmlsh.org