A file has no inherent format.
The format of a file is determined by the programs that use it.
Since file types are not determined by the file system, the "kernel" can't tell you the type of a file: it doesn't know.
You might wonder why the system doesn't track file types more carefully, so that, for example, the "sort" program is never given a directory as input. One reason is to avoid precluding some useful computations. Although
> sort /bin
doesn't make much sense, there are many commands that can operate on any file at all, and there's no reason to restrict their capabilities. Octal dump (od), word count (wc), copy (cp), compare (cmp), and many others process files regardless of their contents. But the formatless idea goes deeper than that. If, say, the input to LaTeX were distinguished from Java source, a text editor would be forced to make the distinction when it created a file, and probably when it read in a file for editing again.
Instead of creating distinctions, the system tries to erase them. All text consists of lines terminated by newline characters, and most programs understand this simple format. This uniformity is unusual; most systems have several file formats, even for text, and require negotiation by a program or a user to create a file of a particular type. In the system there is just one kind of file, and all that is required to access a file is its name.
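To make the "lines terminated by newline characters" point concrete, here is a minimal Python sketch of a wc-like counter. It is an illustration only, not how wc is actually implemented: the function name `wc` and the two sample inputs are hypothetical. It treats its input as plain bytes and never asks whether they are LaTeX or Java.

```python
def wc(data: bytes) -> tuple[int, int, int]:
    """Return (lines, words, bytes) for any byte string, like a simplified wc."""
    lines = data.count(b"\n")   # a line is whatever ends in a newline
    words = len(data.split())   # runs of non-whitespace bytes
    return lines, words, len(data)


# Hypothetical sample inputs: the counter does not care which is which.
latex_source = b"\\documentclass{article}\n\\begin{document}\nHello\n\\end{document}\n"
java_source = b"class Hello {\n}\n"

print(wc(latex_source))
print(wc(java_source))
```

The same function works on either input because it relies only on the newline convention, not on anything the file system records about the file.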
There's a good test of file system uniformity, due originally to Doug McIlroy. Can the output of a FORTRAN program be used as input to the FORTRAN compiler? A remarkable number of systems have trouble with this test.
-------
The above are excerpts from the book The Art of UNIX Programming, pages 46-47. The "system" being referred to is the UNIX system.
How do those excerpts apply to XML? Why are there so many file formats - the XML file format, the JSON file format, the CSV file format, and so on? Isn't that contrary to the idea of formatless files?
You changed the definition of "format" at this point. Above, you quote LaTeX, Java, and FORTRAN as all having the same format: text file.
XML, JSON, and CSV are also text in that sense. They have different syntax within the text file, just as Java has a different syntax from LaTeX.
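
A rough Python sketch of that distinction, using only standard-library parsers. The sample data and the helper names (`count_lines`, `name_from_xml`, `name_from_json`, `name_from_csv`) are made up for illustration. All three samples are ordinary newline-terminated text, so a generic line counter handles each of them; only the program that cares about the syntax needs a specific parser.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical samples: the same fact in three syntaxes, all plain text.
xml_text = "<person>\n  <name>Ada</name>\n</person>\n"
json_text = '{\n  "name": "Ada"\n}\n'
csv_text = "name\nAda\n"


def count_lines(text: str) -> int:
    """Generic tool: works on any text, whatever its syntax."""
    return text.count("\n")


def name_from_xml(text: str) -> str:
    return ET.fromstring(text).findtext("name")


def name_from_json(text: str) -> str:
    return json.loads(text)["name"]


def name_from_csv(text: str) -> str:
    return next(csv.DictReader(io.StringIO(text)))["name"]


for sample in (xml_text, json_text, csv_text):
    print(count_lines(sample))

print(name_from_xml(xml_text), name_from_json(json_text), name_from_csv(csv_text))
```

In this sketch the syntax lives entirely in the programs that read the text; nothing about the text itself has to be declared to the system in advance.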
David
/Roger