XML.org
OASIS Mailing List Archives

 


Formatless files

A file has no inherent format.

The format of a file is determined by the programs that use it. 

Since file types are not determined by the file system, the "kernel" can't tell you the type of a file: it doesn't know.

You might wonder why the system doesn't track file types more carefully, so that, for example, the "sort" program is never given a directory as input. One reason is to avoid precluding some useful computations. Although

> sort /bin

doesn't make much sense, there are many commands that can operate on any file at all, and there's no reason to restrict their capabilities. Octal dump (od), word count (wc), copy (cp), compare (cmp), and many others process files regardless of their contents. But the formatless idea goes deeper than that. If, say, the input to LaTeX were distinguished from Java source, a text editor would be forced to make the distinction when it created a file, and probably when it read in a file for editing again. 
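As an illustration that is not part of the book excerpt, here is a minimal Python sketch of a wc-like counter that works on any file because it only looks at bytes. The function name count_lines_words_bytes is made up for this example, and the real wc handles locales and options that this sketch ignores.

```python
def count_lines_words_bytes(path):
    """Count newlines, whitespace-separated words, and bytes in any file.

    The file is opened in binary mode, so no assumption is made about
    whether it holds LaTeX, Java source, or arbitrary binary data.
    """
    with open(path, "rb") as f:
        data = f.read()
    lines = data.count(b"\n")   # number of newline bytes
    words = len(data.split())   # runs of non-whitespace bytes
    return lines, words, len(data)
```

The same call works on a LaTeX file, a Java file, or a compiled binary; only the meaning of the counts differs.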

Instead of creating distinctions, the system tries to erase or lessen them. All text consists of lines terminated by newline characters, and most programs understand this simple format. This uniformity is unusual; most systems have several file formats, even for text, and require negotiation by a program or a user to create a file of a particular type. In the system there is just one kind of file, and all that is required to access a file is its name.

There's a good test of file system uniformity, due originally to Doug McIlroy. Can the output of a FORTRAN program be used as input to the FORTRAN compiler? A remarkable number of systems have trouble with this test.
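The test can be tried in miniature. This is a rough sketch and not from the book: it uses Python in place of FORTRAN, and write_program and run_program are hypothetical names chosen for the example. One function writes a source file with ordinary file output, and the other reads that file back by name, compiles it, and runs it.

```python
def write_program(path):
    """Emit a tiny Python program using ordinary text output."""
    with open(path, "w") as out:
        out.write("result = 6 * 7\n")


def run_program(path):
    """Read the generated file by name, compile it, and execute it."""
    with open(path) as src:
        code = compile(src.read(), path, "exec")
    namespace = {}
    exec(code, namespace)
    return namespace["result"]
```

No conversion step or file-type declaration sits between the two functions. The file written by the first is directly usable as input to the second, which is what the test asks for.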

-------

The above are excerpts from the book The Art of UNIX Programming, pages 46-47. The "system" being referred to is the UNIX system.

How do those excerpts apply to XML? Why are there so many file formats - the XML file format, the JSON file format, the CSV file format, and so on? Isn't that contrary to the idea of formatless files?

/Roger  

