Re: [xml-dev] Binary versus Text

I would say that a text file is one for which, when it is read sequentially, there is a simple transformation from the bytes to a sequence of characters in one or more character repertoires (lists), fully consuming all bytes with none remaining, except any file-termination codes. This transformation may be a direct mapping using the values of the bytes, or may involve mapping sequences of bytes to some other number (e.g. UTF-8), or may involve a simple state machine (e.g. ISO 2022), but surely nothing requiring a stack or random access. The result and initial objective of parsing the file is a single sequence of characters.
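As a rough illustration of that definition, here is a minimal Python sketch of a sequential check. The function name decode_as_text and the chunk_size parameter are made up for this example; the incremental decoder from the standard codecs module stands in for the "simple state machine" idea, and strict error handling stands in for "fully consuming all bytes with none remaining". It does not handle file-termination codes.

```python
import codecs


def decode_as_text(data: bytes, encoding: str = "utf-8", chunk_size: int = 4):
    """Return the decoded characters if every byte maps to text, else None.

    The bytes are fed to the decoder strictly in order, a few at a time,
    so no random access or stack is needed: only the decoder's small
    internal state is carried from one chunk to the next.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    pieces = []
    try:
        for start in range(0, len(data), chunk_size):
            pieces.append(decoder.decode(data[start:start + chunk_size]))
        # final=True makes the decoder complain about leftover partial bytes.
        pieces.append(decoder.decode(b"", final=True))
    except UnicodeDecodeError:
        return None
    return "".join(pieces)
```

Under this sketch, decode_as_text("héllo".encode("utf-8")) gives back the characters, while a byte string that is cut off in the middle of a multi-byte UTF-8 sequence gives None. Passing a stateful encoding name such as "iso2022_jp" would exercise the state-machine case.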

I would say that a binary file, when used in distinction to "text file", is one which uses potentially more complex transformations, where the result and initial objective of parsing the file will be a data structure or event stream. 

So a ZIP file containing an uncompressed XML file is not a text file, because there are some bytes that are not intended to map to characters. But a file with a single DNA sequence as a packed string probably counts as a text file.
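Both cases can be sketched in Python. Everything named here is an assumption for illustration: the entry name doc.xml, the helper names, and in particular the 2-bits-per-base packing (A=00, C=01, G=10, T=11), which is just one plausible way a DNA sequence might be packed.

```python
import io
import zipfile


def stored_zip(xml_bytes: bytes) -> bytes:
    """Build an in-memory ZIP holding xml_bytes uncompressed (ZIP_STORED)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr("doc.xml", xml_bytes)  # hypothetical entry name
    return buffer.getvalue()


def unpack_dna(packed: bytes, count: int) -> str:
    """Map packed bytes to bases, four per byte, most significant bits first.

    Assumed packing: 00 -> A, 01 -> C, 10 -> G, 11 -> T. `count` says how
    many bases are real, since the last byte may carry padding bits.
    """
    bases = "ACGT"
    out = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            out.append(bases[(byte >> shift) & 0b11])
    return "".join(out[:count])


xml_bytes = b"<?xml version='1.0'?><doc/>"
zip_bytes = stored_zip(xml_bytes)
# The XML is present verbatim, but it is wrapped in headers, a CRC and a
# central directory: bytes that are not meant to map to characters.
extra_bytes = len(zip_bytes) - len(xml_bytes)
```

The ZIP needs its directory structure interpreted before any characters appear, whereas the packed DNA is still a single O(n) pass from bytes to characters in a (four-letter) repertoire.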

{You might say that therefore a file containing artificial languages like markup is a text file that is also like a binary file (in that you end up with a data structure or event stream.)}

Text and binary are also names used for different modes in some applications: e.g. in FTP a text file may have its newlines replaced with platform-specific newlines (a la the text/* MIME types) and perhaps even be transcoded, while a binary file will be kept byte-for-byte intact (a la the application/* MIME types). This usage for modes should not cloud the usage relating to files.
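The mode distinction can be sketched in a few lines of Python. The function names are made up for this example, and the sketch assumes the common arrangement where text-mode data travels with CRLF line endings and the receiver rewrites them to its local convention; transcoding is left out.

```python
def receive_text_mode(wire: bytes, local_newline: bytes = b"\n") -> bytes:
    """Text mode: rewrite CRLF line endings to the receiver's own newline."""
    return wire.replace(b"\r\n", local_newline)


def receive_binary_mode(wire: bytes) -> bytes:
    """Binary mode: keep the data byte-for-byte intact."""
    return wire
```

Running a ZIP or image through receive_text_mode would corrupt it wherever the bytes 0x0D 0x0A happen to occur, which is why the transfer mode matters even though it says nothing about what kind of file is being transferred.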

So the test of a text file is not "can I read it?" but "is it intended to be a sequence of characters from some repertoire with a 'simple' O(n) sequential mapping from the bytes?"

Something like that.


On Mon, Nov 25, 2013 at 1:25 AM, Costello, Roger L. <costello@mitre.org> wrote:

Hi Folks,


Distinguishing "text" versus "binary" is important.


On October 30 we had a discussion titled, "Is the binary file format dead?"


During that discussion John Cowan made an excellent distinction between binary and text files. I thought it would be useful to summarize the distinction.


The universe of computer files falls into two categories:


1. Binary files

2. Text files


By convention we normally restrict "binary" to files which are not interpretable as streams of characters. [John Cowan]


The word "text" is applied to files which are interpretable as streams of characters.


Of course any text file is also a binary file, since the class of text files is obtained from the class of binary files by applying restrictions. But it would be confusing to call a text file a binary file; it would be like calling a cat a mammal: correct but imprecise.



