Re: [xml-dev] Binary versus Text


From: Norman Gray <norman@astro.gla.ac.uk>
To: Rick Jelliffe <rjelliffe@allette.com.au>
Date: Tue, 26 Nov 2013 17:58:06 +0000

Greetings.

On 2013 Nov 26, at 14:37, Rick Jelliffe wrote:

> I would say that a text file is one which, when sequentially read, has a simple transformation from the bytes to a sequence of characters in one or more character repertoires (lists), fully consuming all bytes with none remaining, except any file-termination codes. This transformation may be direct mapping using the values of the bytes, or may involve [...]

> So a ZIP file containing an uncompressed XML file is not a text file, because there are some bytes that are not intended to map to characters. But a file with a single DNA sequence as a packed string probably counts as a text file. 
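A rough way to make the quoted definition concrete is to ask whether every byte of a file is consumed by a strict decode in some declared encoding. The sketch below is only an approximation under my own assumptions: the helper name `decodes_fully` and the sample document are made up for illustration, and a successful decode says nothing about whether the bytes were intended as characters.

```python
import io
import zipfile


def decodes_fully(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if a strict decode consumes every byte without error."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample document, encoded as UTF-8.
xml_bytes = "<doc>caf\u00e9</doc>".encode("utf-8")

# A ZIP archive holding the same document uncompressed (ZIP_STORED).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("doc.xml", xml_bytes)
zip_bytes = buf.getvalue()

xml_is_utf8 = decodes_fully(xml_bytes)             # the bare XML decodes cleanly
zip_is_utf8 = decodes_fully(zip_bytes)             # archive headers are not UTF-8 text
zip_is_latin1 = decodes_fully(zip_bytes, "latin-1")  # latin-1 accepts any byte value
```

The last line is the awkward case: ISO 8859-1 maps every byte to some character, so the archive "decodes" without complaint even though nobody meant those header bytes as text. A purely mechanical test cannot capture the "intended to map to characters" part of the definition.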

Well, sort of.  Isn't this just recapitulating John's (?) remark about intent?  I would in turn rephrase that as: being a 'text file', or an 'XML file', or not, is not a property of a file, but a description of what is likely to happen to that file.

If I say 'X is a binary file', then what is presumably in my head is an intention (in unix terms) to open X with open(2) and read it with read(2); if I say 'Y is a text file', then I'm presumably planning to read its contents with fgets(3) or one of the other line-oriented functions (and expect that to work); if I say 'Z is an XML file', then an interaction with an XML parser is in my immediate future (and I expect that to work).

That is, calling something 'a text file' is nothing more nor less than a statement that it is a file with which I could rationally use a 'text API' (i.e., that of fopen/fgets); there is no other meaningful distinction.

Of course, there's nothing stopping me opening Y and reading it in blocks of bytes, or opening Z and reading it line-by-line, which rather seems to suggest that being 'a text file' or 'an XML file' isn't a property (or at least not an exclusive property) of a file.
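To illustrate, here is a minimal Python sketch that reads one and the same file three ways. The file name and contents are made up for the example, `os.open`/`os.read` stand in for open(2)/read(2), and line-oriented reading through the built-in `open` only roughly plays the role of fgets(3).

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample content; any small well-formed XML document would do.
content = b"<notes>\n  <note>first</note>\n  <note>second</note>\n</notes>\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "wb") as f:
        f.write(content)

    # Treated as 'a binary file': raw descriptor, fixed-size blocks of bytes.
    # O_BINARY exists only on Windows, hence the getattr fallback.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        blocks = []
        while True:
            block = os.read(fd, 16)
            if not block:
                break
            blocks.append(block)
    finally:
        os.close(fd)

    # Treated as 'a text file': decoded and read line by line.
    with open(path, "r", encoding="utf-8", newline="") as f:
        lines = f.readlines()

    # Treated as 'an XML file': handed to an XML parser.
    root = ET.parse(path).getroot()

print(len(blocks), "blocks;", len(lines), "lines; root element:", root.tag)
```

All three reads succeed on the same bytes, so the only thing that differs between them is which API I chose to use.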

One could observe that opening 'a ZIP file' with an XML parser will produce errors pretty promptly.  That does seem to suggest that being 'an XML file' is something that a file can _not_ be (in the sense that 'X is a ZIP file' implies 'X is not an XML file'), but down that road lies the conclusion that 'a malformed XML file' is not 'an XML file', which doesn't seem helpful.
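A small sketch of that observation, in Python and with made-up inputs: the helper name `parses_as_xml` is my own, and the parser is the standard library's `xml.etree.ElementTree`, chosen only for convenience.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def parses_as_xml(data: bytes) -> bool:
    """Return True if an XML parser accepts the bytes as a well-formed document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# A ZIP archive that happens to contain a tiny XML document.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("doc.xml", "<doc/>")
zip_bytes = buf.getvalue()

well_formed = b"<doc><p>ok</p></doc>"
malformed = b"<doc><p>unclosed</doc>"  # the <p> element is never closed

results = {
    "zip": parses_as_xml(zip_bytes),
    "well_formed": parses_as_xml(well_formed),
    "malformed": parses_as_xml(malformed),
}
print(results)
```

The parser rejects the archive and the malformed document in exactly the same way, so a test of the form 'does it parse?' cannot by itself separate 'not an XML file' from 'a broken XML file'.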

I've forgotten: why does the distinction matter?

All the best,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

References:
- Binary versus Text
  - From: "Costello, Roger L." <costello@mitre.org>
- Re: [xml-dev] Binary versus Text
  - From: Rick Jelliffe <rjelliffe@allette.com.au>
