XML.org
OASIS Mailing List Archives

Re: [xml-dev] Binary versus Text

Pretty much. That some sequence of bytes can be recognized by the parse rules for some encoding-and-character-set is necessary but not sufficient for the file to be 'text'. It could be an accident. We also have to know that it is supposed to contain characters as the initial layer.
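A minimal Python sketch of the "necessary but not sufficient" point: a strict decode only shows that the bytes can be read as characters, not that they were meant to be. The helper name decodes_cleanly and the sample bytes are illustrative assumptions, not something from the thread.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """True if a strict decode in `encoding` consumes every byte.

    This is only the necessary condition: it cannot tell whether the
    bytes were intended as characters in the first place.
    """
    try:
        data.decode(encoding, errors="strict")
    except (UnicodeDecodeError, LookupError):
        # Malformed for this encoding, or the encoding name is unknown.
        return False
    return True


# Illustrative sample: a little-endian 32-bit integer, 0x6F6C6C65.
# Its four bytes happen to spell "ello" in ASCII, so the check passes by accident.
accidental = (0x6F6C6C65).to_bytes(4, "little")
print(accidental, decodes_cleanly(accidental, "utf-8"))  # b'ello' True

# Not well-formed UTF-8, so even the necessary condition fails.
print(decodes_cleanly(b"\xff\xfe\x00", "utf-8"))  # False

# Latin-1 maps every byte value to a character, so any byte string passes.
print(decodes_cleanly(bytes(range(256)), "latin-1"))  # True
```

The Latin-1 case is the extreme form of the accident: the check passes for every possible input, so whether a file is text has to come from knowing what it was supposed to contain.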

(As for BOMs, I think they are characters that have been assigned a supplementary role, so not really an exception.)
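One way to see the BOM as a character with a supplementary role, sketched with Python's standard codecs: a plain decode keeps U+FEFF as an ordinary character, while the signature-aware codecs consume it as a marker. The sample bytes are made up for illustration.

```python
import codecs
import unicodedata

data = codecs.BOM_UTF8 + "abc".encode("utf-8")  # b'\xef\xbb\xbfabc'

# A plain UTF-8 decode keeps the BOM as an ordinary character, U+FEFF.
plain = data.decode("utf-8")
print(repr(plain), unicodedata.name(plain[0]))  # '\ufeffabc' ZERO WIDTH NO-BREAK SPACE

# The 'utf-8-sig' codec gives that same character its supplementary role:
# a leading U+FEFF is read as an encoding signature and dropped.
print(repr(data.decode("utf-8-sig")))  # 'abc'

# In UTF-16 the same character, serialized first, selects the byte order.
print(repr(b"\xff\xfea\x00".decode("utf-16")))  # 'a' (FF FE = little-endian)
```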

For record-based storage à la VMS etc., an API may present a file as a virtual text file, but that does not mean the file itself should be considered text rather than binary: same as zip.
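A sketch of the record-storage case, using a hypothetical layout loosely modeled on variable-length record files: each record is preceded by a 2-byte little-endian length word and padded to an even byte count. The exact layout and the names pack_records and read_records are assumptions for illustration, not the actual RMS on-disk format.

```python
import struct
from typing import Iterator


def pack_records(lines: list[str]) -> bytes:
    """Build a file image. Assumed layout: length word, data, pad to even."""
    out = bytearray()
    for line in lines:
        data = line.encode("ascii")
        out += struct.pack("<H", len(data)) + data
        if len(data) % 2:
            out += b"\x00"  # word-alignment pad byte (assumed)
    return bytes(out)


def read_records(image: bytes) -> Iterator[str]:
    """The 'virtual text file' an API can present over the records."""
    pos = 0
    while pos < len(image):
        (length,) = struct.unpack_from("<H", image, pos)
        pos += 2
        yield image[pos:pos + length].decode("ascii")
        pos += length + (length % 2)  # skip the data and any pad byte


image = pack_records(["HELLO", "VMS"])
print(image)                      # b'\x05\x00HELLO\x00\x03\x00VMS\x00'
print(list(read_records(image)))  # ['HELLO', 'VMS']
```

This particular image even decodes cleanly as ASCII, since every byte is below 0x80, yet the length words and pad bytes are structure, not characters; only the reading API turns it into lines of text.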

That small edge-case files may not have enough information to make the call does not mean their character is unclear when typical files are considered.

Another example: an RTF file with only hex-encoded images is a text file, because every byte maps to an intended character. But an RTF file with an embedded binary image should be considered a binary file, because those bytes are not first intended as characters: as the RTF 1.8 spec mentions, an RTF parser needs to understand \bin and lump the binary data together.
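A minimal sketch of what "understanding \bin" means for a reader: after the control word \binN, the next N bytes must be skipped as an opaque payload, even if they happen to contain bytes such as } or \ that would otherwise be RTF syntax. The function split_rtf is hypothetical; it special-cases only \binN and is not a full RTF tokenizer.

```python
import re

# RTF control word: backslash, lowercase letters, optional signed numeric
# parameter, then an optional single space that belongs to the control word.
CONTROL_WORD = re.compile(rb"\\([a-z]+)(-?\d+)? ?")


def split_rtf(data: bytes) -> list[tuple[str, bytes]]:
    """Split RTF bytes into ("text", ...) runs and ("bin", ...) payloads.

    Minimal sketch: only \\binN is special-cased. Groups, other control
    words, and \\'hh hex escapes are simply left inside the text runs.
    """
    parts: list[tuple[str, bytes]] = []
    text_start = pos = 0
    while pos < len(data):
        if data[pos] != 0x5C:  # not a backslash: an ordinary character byte
            pos += 1
            continue
        m = CONTROL_WORD.match(data, pos)
        if m is None:
            pos += 2  # control symbol such as \\, \{ or \': backslash + one byte
            continue
        param = m.group(2)
        if m.group(1) == b"bin" and param is not None and int(param) >= 0:
            end = m.end() + int(param)
            parts.append(("text", data[text_start:m.end()]))
            # The next N bytes are raw data, not characters: take them unread.
            parts.append(("bin", data[m.end():end]))
            text_start = pos = end
        else:
            pos = m.end()
    if text_start < len(data):
        parts.append(("text", data[text_start:]))
    return parts


# The payload "}\{" would close and reopen groups if it were read as text.
sample = b"{\\pict\\bin3 }\\{}"
for kind, chunk in split_rtf(sample):
    print(kind, chunk)
```

A reader that only walked characters would treat the payload's } as the end of the \pict group; the byte count carried by \bin is what makes the layering non-textual.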

Cheers
Rick

On 27/11/2013 2:19 AM, "John Cowan" <johnwcowan@gmail.com> wrote:



On Tue, Nov 26, 2013 at 9:37 AM, Rick Jelliffe <rjelliffe@allette.com.au> wrote:

I would say that a text file is one which, when sequentially read, has a simple transformation from the bytes to a sequence of characters in one or more character repertoires (lists), fully consuming all bytes with none remaining, except any file-termination codes. This transformation may be a direct mapping using the values of the bytes, or may involve mapping sequences of bytes to some other number (e.g. UTF-8), or may involve a simple state machine (e.g. ISO 2022), but surely nothing requiring a stack or random access. The result and initial objective of parsing the file is a single sequence of characters.

You probably need to say something about BOMs.  But it's the last sentence that's critical: something is only text if we intend to consume it as text.
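A sketch of the quoted definition in Python, under stated assumptions: "sequentially read" is modeled with an incremental decoder fed one chunk at a time, "fully consuming all bytes" with final=True, and the only file-termination code accepted is a trailing Ctrl-Z. The function read_as_text is hypothetical.

```python
import codecs
from typing import Iterable, Optional


def read_as_text(chunks: Iterable[bytes], encoding: str) -> Optional[str]:
    """Read bytes strictly in order and return one character sequence, or None.

    An incremental decoder sees the bytes only sequentially (no random
    access), carrying any decoder state between chunks: partial UTF-8
    sequences, or the current ISO 2022 designation state.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    pieces = []
    try:
        for chunk in chunks:
            pieces.append(decoder.decode(chunk))
        # final=True: bytes still buffered at the end are an error,
        # i.e. all bytes must be consumed.
        pieces.append(decoder.decode(b"", final=True))
    except UnicodeDecodeError:
        return None
    # Assumption: a trailing Ctrl-Z (SUB, U+001A) is the only tolerated
    # file-termination code, as in old CP/M and DOS text files.
    return "".join(pieces).rstrip("\x1a")


# Direct single-byte mapping, with a DOS-style end-of-file marker.
print(repr(read_as_text([b"hello\r\n", b"\x1a"], "ascii")))  # 'hello\r\n'

# UTF-8: a two-byte sequence split across reads is reassembled.
print(read_as_text([b"caf\xc3", b"\xa9"], "utf-8"))  # café

# Bytes left over at the end mean the data is not UTF-8 text.
print(read_as_text([b"caf\xc3"], "utf-8"))  # None

# ISO-2022-JP: escape sequences switch state, and the state survives the split.
jp = "日本語".encode("iso2022_jp")
print(read_as_text([jp[:5], jp[5:]], "iso2022_jp"))  # 日本語
```

Like any decode-based check, this captures only the mechanical half of the definition; it will still succeed on bytes that were never meant to be consumed as text.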
 
I would say that a binary file, when used in distinction to "text file", is one which uses potentially more complex transformations, where the result and initial objective of parsing the file will be a data structure or event stream.

That is, a data structure other than a string, and an event stream other than a stream of character events.
 
Something like that.
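To make the contrast concrete, a short sketch using Python's standard zipfile module, since the thread mentions zip: parsing the archive yields a data structure (a directory of members), and the characters of a member only appear after that structure has been decoded. The archive contents are made up for illustration.

```python
import io
import zipfile

# Build a small zip archive in memory (illustrative data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("note.txt", "hello")
raw = buf.getvalue()

with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    # The result of parsing is a data structure, not a character sequence.
    listing = {info.filename: info.file_size for info in zf.infolist()}
    # An API can still present one member as a "virtual text file".
    text = io.TextIOWrapper(zf.open("note.txt"), encoding="utf-8").read()

print(raw[:4])  # b'PK\x03\x04': local file header signature, not characters
print(listing)  # {'note.txt': 5}
print(text)     # hello
```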

    The members of the English Church had ingenuously imagined up
    to that moment that it was possible to contain, in a frame of
    words, the subtle essence of their complicated doctrinal system,
    involving the mysteries of the Eternal and the Infinite on the
    one hand, and the elaborate adjustments of temporal government
    on the other. They did not understand that verbal definitions
    in such a case will only perform their functions so long as
    there is no dispute about the matters which they are intended
    to define: that is to say, so long as there is no need for
    them. For generations this had been the case with the Thirty-nine
    Articles. Their drift was clear enough; and nobody bothered over
    their exact meaning. But directly someone found it important to
    give them a new and untraditional interpretation, it appeared
    that they were a mass of ambiguity, and might be twisted into
    meaning very nearly anything that anybody liked.

        --Lytton Strachey, "Cardinal Manning"
--
GMail doesn't have rotating .sigs, but you can see mine at http://www.ccil.org/~cowan/signatures



Copyright 1993-2007 XML.org. This site is hosted by OASIS