Re: [xml-dev] Formatless files

Roger L Costello <costello@mitre.org> writes:

> A file has no inherent format.

> ...

> -------

> The above are excerpts from the book, The Art of UNIX Programming,
> page 46-47. The "system" being referred to is the UNIX system.

> How do those excerpts apply to XML?

Pervasively, I would have said.  Almost every point raised by your
uncredited author (Eric Raymond, if the Web is telling me the truth) has
an analog in the XML ecosystem, though in most XML usage the data (and
those who create and manage it) tend to be somewhat more central than
the programmers.

>     The format of a file is determined by the programs that use it.

The specific set of tags used to mark up an XML document is determined,
in the usual case, by those who create and use the data, either choosing
from a published vocabulary or rolling their own.  In some cases,
(e.g. the vocabularies used in the XML data in Microsoft Office and its
open-source competition), those who create some software choose the
vocabulary and hard-code it into their programs; in other cases, the
persons or institutions managing the data make the choice.

>     Since file types are not determined by the file system, the
>     "kernel" can't tell you the type of file: it doesn't know.

Since XML document types can be declared, and in any case are manifest
in the document, any program or person who can read a little XML can
tell you what vocabulary is in use in a given XML document.  Most XML
software doesn't care because it can and will handle any XML.
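
As a rough illustration of how little XML one has to read to identify
the vocabulary, here is a minimal Python sketch using only the standard
library. The function name vocabulary_of and the two sample documents
are made up for the example; real documents may also carry a DOCTYPE
declaration or schema hints, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

def vocabulary_of(xml_text):
    """Return (namespace, local_name) of the document element.

    Hypothetical helper: it looks only at the outermost element,
    which is usually enough to tell which vocabulary is in use.
    """
    root = ET.fromstring(xml_text)
    tag = root.tag
    # ElementTree writes namespaced names as "{namespace-uri}local".
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag

# Two invented sample documents, one namespaced and one not.
svg = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>'
memo = '<memo><to>xml-dev</to></memo>'

print(vocabulary_of(svg))
print(vocabulary_of(memo))
```

The same generic parser handles both documents; only the caller needs
to care which vocabulary came back.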

> ...

>     Instead of creating distinctions, the system tries to erase/lessen
>     them. All text consists of lines terminated by newline characters,
>     and most programs understand this simple format. 

Instead of using binary formats which require a close matchup between
files and the programs that read and write them, XML (like all other
text-based formats) uses structures that can be represented as sequences
of characters, and all XML processors understand the relatively simple
syntax of XML.  (The major difficulties for a programmer in parsing XML
come from the fact that in parsing XML you have to bite the bullet and
finally learn to deal with Unicode and ISO 10646.)

Most text-based formats leave some room for variation: some parts of a
JSON data stream are for user-specified names, and the variable and
function names in a programming language are usually chosen by the
author of the program ('main' in C is an exception to this rule).  XML
seems to leave a bit more freedom to the user: XML allows more variation
in the internal structure of an XML document than JSON or Markdown (for
example) allow in their files.

>     There's a good test of file system uniformity, due originally to Doug
>     McIlroy. Can the output of a FORTRAN program be used as input to the
>     FORTRAN compiler? A remarkable number of systems have trouble with
>     this test.

McIlroy's test is memorable and mostly persuasive; I have sometimes seen
it in the more general form "it should be possible to feed the output of
any program to any other program as input".  It would not surprise me if
it or something like it was bumping around in the back of people's minds
when they specified that the output of an XSLT transform would, by
default, be an XML document (or before that, that the output of a DSSSL
SGML-to-SGML tree transformation would be an SGML document), and that
the result of evaluating an XQuery expression would be a sequence of XDM
items on which further operations might be performed, and that the
result of most XProc processing steps would be one or more XML documents
which can be fed to other XProc steps.  (It is of course easy enough to
serialize results in non-XML forms when that is required.)
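
The composability point can be sketched in a few lines of Python (not
XSLT or XProc; this only imitates the idea with the standard library).
The step functions, the pipeline helper, and the sample document are
all invented for the illustration. Each step takes XML text and returns
XML text, so any step can consume any other step's output.

```python
import xml.etree.ElementTree as ET

def uppercase_names(xml_text):
    """XML in, XML out: upper-case the text of every <name> element."""
    root = ET.fromstring(xml_text)
    for el in root.iter("name"):
        el.text = (el.text or "").upper()
    return ET.tostring(root, encoding="unicode")

def count_names(xml_text):
    """XML in, XML out: record the number of <name> children on the root."""
    root = ET.fromstring(xml_text)
    root.set("count", str(len(root.findall("name"))))
    return ET.tostring(root, encoding="unicode")

def pipeline(xml_text, *steps):
    """Feed the output of each step to the next one."""
    for step in steps:
        xml_text = step(xml_text)
    return xml_text

# Invented sample document.
doc = "<people><name>ada</name><name>grace</name></people>"
result = pipeline(doc, uppercase_names, count_names)
print(result)
```

Because every step reads and writes the same kind of thing, the steps
can be reordered or extended without any step knowing about the others.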

At another level, since the primary purpose of many Fortran programs is
numeric computation, I confess that it has never been clear to me why
one would want to use their output as input to a compiler.  However, in
the generalized form "Can you use a program in programming language L to
generate a new program in L?", it's an interesting question.  For XSLT,
the answer is clearly 'yes'. One of the most common techniques for
handling some problems in XML processing is to use an XSLT transform to
generate a new XSLT transform.  (This may be becoming less common now
with XSLT 3.0.  And perhaps because it uses a non-XML syntax, XQuery
does not seem to have developed this kind of idiom.)
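
The generalized question ("can a program in L generate a new program
in L?") can be sketched in Python rather than XSLT, purely as an
analogy to the XSLT-generating-XSLT idiom. The names
make_serializer_source and to_xml, and the sample record, are
hypothetical. The generated function does no XML escaping, so it is
only safe for values without markup characters.

```python
def make_serializer_source(field_names):
    """Return Python source text for a function that serializes a record.

    Hypothetical generator: its output is itself a Python program,
    which is handed straight back to the Python compiler below.
    """
    lines = ["def to_xml(record):", "    parts = ['<record>']"]
    for name in field_names:
        lines.append(
            f"    parts.append('<{name}>' + str(record[{name!r}]) + '</{name}>')"
        )
    lines.append("    parts.append('</record>')")
    lines.append("    return ''.join(parts)")
    return "\n".join(lines) + "\n"

# Step 1: one program writes another program as plain text.
source = make_serializer_source(["title", "year"])

# Step 2: the output is used as input to the compiler.
code = compile(source, "<generated>", "exec")
namespace = {}
exec(code, namespace)
to_xml = namespace["to_xml"]

# Invented sample record; no escaping is performed on the values.
print(to_xml({"title": "Example", "year": 2000}))
```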

> Why are there so many file formats - the XML file format, the JSON
> file format, the CSV file format, and so on?

Why do human beings have so many different ideas?

Perhaps the answer is:  because the format of such files is determined
by the programs that use them.

(By the way, I think you made a typo here: surely you meant to say "the
CSV file formats" in the plural, because there are almost as many
variants of CSV as there are programs which purport to read or write
it.)
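
A small Python sketch of the point about CSV variants, using the
standard csv module. The sample text and the parse helper are invented
for the example; the three readings differ only in which delimiter and
quote character the reader assumes.

```python
import csv
import io

# Invented sample: semicolon-delimited, single-quoted field.
text = "name;note\nSmith;'a;b'\n"

def parse(data, **dialect_options):
    """Parse the same text under whatever CSV conventions are given."""
    return list(csv.reader(io.StringIO(data), **dialect_options))

# Default conventions: comma delimiter, double-quote quoting.
comma_rows = parse(text)

# Semicolon delimiter, but still double-quote quoting.
semicolon_rows = parse(text, delimiter=";")

# Semicolon delimiter and single-quote quoting.
quoted_rows = parse(text, delimiter=";", quotechar="'")

for rows in (comma_rows, semicolon_rows, quoted_rows):
    print(rows)
```

The same bytes yield one, three, or two fields in the second row,
depending on which "CSV" the reader believes it has been given.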

> Isn't that contrary to the idea of formatless files?

At some level, yes.  At other levels (the operating system and character
I/O routines in C), no.  Is that a problem?

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

