Re: [xml-dev] Is XML a language or a data format?

Roger L Costello <costello@mitre.org> writes:

> Yesterday a colleague made a fascinating distinction between
> “language” and “data format”:

> * First he noted that English is a language, not a data
>   format. Likewise, Java is a language, not a data format.

> * A language is specified by a grammar. There is a grammar for English
>   and a grammar for Java. A language is intended to be read by humans.

> * A data format probably does not have a grammar. It oftentimes is
>   simply a collection of pieces and parts. It is intended to be
>   processed by a machine. An example of a data format is JPEG
>   (Exif). There is no grammar for it. It is just a series of parts
>   pieced together as the graphic below illustrates.

As Michael Kay has already pointed out, if a data format can be read and
processed by machine, then it has a reliably recognizable structure; the
chances are very good that that structure can be described by a
context-free grammar that describes the set of possible instances of the
data format somewhat more closely than the context-free grammars of
Java and other programming languages match the set of conforming
programs.  (That is, many data formats are in fact context-free;
programming languages with type systems are not context-free.)

If "there is no grammar for it" means "it is not possible to write a
grammar for it", then the claim is false for every data format I can
think of off the bat (including JPEG and Exif).

If "there is no grammar for it" means "the people who defined the data
format did not bother to write down a formal grammar for it because they
couldn't be bothered and formal grammars are for quiche-eaters", then
it's a sociological statement about the mentality of the data format
specifier.  (And whenever I encounter a data format designed by someone
who believes that data formats don't have grammars, I do my level best
to give it and them a wide berth, since I don't need more aggravation in
my life.)

[If any readers of xml-dev are mystified by the reference to quiche, a
search for "Real programmers don't eat quiche" will provide relevant
context for the mindset I am attributing to the unnamed data format
designers here.]
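
To make the first reading ("it is not possible to write a grammar for
it") concrete, here is a rough sketch, not taken from any specification
text, of how the "series of parts pieced together" can be written down
as grammar rules and then recognized mechanically. It covers only a
simplified marker-segment layer in the style of JPEG: the marker bytes
FFD8 (SOI) and FFD9 (EOI) and the two-byte big-endian length that counts
itself are real JPEG conventions, but the rules ignore entropy-coded
scan data and restart markers, the payload length is checked in code
rather than in the rules, and the name parse_segments and the sample
bytes are made up for illustration.

```python
import struct

# Simplified grammar (an assumption for illustration, not the full JPEG syntax):
#
#   stream  ::= SOI segment* EOI
#   segment ::= 0xFF marker length payload
#   length  ::= two bytes, big-endian, counting itself plus the payload
#
SOI = b"\xff\xd8"
EOI = b"\xff\xd9"


def parse_segments(data: bytes):
    """Recognize `stream` and return a list of (marker, payload) pairs."""
    if data[:2] != SOI:
        raise ValueError("stream does not start with SOI")
    pos = 2
    segments = []
    while True:
        # The EOI alternative ends the `segment*` repetition.
        if data[pos:pos + 2] == EOI:
            if pos + 2 != len(data):
                raise ValueError("trailing bytes after EOI")
            return segments
        # Otherwise a segment must follow: 0xFF, marker byte, 2-byte length.
        if pos + 4 > len(data) or data[pos] != 0xFF:
            raise ValueError(f"expected a segment at offset {pos}")
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2 or pos + 2 + length > len(data):
            raise ValueError(f"bad segment length at offset {pos}")
        payload = data[pos + 4:pos + 2 + length]
        segments.append((marker, payload))
        pos += 2 + length


# Toy input: SOI, one segment with marker 0xE1 and a 4-byte payload, EOI.
sample = SOI + b"\xff\xe1" + struct.pack(">H", 6) + b"Exif" + EOI
print(parse_segments(sample))
```

The recognizer is a direct transcription of the three rules, which is
the point: a format that a machine can read at all already has this kind
of structure, whether or not anyone wrote the rules down.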

> Do you agree with that distinction? How do you define language? How do
> you define data format? How do they differ?

For technical purposes, the most useful definition of "language" is as a
set of sequences of symbols.  Some languages can be defined by
grammars, others can be usefully approximated by grammars.
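
A small sketch of that definition, using a toy grammar chosen purely for
illustration (S -> "a" S "b" | empty): the language is literally a set
of strings, the grammar is one way of generating that set, and a
recognizer is one way of testing membership in it. The function names
derive and in_language are hypothetical.

```python
def derive(depth: int) -> set[str]:
    """Strings derivable from S -> 'a' S 'b' | '' with at most `depth` nested uses of the first rule."""
    if depth == 0:
        return {""}
    shorter = derive(depth - 1)
    return {""} | {"a" + s + "b" for s in shorter}


def in_language(s: str) -> bool:
    """Membership test for the same language, following the grammar rule by rule."""
    if s == "":
        return True  # S -> ''
    # S -> 'a' S 'b': peel one 'a' and one 'b' and recurse on the middle.
    return len(s) >= 2 and s[0] == "a" and s[-1] == "b" and in_language(s[1:-1])


print(sorted(derive(2), key=len))
print(in_language("aabb"), in_language("aab"))
```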

For most purposes, I'd say a "data format" is a form in which data can
usefully be stored on persistent media or exchanged with others.  As a
rule, a data format worth using has regularities which can be captured
with a grammar, and by and large the provision of a formal grammar
describing a format is a plausible sign that reasonable care and thought
have gone into the design and specification of the format.  There may be
exceptions, but the most prominent exceptions I can think of are cases
of proprietary formats where an explicit grammar would allow access to
the data by undesirable people (i.e. those who are not paying royalties
to the owner of the data format).

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

