OASIS Mailing List Archives


Re: [xml-dev] Four fine text-based data formats ... liberate yourself from one (silo) data format

On Sun, 2013-03-24 at 12:54 +0000, Costello, Roger L. wrote:
> Hi Folks,
> Here are four fine text-based data formats. They are all well supported.
> 1. XML: obviously you know about this data format and its support.
> 2. JSON: data that is in this format can be readily queried and
> manipulated in a JavaScript program, and support for JavaScript is
> growing at a breathtaking rate. From Simon St. Laurent: There are also
> piles of public APIs using JSON.  Programmable Web and similar places
> keep showing growth in JSON-based APIs.  See, for example:
> http://blog.programmableweb.com/2012/12/17/leading-apis-say-bye-xml-in-new-versions/ 
> 3. CSV: data in the form of comma-separated-values (CSV) can be
> readily queried and manipulated in Excel. There are many tools that
> support CSV, here's one from Google:
> http://code.google.com/p/csvfix/ 
> 4. Plain text: of all the data formats, this one is by far the most
> widely supported. Every computer on the planet has at least one text
> editor (probably several). There are many, many powerful tools, such
> as vi and emacs, that can readily query and manipulate plain text
> files. 

What do you mean by "plain text"? XML, JSON, and CSV are all plain
text. Plain text isn't a syntax. It's an assertion that the file
doesn't contain a NUL byte (0x00).
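
To make that concrete, here is a minimal sketch of the "no NUL byte" test. The
function name and the sample byte strings are made up for illustration, and the
check is only the rough heuristic described above: it says nothing about which
syntax the bytes follow, and text encoded as UTF-16 would fail it.

```python
def looks_like_plain_text(data: bytes) -> bool:
    """Return True if the byte string contains no NUL (0x00) byte."""
    return b"\x00" not in data


# Hypothetical samples: three "plain text" syntaxes and one binary blob.
samples = {
    "xml": b"<a>1</a>",
    "json": b'{"a": 1}',
    "csv": b"a\n1\n",
    "binary": b"\x89PNG\x00\x00",
}

results = {name: looks_like_plain_text(data) for name, data in samples.items()}
print(results)
```

XML, JSON, and CSV all pass the same check, which is the point: "plain text"
does not distinguish between them.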

> Shouldn't we define standards - using a particular data format - for
> data exchanges? No! Define standards at the semantic level, not the
> syntax level. Let everyone use their own syntax.

There are times when defining the semantics in a syntax-neutral way
is a good idea. Dublin Core does this. As a result, it gets used in
tons of formats in different syntaxes.

But if everybody gets to use their own syntax, we will never have
interoperability, even if the same semantics are encoded in there.
Somebody has to write the code to parse the syntax and extract the
semantic information.
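
As a rough illustration of that cost, the sketch below encodes one record with
two Dublin Core style fields (title, creator) in three syntaxes. The documents,
field layout, and function names are assumptions invented for this example, not
any standard binding. The semantics are identical, yet each syntax still needs
its own parsing code before the receiver can get at them.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same (hypothetical) record in three syntaxes.
JSON_DOC = '{"title": "Example", "creator": "A. Author"}'
CSV_DOC = "title,creator\nExample,A. Author\n"
XML_DOC = "<record><title>Example</title><creator>A. Author</creator></record>"


def from_json(text: str) -> dict:
    return dict(json.loads(text))


def from_csv(text: str) -> dict:
    # First data row only; the header row supplies the field names.
    return dict(next(csv.DictReader(io.StringIO(text))))


def from_xml(text: str) -> dict:
    # Each child element of the root is treated as one field.
    return {child.tag: child.text for child in ET.fromstring(text)}


records = [from_json(JSON_DOC), from_csv(CSV_DOC), from_xml(XML_DOC)]
print(records)
```

Three parsers for one record. With "everyone uses their own syntax", every
receiver has to carry one of these per sender.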

Also, syntax absolutely informs the semantics. For example, you can't
nest things in CSV or INI files. The best you can do is define some
record as a pointer to another. That's not a standard feature of the
syntax though, so there's more code you get to write.
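
Here is a sketch of that extra code, assuming a made-up convention in which a
"parent" column holds the id of another row. Nothing in CSV itself gives that
column any meaning; the column names and the build_tree helper are hypothetical.

```python
import csv
import io

# Hypothetical convention: "parent" points at another row's "id";
# an empty parent marks a top-level record.
CSV_DOC = (
    "id,parent,name\n"
    "1,,root\n"
    "2,1,child-a\n"
    "3,1,child-b\n"
    "4,2,grandchild\n"
)


def build_tree(text: str) -> list:
    rows = list(csv.DictReader(io.StringIO(text)))
    # First pass: one node per row, keyed by id.
    nodes = {row["id"]: {"name": row["name"], "children": []} for row in rows}
    roots = []
    # Second pass: resolve the pointers into actual nesting.
    for row in rows:
        node = nodes[row["id"]]
        if row["parent"]:
            nodes[row["parent"]]["children"].append(node)
        else:
            roots.append(node)
    return roots


tree = build_tree(CSV_DOC)
print(tree)
```

In XML or JSON the nesting would come straight out of the parser. Here both
sides have to agree on the pointer convention and write the resolution step
themselves.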

As with all things, there are trade-offs, and you can't just paint it
with a broad brush of "define semantics, not syntax". If you're dealing
with simple key-value information, maybe it's a good idea. If you want
your semantics embedded in existing host languages, maybe it's a good
idea. But if you want two machines to talk to each other and actually
get something done, you need to define syntax.


