Re: [xml-dev] The impact of data format selection on application development
- From: Norman Gray <norman.gray@glasgow.ac.uk>
- To: Roger L Costello <costello@mitre.org>
- Date: Tue, 12 Jul 2022 13:29:33 +0100
Roger, hello.
On 12 Jul 2022, at 12:51, Roger L Costello wrote:
> Missouri River 2,341
> Mississippi River 2,340
> Yukon River 1,979
> Rio Grande 1,759
>
> When I provide that data (file) to someone I will inform them:
>
> Hey, the file consists of the lengths of rivers in the U.S. Each line of the file contains two fields: the U.S. name of a river and its length. The fields are separated by a tab. The length is expressed in miles as an integer and groups of digits are separated by the comma symbol (such as 1,759).
A better description would be 'a TSV file with river name in column 1 and integer length in km in column 2'. That's 'simpler' because it's even shorter than yours, yet it remains clear, to certain people, what's required to process it.
The word 'simpler' is in quotes there because, as has already been discussed in this thread, there's significantly more to CSV or TSV than meets the eye (escapes, line endings, and so on), so this is 'simpler' only for a recipient who has seen this before and knows what to do. _In that context_, the data description is short, and appears simple.
So 'simple' data formats are actually 'high-context' data formats (compare [1]).
Note that in that description I didn't mention that I'd expect the integer _not_ to include a comma (which is useful only for display, and which would conventionally be regarded as hostile in a transmission format), and I did choose to add a little explicit context in mentioning the units of the second column (you _did_ mean km, didn't you,... hmm?). So I've thoughtfully chosen what context to make explicit, and expected that the recipient of the description will know the Right Thing To Do.
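As a rough illustration of the comma point, here is a minimal Python sketch of what a recipient might write on being told only 'TSV, name in column 1, integer length in column 2'. The function name parse_rivers and the sample rows are my own assumptions; the only library used is the standard csv module. The one line that strips commas is exactly the piece of context that the short description leaves unstated.

```python
import csv
import io

def parse_rivers(text):
    """Parse TSV text into (name, length) pairs.

    Hypothetical helper: assumes exactly two tab-separated
    fields per line, with the length in the second field.
    """
    rows = []
    for name, length in csv.reader(io.StringIO(text), delimiter="\t"):
        # Tolerate display-style digit grouping such as "1,759";
        # a clean transmission format would not need this step.
        rows.append((name, int(length.replace(",", ""))))
    return rows

# One row with a display comma, one without.
sample = "Missouri River\t2,341\nRio Grande\t1759\n"
print(parse_rivers(sample))
```

Without the replace() call, int("2,341") raises ValueError, which is one small way the display convention is hostile to the recipient.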
So what JSON or XML would be doing, in the alternative choice of file format, would be providing explicit context in a different conventional way.
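To make that concrete, here is a small Python sketch of the same rows wrapped in JSON, using the standard json module. The field names (description, length_unit, rivers) and the function name are invented for illustration, and the unit is passed in rather than assumed, since that is exactly the detail in question. It is only one of many conventional ways to carry the context along with the data.

```python
import json

def to_explicit_json(rows, unit):
    """Wrap (name, length) pairs in a structure that states its own context.

    Hypothetical layout: all field names here are assumptions,
    not part of any agreed schema.
    """
    doc = {
        "description": "Lengths of rivers in the U.S.",
        "length_unit": unit,
        "rivers": [{"name": name, "length": length} for name, length in rows],
    }
    return json.dumps(doc, indent=2)

print(to_explicit_json([("Rio Grande", 1759)], unit="miles"))
```

The recipient still has to know what 'length_unit' means, so the context has been moved into the file rather than eliminated.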
(It also occurs to me that, by saying 'TSV', the explanation above is also arguably _usefully opaque_ to someone who doesn't have the context I'm guessing they have, so if they get back to me, puzzled, I can tell that about them and advise differently.)
> Without that explanation, the file (data) is useless. But that holds true for an XML file containing the same data and a JSON file containing the same data. One might argue that with XML the tags describe the data, so an accompanying explanation is not needed. But relying on XML tags to explain data is folly (e.g., what if the developer uses generic tags such as <li>, such tags hardly "explain" the data). I would argue, regardless of the data format, there needs to be some accompanying explanation about the data. And if that's the case, then heck, use the simplest possible data format (use the super-simple data format shown above) and take advantage of the plethora of tools available for processing super-simple data formats.
'Simple' formats are great. If all you were sending me was a list of names and lengths, then I'd thank you for sending me something as simple as the above, because I'm confident I could easily turn it into whatever I wanted.
But -- and I think this is the key point -- simple formats run out of steam really quickly, and if requirements change, then the simple format, hackily extended in this direction and that, will start to look more hellish, faster, than any of the more sophisticated formats. Or: simplicity is sometimes brittle.
Thus there's a matter of technological taste, and judgement, here.
Best wishes,
Norman
[1] https://en.wikipedia.org/wiki/High-context_and_low-context_cultures
--
Norman Gray : https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK