The impact of data format selection on application development
- From: Roger L Costello <costello@mitre.org>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Sun, 10 Jul 2022 12:22:25 +0000
Hi Folks,
Recently I have been reading a wonderful book titled "Little Languages and Tools". Its authors are Jon Bentley, Brian Kernighan, Paul Hudak, and others. The book shows how programs written in little languages such as AWK, Lex, Yacc, pic (a picture language), scatter (a scatter plot language), troff, and sed can be developed independently and assembled via pipes:
scatter infile | pic | troff >outfile
Little languages provide a powerful way to quickly implement robust tools.
Reading the book made me keenly aware of one thing: The XML data format is complex! Compare the densely written 36-page XML specification (plus the 16-page namespace specification) to this three-sentence specification of a data format:
The data format consists of lines. Each line contains fields. Fields are separated by a delimiter (space, tab, comma, etc.).
You might argue that such a data format is too simple to be useful. Not so! Much data can be expressed in that format: a list of persons (name, age, gender); a list of aircraft in the Boeing inventory (model, weight, wingspan, max speed); a list of wild flora in the Amazon rainforest (species, size, lethality); a list of books (title, author, publisher). The range of data amenable to that format is virtually endless. For data that isn't in that format, tools are available for putting it into the format.
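To make the three-sentence specification concrete, here is a minimal Python sketch of a reader for it. The function name parse_records and the sample person records are hypothetical, chosen only for illustration. It assumes fields never contain the delimiter, so it does not handle quoting or escaping.

```python
from typing import Iterable, Iterator, List, Optional


def parse_records(lines: Iterable[str],
                  delimiter: Optional[str] = None) -> Iterator[List[str]]:
    """Yield one list of fields per non-blank line.

    With delimiter=None, fields are split on runs of whitespace
    (spaces or tabs). Pass "," or a tab character to split on
    exactly that delimiter instead.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # the format consists of lines; skip blank ones
        yield line.split(delimiter)


# Hypothetical sample data: name, age, gender
sample = ["alice 34 F\n", "bob 51 M\n"]
records = list(parse_records(sample))
print(records)
```

The entire reader is a loop and one call to split, which is about as much parsing machinery as the format demands.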
Simple data formats often spawn the development of powerful little tools. AWK is one such tool. With a line or two of AWK code you can quickly implement powerful data filters for transforming data in the above data format.
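As an illustration of such a filter, here is a small Python sketch that prints the name of every person older than a given age. On well-formed numeric input it does roughly what the AWK program `$2 > 40 { print $1 }` does. The function name names_older_than and the sample records are hypothetical, and the field layout (name, age, gender, separated by whitespace) is an assumption.

```python
from typing import Iterable, Iterator


def names_older_than(lines: Iterable[str], min_age: int) -> Iterator[str]:
    """Yield the first field of each line whose second field exceeds min_age."""
    for line in lines:
        fields = line.split()  # split on runs of whitespace
        # Skip lines that are too short or whose age field is not a number.
        if len(fields) >= 2 and fields[1].isdigit() and int(fields[1]) > min_age:
            yield fields[0]


# Hypothetical sample data: name, age, gender
sample = ["alice 34 F", "bob 51 M", "carol 47 F"]
result = list(names_older_than(sample, 40))
print(result)  # ['bob', 'carol']
```

To use the filter as one stage of a pipeline, pass sys.stdin as the lines argument and print each yielded name.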
What is the role of XML as a data format? What is the role of very simple data formats such as the one above? The answer is not clear-cut in my mind. What does seem clear, however, is that choosing the right data format can have a significant impact on application development: on the ease of development, on the cognitive load it imposes on the developer and maintainer, and on the ability to create independent tools that can be assembled in a pipeline.
"The important thing, as always, is to find a way of looking at the input data that makes it easy to lay out the program." ["Software Tools" by Brian Kernighan and P. J. Plauger, p. 42]
Comments?
/Roger