Re: Avoid Flowcharts, Base Program Structures on Data Structures


From: Sean McGrath <sean.mcgrath@propylon.com>
To: xml-dev@lists.xml.org
Date: Mon, 21 Oct 2013 09:42:43 -0500

On 10/19/2013 11:43 PM, xml-dev-digest-help@lists.xml.org wrote:

“program structures should be based on data structures”

When you design a program, how do you base your program structures on the structures in the XML input?

Roger,

I'll paraphrase a lot of Jackson-speak here and borrow some phrases too, from books I read on the subject over the years. (I remember learning a lot from a book on Jackson by Leif Ingevaldsson http://amzn.to/180FHmE, as well as Jackson's books.) I'll also include some of my own stuff, but all my bits are just refinements or natural evolutions of the core Jackson ideas.

Historical note: At SGML '96 (17 years ago, oh my!), I gave a talk on the relationship between the JSP methodology and SGML processing. It was called "What do SGML and Michael Jackson have in common?" I don't think it is available online.

My take on the key Jackson ideas (both JSP and JSD):

- When all is said and done, *all* software boils down to the transformation of some form of digitally encoded input stream into some form of digitally encoded output stream. Files are an obvious source of input/output but sensors, VDUs etc. are all valid streams.

- Sometimes you know *nothing* about the structure of the input stream other than some concept of "record". e.g. it might be an iteration of bits, an iteration of bytes, an iteration of lines, an iteration of TCP/IP packets... Sometimes you know a lot more about the structure of the input. e.g. N customer records each of which has name, address....

- Regardless of how much, or how little, you know about the inputs, you can *model* them using a combination of simple, hierarchical operators: sequence, selection, iteration. Think regexp, think context-free grammars, think XML schema.

- You know a lot about the required output data structure. If you don't, you have no business coding anything :-) You can model that output using the same set of simple hierarchical operators.

- You also know that the two data structures you end up with are different. If they weren't, the solution to your problem would be the copy command :-)

- The differences between the data structures fall into a number of categories known as "structure clashes". Each type of structure clash has its own solution pattern.

- For most non-trivial problems the structure clash between the input and output data structures is big. However, it is always possible to de-compose the large structure clash into a series of smaller structure clashes. You can keep doing this de-composition until you get to the atomic structure clashes that have standard JSP patterns associated with them. (Documented in the books: order clash, interleave clash etc.)

- I call the above analysis "transformation decomposition". Once this is done you end up with:
    - an input data structure
    - an output data structure
    - N "deltas" each of which has:
        : an input data structure
        : an output data structure
        : a program
Write all the mini-programs. Chain them together in a pipeline and you are in business.
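
A minimal sketch of what such a chain of deltas might look like, assuming Python generators as the mini-programs. The record layout (name, city, status) and all function names are made up for illustration; they are not from Jackson's books. Each stage consumes one stream structure and produces another, and the first and last stages play the role of simple on/off ramps.

```python
import csv
import io
import json

def parse_csv(lines):
    """On-ramp: iteration of text lines -> iteration of dict records."""
    yield from csv.DictReader(lines)

def keep_active(records):
    """Delta 1: a selection, passing on only records marked 'active'."""
    for rec in records:
        if rec["status"] == "active":
            yield rec

def to_summary(records):
    """Delta 2: reshape each input record into the output record structure."""
    for rec in records:
        yield {"name": rec["name"].title(), "city": rec["city"]}

def to_json_lines(records):
    """Off-ramp: iteration of dicts -> iteration of JSON text lines."""
    for rec in records:
        yield json.dumps(rec, sort_keys=True)

def pipeline(lines):
    """Chain the mini-programs; each one only sees its neighbour's stream."""
    return to_json_lines(to_summary(keep_active(parse_csv(lines))))

# Hypothetical sample input, just to exercise the chain.
SAMPLE = "name,city,status\nann lee,Dublin,active\nbob ray,Cork,closed\n"
RESULT = list(pipeline(io.StringIO(SAMPLE)))
```

Because generators hand over one record at a time, the whole chain runs in a single process without materialising any intermediate stream. That is only loosely similar in spirit to a pipeline that has been collapsed into one executable; it is not a claim about how JSP inversion is actually carried out.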

- If this all works but is too slow, or too memory intensive... use inversion (Jackson-style co-routines) to turn your pipeline into a single, resource-efficient executable.

- Alternatively, given that it's no longer 1970, leverage the fact that each program delta above is a separate, standalone process that can be run on its own core or its own machine in a cloudy cluster.

- It is recommended that you use a standard syntax for your input/output data structures as it makes re-use, debugging etc. a lot easier, e.g. CSV, JSON, XML... (there are times when each of these is the pragmatic answer IMO).

- Don't get hung up on the fact that your initial inputs and final outputs might not be CSV, JSON, XML... With the help of some simple "on/off ramps" (my terminology), you can get all the development advantages of Jackson for the real action (the guts of your application) and still handle things like sensors,

- It turns out that business processes can usefully be conceptualized as long-running input/output streams. Once you start thinking about business processes that way, all the above techniques apply at the level of complete application suites/systems as well as at the level of discrete programs.

- No, it isn't a silver bullet. You probably could use it for absolutely everything, but it doesn't make sense to do so, in my experience. The sweet spots for JSP/JSD are (a) the core algorithmics of a data-heavy problem and (b) overall system design. Problems not well suited to it include much pure math, UI design, inference engines, that sort of thing.
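
To tie this back to the original question about XML input, here is a small sketch of a program whose structure is read straight off the input data structure. The order document, its element names, and the function names are all hypothetical, and the standard library's xml.etree.ElementTree is just one convenient way to get at the tree. The point is only that each component of the input structure (iteration, sequence, selection) gets one corresponding program component.

```python
import xml.etree.ElementTree as ET

# Assumed input structure (hypothetical, for illustration only):
#   orders = iteration of order
#   order  = sequence: customer, then items
#   items  = iteration of item
#   item   = selection: product | service
SAMPLE = """<orders>
  <order><customer>Ann</customer>
    <items><product price="10"/><service hours="2" rate="30"/></items></order>
  <order><customer>Bob</customer>
    <items><product price="5"/></items></order>
</orders>"""

def item_cost(item):
    """Selection: one branch per alternative in the data structure."""
    if item.tag == "product":
        return float(item.get("price"))
    if item.tag == "service":
        return float(item.get("hours")) * float(item.get("rate"))
    raise ValueError(f"unexpected item: {item.tag}")

def items_total(items):
    """Iteration of item: one loop in the program."""
    return sum(item_cost(item) for item in items)

def order_line(order):
    """Sequence: handle customer, then items, in the order they appear."""
    customer = order.find("customer").text
    total = items_total(order.find("items"))
    return f"{customer}: {total:.2f}"

def report(orders):
    """Iteration of order: the outermost loop mirrors the outermost element."""
    return [order_line(order) for order in orders]

REPORT = report(ET.fromstring(SAMPLE))
```

If the schema gains a new kind of item, the change lands in exactly one place, item_cost, which is the practical payoff of letting the data structure dictate the program structure.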

Hope this helps,
Sean
