Re: Push and pull (was: Re: [xml-dev] Basic program structures in XSLT (w

On Sat, Dec 10, 2022 at 5:18 AM C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:

Rick Jelliffe <rjelliffe@allette.com.au> writes:

> On Fri, 9 Dec 2022, 7:06 pm Dave Pawson, <dave.pawson@gmail.com> wrote:
>
> Yes, but Rick....
>
>> On Fri, 9 Dec 2022 at 07:01, Rick Jelliffe <rjelliffe@allette.com.au> wrote:
>> >
>> > Most basic?
>> >
>> > 1. Push versus pull.
>>
>> Not enough? Why? explain these please? What's the impact of wrong choice?
>> When to 'break' this rule (guidance)

> Pull programming is where you pull in values into a template. Push
> programming is where the structure of the incoming document determines
> the templates that are run.

Thank you, Rick, for your list. To a large extent it agrees with the
one I came up with when trying to teach an introduction to XSLT, and I
wish I could see a way to boil it down further. But it's helpful,
particularly where it's not quite the same as my list. (No disrespect
intended to the other responses.)

What you and Dave Pawson say about push and pull provoked some thought,
which I share for what it's worth.

I first encountered the terms push and pull in connection with language
design for tree transformation languages, where they describe how
execution progresses: in a pull language, we visit each node of the
output tree exactly once in depth-first sequence and may visit nodes in
the input out of sequence and maybe repeatedly, while in a push language
we visit each node in the input exactly once, in sequence, but may
produce output out of sequence. In that sense, the DSSSL tree-to-tree
transformation language is (if I understand correctly) a push language,
while XSLT is a pull language.

So I was puzzled when people talked to me about push and pull in XSLT.

After Mike Kay's reminder about Jackson and his analysis of the various
ways the structure of the input and output can differ, it occurs to me
that what we call push and pull style in XSLT correspond* very nicely to
the cases where the structure of the input and output agree* and
clash*. (Asterisks mean "so to speak" and mark terms that need some
discussion and qualification.)

I am oversimplifying somewhat here, but I think there is something here
worth understanding.

In a very simple, very pure push-style transformation, where every
template has a form like

<xsl:template match="GI">
...
<xsl:apply-templates/>
...
</xsl:template>

the structure of input and output agree. They are not necessarily
identical, and we may filter some nodes out (by omitting the call to
apply-templates) and inject some nodes (in the ...), but for any two
nodes in the input which map to specific nodes in the output, the
corresponding output nodes are in the same order as in the input. And
from the point of view of tree traversal, any evaluation of the
transformation will visit the nodes of the input tree in order, and will
also visit the nodes of the output tree in order.
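
A minimal sketch of such a pure push-style stylesheet may make this
concrete. The input vocabulary (doc, title, para, note) and the HTML
output are assumptions chosen for illustration, not anything from this
thread:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical vocabulary. Every template wraps the result of
       processing its children, so input order drives output order. -->
  <xsl:template match="doc">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>

  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Filtering: no apply-templates here, so a note and everything
       inside it is left out of the output. -->
  <xsl:template match="note"/>

</xsl:stylesheet>
```

Text nodes are copied through by the built-in template rules, so any
title and para that both appear in the output appear in the same
relative order as in the input.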

Pull style becomes necessary when there is some difference or clash
between the input and output structure, whether it's re-ordering or
duplication or inversion of containment relations. And its
characteristic is not just that we visit the output nodes in order
without backtracking (we always do, for any transform that maps one tree
to one tree), but that we do not visit the input nodes in order without
backtracking. To generate a TOC in XSLT, we normally visit the chapter
nodes twice, once in TOC mode and once in default mode. So we
backtrack. If we are reordering and restructuring things, we will
normally end up visiting input nodes out of order, and often
repeatedly.
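
The table-of-contents case can be sketched roughly as follows. The
element names (book, chapter, title, para) and the mode name "toc" are
hypothetical, picked only to show the two visits:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="book">
    <html>
      <body>
        <!-- First visit to the chapter nodes: TOC mode. -->
        <ul>
          <xsl:apply-templates select="chapter" mode="toc"/>
        </ul>
        <!-- Second visit to the same chapter nodes: default mode. -->
        <xsl:apply-templates select="chapter"/>
      </body>
    </html>
  </xsl:template>

  <!-- TOC mode pulls just the title out of each chapter. -->
  <xsl:template match="chapter" mode="toc">
    <li><xsl:value-of select="title"/></li>
  </xsl:template>

  <!-- Default mode processes the whole chapter in document order. -->
  <xsl:template match="chapter">
    <div><xsl:apply-templates/></div>
  </xsl:template>

  <xsl:template match="chapter/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

The output is still written front to back, but by the time the first
chapter body is emitted, every chapter title has already been visited
once for the list.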

As I said, this is a bit of a simplified view. There are a number of
complications.

- This pattern is probably clearest in conventional XSLT 1.0
transformations in a straightforward implementation. Sophisticated
implementations and some features in 2.0 and 3.0 complicate life.

But like Rick, I tend to think of XSLT 1.0 as the core of the
language; I sometimes worry that that means I have not fully grokked
3.0.

- Since XSLT is declarative, there is no rule that says what order a
processor has to do things in, and a sufficiently determined
processor can do things in any order. So any talk about what order
nodes are visited in is just talk about what a simple implementation
doing the obvious thing might do. Anyone who has studied Jackson
will know about turning programs inside out and doing things out of
the obvious order, and anyone who has stepped through an XSLT
transform in a debugger will have learned that at least one XSLT
implementation in current use does that a good deal.

- Multiple inputs mean that there is not a single ordering over all
input nodes; ditto for multiple outputs (although I believe that a
straightforward implementation will visit each node in each output
tree in order, even if not necessarily one tree after the other).

- Being able to assign nodes to variables and save them means that we
can, if we like, generate an output node at one point in the
transformation, store it in a variable, and then write it out at
another time; that can make hash of any story about the order in
which nodes are visited or created.
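
The last complication can be illustrated with a small sketch in XSLT
1.0, assuming a hypothetical doc element with a title child and a
made-up variable name ($nav):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="doc">
    <!-- The output nodes are constructed here, before anything
         else in the result has been written. -->
    <xsl:variable name="nav">
      <p class="nav"><xsl:value-of select="title"/></p>
    </xsl:variable>
    <html>
      <body>
        <!-- The stored nodes are written out once at the top ... -->
        <xsl:copy-of select="$nav"/>
        <xsl:apply-templates/>
        <!-- ... and again at the bottom. -->
        <xsl:copy-of select="$nav"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Here the p element is created once but appears at two places in the
result, so "the order in which output nodes are created" and "the
order in which they appear" are no longer the same thing.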

Reducing it to a bumper sticker, I think the generalization is:

Push style when input and output have the same structure, pull style
when the structures differ.

Dave P asks what happens when you choose wrong. If you push when you
need to pull, the output won't have the structure you wanted. If you
pull when you could push, you may often find yourself dropping
information from the input that you didn't mean to drop, or
restructuring things that should not be restructured.
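
One common instance of the second failure, as a sketch only: the
element names (doc, para, emph) and the mode name "pulled" are
hypothetical, and each para is rendered twice purely so the two
results can be compared side by side.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="doc">
    <body>
      <div class="pulled">
        <xsl:apply-templates select="para" mode="pulled"/>
      </div>
      <div class="pushed">
        <xsl:apply-templates select="para"/>
      </div>
    </body>
  </xsl:template>

  <!-- Pull: takes the string value of the para, so any inline
       markup inside it is flattened to plain text. -->
  <xsl:template match="para" mode="pulled">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

  <!-- Push: the children are handed to their own templates,
       so inline markup survives. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="emph">
    <em><xsl:apply-templates/></em>
  </xsl:template>

</xsl:stylesheet>
```

For an input para such as `<para>A <emph>key</emph> point</para>`, the
pulled version yields `<p>A key point</p>` and the pushed version
yields `<p>A <em>key</em> point</p>`.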

Thank you, MK and Rick and Dave, for sparking this thought.

Michael

--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com