I’m going to beat around the bush for a while, but there is a point.
### Semigroups
The answer is, *the associative property of the binary reduction operation*.
That’s pretty abstract, but multiplication is a good example. If *x*, *y* and *z* are some natural numbers (or integers, or rational numbers, or real numbers, or complex numbers, or *N*×*N* matrices, or any of a whole bunch more things), then *x*×*y* is the same kind of number as both *x* and *y*. We started with two numbers, so it’s a binary operation, and got one, so we reduced the count of numbers we had by one, making this a reduction operation. And (*x*×*y*)×*z* is always the same as *x*×(*y*×*z*), which is the associative property.
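If you want to see that concretely, here is a quick spot check in GHCi, the Haskell interpreter we'll be using later (checking one instance proves nothing in general, of course, but it shows the pattern):

    Prelude> (2 * 3) * 4 == 2 * (3 * 4)
    True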
(If you already know all this, you can skip to the next section.)
A few more things you often see in computer science that work the same way:
- adding any of those kinds of numbers instead of multiplying
- concatenating strings (`"a"+"b"+"c"` is `"abc"` whether you start with `"ab"+"c"` or `"a"+"bc"`)
- splicing two lists together: `[a]++[b]++[c]` is similarly `[a,b,c]`, whether you evaluate from back to front or from front to back
- `cons` on a head and a tail, if you think of the head as a singleton list: that's just concatenating two lists
- taking the union or the intersection of sets
- Boolean and, Boolean or
- bitwise `&`, `|` and `^`
- composition of functions: ((*f*∘*g*)∘*h*)(*x*) = (*f*∘(*g*∘*h*))(*x*) = *f*(*g*(*h*(*x*)))
- maximum and minimum
- addition modulo *p*
Some things that don’t:
- subtraction, because 1-(1-2) ≠ (1-1)-2
- *f*(*x*,*y*) = tan(*x*+*y*), because tan(π/4 + π/4) is undefined
- multiplication over the negative numbers, because -1 × -1 is not a negative number
- division of integers, which has all three problems!
- logical not, because it has only one operand, not two
- `int print2(int x, int y) { return printf( "%d %d\n", x, y ); }`, as `print2( print2(x,y), z );` and `print2( x, print2(y,z) );` have different output.
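You can spot-check entries from both lists the same way:

    Prelude> ("a" ++ "b") ++ "c" == "a" ++ ("b" ++ "c")
    True
    Prelude> (1 - 1) - 2 == 1 - (1 - 2)
    False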
It’s a useful enough concept that we named it. A set with an operation that has these properties is a *semigroup*. So, the real numbers under multiplication are a semigroup. And your question turns out to be one of the ways this kind of abstraction becomes useful in the real world. **Semigroup operations can all be optimized the way you’re asking about.**
### Try This At Home
As far as I know, this technique was first described in 1974, in Daniel Friedman and David Wise’s paper, [“Folding Stylized Recursions into Iterations”][1], although they assumed a few more properties than turn out to be necessary.
Haskell is a great language to illustrate this in, because it has the `Semigroup` typeclass in its standard library. It writes the operation of a generic `Semigroup` as the operator `<>`. Lists and strings are instances of `Semigroup`, and their instance defines `<>` as the concatenation operator `++`. So, with the right import, `[a] <> [b]` is an alias for `[a] ++ [b]`, which is `[a,b]`.
But what about numbers? We just saw that numeric types are semigroups under *either* addition *or* multiplication! So which one gets to be `<>` for a `Double`? Well, either one! Haskell defines the wrapper type `Product Double`, `where (<>) = (*)` (that is the actual definition in Haskell), and also `Sum Double`, `where (<>) = (+)`. One wrinkle is that your algorithm used the fact that 1 is the multiplicative identity. A semigroup with an identity element is called a *monoid*, and is defined in the Haskell module `Data.Monoid`, which calls the generic identity element `mempty`. (No relation to *monads*, so just forget I even brought those up.)
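To see the wrappers in action before we write any code of our own (everything here ships with the standard library):

    Prelude> import Data.Monoid
    Prelude Data.Monoid> Product 6 <> Product 7
    Product {getProduct = 42}
    Prelude Data.Monoid> Sum 6 <> Sum 7
    Sum {getSum = 13}
    Prelude Data.Monoid> mempty :: Product Double
    Product {getProduct = 1.0}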
That’s enough information to translate your algorithm into a Haskell function using monoids:
    module StylizedRec (pow) where

    import Data.Monoid

    pow :: Monoid a => a -> Word -> a
    {- Combines n copies of x with the semigroup operation of its type,
     - whatever that is. This already exists in Haskell, as
     - Data.Semigroup.mtimesDefault, but let's write it out as an example.
     -}
    pow _ 0 = mempty            -- Special case: return the nullary product.
    pow x 1 = x                 -- The base case.
    pow x n = x <> pow x (n-1)  -- The recursive case.
Importantly, note that this is tail recursion modulo semigroup: every case is either a value, a tail-recursive call, or the semigroup product of both. Also, this example happened to use `mempty` for one of the cases, but if we hadn’t needed that, we could have done it with the more general typeclass `Semigroup`.
Let’s load this program up in GHCi and see how it works:
    *StylizedRec> getProduct $ pow 2 4
    16
    *StylizedRec> getProduct $ pow 7 2
    49
Remember how we declared `pow` for a generic `Monoid`, whose type we called `a`? So the interpreter needs to deduce the type of `a`. The `getProduct` function turns a value of type `Product t` back into a number of type `t`, so the return value of `pow` must be some kind of `Product` over a numeric type. The first argument of `pow` has the same type, and `2` is an `Integer` in this context, so GHCi deduces that the type `a` here is `Product Integer`, which is an `instance` of `Monoid` whose `<>` operation is `*`. So `pow 2 4` expands recursively to `2<>2<>2<>2`, which is `2*2*2*2`, or `16`. So far, so good.
But our function uses only generic monoid operations. Previously, I said that there is another instance of `Monoid` called `Sum`, whose `<>` operation is `+`. Can we try that?
    *StylizedRec> getSum $ pow 2 4
    8
    *StylizedRec> getSum $ pow 7 2
    14
The same expansion now gives us `2+2+2+2` instead of `2*2*2*2`. Multiplication is to addition as exponentiation is to multiplication!
But I gave one other example of a Haskell monoid: lists, whose operation is concatenation.
    *StylizedRec> pow [2] 4
    [2,2,2,2]
    *StylizedRec> pow [7] 2
    [7,7]
Writing `[2]` tells the compiler this is a list, and `<>` on lists is `++`, so `[2]++[2]++[2]++[2]` is `[2,2,2,2]`.
### Finally, an Algorithm (Two, in Fact)
By simply replacing `x` with `[x]`, you convert the generic algorithm that uses recursion modulo a semigroup or monoid into one that creates a list. Which list? *The list of elements the algorithm applies `<>` to.* Because we used only semigroup (or monoid) operations that lists have too, the resulting list will be isomorphic to the original computation. And since the original operation was associative, we can equally well evaluate the elements from back to front or from front to back.
If your algorithm ever reaches a base case and terminates, that terminal case returns something, which becomes the final element of the list, so the list is guaranteed to be non-empty.
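Here’s what that list actually looks like for our `pow`:

    *StylizedRec> pow [Product 2] 4
    [Product {getProduct = 2},Product {getProduct = 2},Product {getProduct = 2},Product {getProduct = 2}]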
How do you apply a binary reduction operation to every element of a list in order? That’s right, a fold. So you can substitute `[x]` for `x`, get a list of elements to reduce with `<>`, and then either right-fold or left-fold the list:
    *StylizedRec> getProduct $ foldr1 (<>) $ pow [Product 2] 4
    16
    *StylizedRec> import Data.List
    *StylizedRec Data.List> getProduct $ foldl1' (<>) $ pow [Product 2] 4
    16
The version with `foldr1` actually exists in the standard library, as `sconcat` for `Semigroup` and `mconcat` for `Monoid`. It does a lazy right fold on the list. That is, it expands `[Product 2,Product 2,Product 2,Product 2]` to `2<>(2<>(2<>(2)))`. Since the fold is lazy, the program at runtime will just keep a thunk until you ask for the first value, then it’ll calculate as little as it can and put in a thunk for the rest. So, `2 <>` *the rest*. When it needs the rest, it’ll calculate the next part and get `2 <>` *the next rest*, and so on.
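So the right fold above could just as well have been written with `mconcat`:

    *StylizedRec Data.List> getProduct $ mconcat $ pow [Product 2] 4
    16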
That incremental evaluation is not useful in this case, because you can’t do anything with the partial results, but it is extremely useful when you’re generating a data structure incrementally and want to consume the partial results eagerly. For example, you might be consuming interactive input and producing interactive output based on what you’ve read in so far: immediately, on as little input as you can at a time, without waiting for any more input than you need. In the best-case scenario, you consume each result as soon as it’s generated, so your algorithm runs in constant memory: it only ever needs to generate one result and one thunk at a time. The result is immediately consumed and thrown away as a temporary, and the thunk is replaced by the next result and the next thunk. A right fold also works on infinite lists. On the other hand, if your results are not consumed eagerly, you could recurse deeply into the stack or use up a lot of memory.
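You can watch the laziness pay off by right-folding an infinite list of singletons, something the strict left fold below could never do (using `foldr` with an empty-list seed rather than `foldr1`, to keep it simple):

    *StylizedRec Data.List> take 3 $ foldr (<>) [] [[n] | n <- [1..]]
    [1,2,3]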
The version with `foldl1'` is a strictly-evaluated left fold: that is to say, a tail-recursive function with a strict accumulator. This evaluates to `(((2)<>2)<>2)<>2`, and because it’s strict, Haskell calculates the whole thing right away, with no thunks. So, it calculates `(4<>2)<>2`, then immediately calculates `8<>2`, then `16`. This is why we needed the operation to be associative: we just changed the grouping of the parentheses!
**The strict left fold is the equivalent of what GCC is doing.** The leftmost number in the previous example is the accumulator, in this case a running product. At each step, it’s multiplied by the next number in the list. Another way to express that: you iterate over the values to be multiplied, keeping the running product in an accumulator, and on each iteration you multiply the accumulator by the next value. That is, it’s a `while` loop in disguise.
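In Haskell terms, that while loop is a tail-recursive helper with a strict accumulator. Here’s a sketch of what it looks like written out by hand (the names `powIter` and `go` are mine, not from the standard library):

    {-# LANGUAGE BangPatterns #-}

    -- The while loop made explicit: go's strict accumulator is the
    -- running product, and each tail call is one iteration of the loop.
    powIter :: Monoid a => a -> Word -> a
    powIter x = go mempty
      where
        go !acc 0 = acc                    -- Loop exit: return the accumulator.
        go !acc n = go (acc <> x) (n - 1)  -- Loop body: acc := acc <> x; n := n - 1.

`getProduct (powIter (Product 2) 4)` is `16`, just as before.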
The strict left fold can sometimes be made just as efficient as that loop. The compiler might be able to optimize away the list data structure itself. In theory, it has enough information at compile time to figure out that it should do so here: `[x]` is a singleton, so `[x]<>xs` is the same as `cons x xs`. Because the call is tail-recursive and the accumulator is evaluated strictly, each iteration of the function can just reuse the same stack frame and update the parameters in place.

It will not, however, return anything until it finishes. It might fully generate a huge data structure that a right fold would have consumed piece by piece as it was generated. It might generate a lot of wasteful intermediate results: if you try a strict left fold that adds elements one at a time to the *end* of an immutable linked list instead of the beginning, for example, you’ll end up making a new copy of the entire list at every step. If you left-fold on the program’s input, the program will run in batch mode where a lazy right fold would have run interactively. And, if called on an infinite list, it will never return.
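The wrong-end mistake from that paragraph looks like this; the output is the same list, but each step copies the entire accumulator built so far, making the whole fold quadratic:

    *StylizedRec Data.List> foldl' (\acc x -> acc ++ [x]) [] [1,2,3,4]
    [1,2,3,4]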
Know which of the two you want to use in any given case. Or maybe try them and see what happens.
So, as you can see, it is possible to automatically optimize tail-recursion modulo any semigroup (one example of which is any of the usual numeric types under multiplication) to either a lazy right fold or a strict left fold, in one line of Haskell.
### Generalizing Further
The two arguments of the binary operation don’t have to be the same type, so long as one of them is the same type as your result. (You can of course always flip the arguments to match the order of the kind of fold you’re doing, left or right.) So you might repeatedly add patches to a file to get an updated file, or start with an initial value of 1.0 and divide by integers to accumulate a floating-point result. Or prepend elements to the empty list to build up a list.
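For instance, sketching the last two in GHCi (the divisors are `Int`s while the accumulator is a `Double`, and `(:)` takes an element on one side and a list on the other):

    Prelude> foldl (\acc n -> acc / fromIntegral n) 1.0 [2, 4, 5 :: Int]
    2.5e-2
    Prelude> foldr (:) [] [1, 2, 3]
    [1,2,3]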
[1]: https://www.cs.indiana.edu/ftp/techreports/TR19.pdf