Mergesort Investigation 12 - simulate recursive algorithm with explicit call stack

I wrote variants of recursive mergesort that simulate recursive function calls by iterating with an explicit stack of activation records. This is an effort to understand the abrupt performance drops that the Wikipedia bottom-up algorithm exhibits at some linked list lengths.

Incidentally, these two algorithms both satisfy the Daily Coding Problem problem statement. Their simulated call stacks could potentially overflow for extremely long lists, but those stacks wouldn't have to be much bigger to accommodate such lists.

The Wikipedia bottom-up algorithm reads and writes memory in patterns identical to the recursive algorithm's. The bottom-up algorithm still exhibits performance drops when sorting linked lists with 2^N+1 nodes versus linked lists with 2^N nodes.

Some non-obvious part of the code causes the performance drops.

One candidate is the array of lists the bottom-up algorithm keeps. The bottom-up algorithm doesn't recurse explicitly, but it does keep an array, var array [32]*Node. During execution, any non-nil array value is the head of a sorted list of 2^i nodes, where i is that value's index in the array.

Setting a nil-valued array[i] to the head of a 2^i-node-long list is analogous to left := recursiveMergesort(head). Merging a non-nil-valued array[i] with a newly created 2^i-node-long list is analogous to merging left and right sorted sublists. var array [32]*Node is the explicit embodiment of an implicit call stack.

It’s possible to replace the implicit recursive function call stack with an explicit stack frame data structure and an iterative algorithm. I’ve done this before, in my combinatory logic interpreter for a largish performance increase.

type stackFrame2 struct {
    formalArgument *Node
    left           *Node
    right          *Node
    leftSorted     *Node
    next           *stackFrame2
}

Above is the explicit stack frame (or activation record) struct. Its fields have these meanings:

  • .formalArgument, the list to be sorted, which gets split into the .left and .right lists.
  • .leftSorted, which points to the sorted .left list while the .right list undergoes sorting.
  • .next, which points to the previous stack frame (another stackFrame2 struct) in one of the two explicit-stack "recursive" mergesort variants.

I wrote two mergesort versions by rewriting a purely recursive version that relies on implicit call stack frames.

  1. Stack frame structs allocated on the function call stack, referenced by an index into an array. The explicit call stack is allocated like this: var stack [32]stackFrame2. I had hoped that the abrupt performance drops of the bottom-up algorithm were related to referencing an array on the stack, where the list nodes were all heap-allocated. The .next pointer isn't used by this variant.
  2. Stack frame structs allocated on the heap. This allocation looks like:
    var frames *stackFrame2

    for i := 0; i < 32; i++ {
        frame := new(stackFrame2)
        frame.next = frames
        frames = frame
    }

The .next field of such a struct points to the "calling" stack frame. The .left or .right list of the calling stack frame becomes the .formalArgument of the current stack frame.

Since I'm emulating the call stack in single-threaded code, a returnValue variable exists outside the stack frame in both variants.

In both variants the flow of control is a big for-loop that runs while the stack of stackFrame2 structs has elements on it. That's analogous to "the recursive function hasn't returned from the top-level call". The for-loop has direct analogs of recursion bottoming out, of calling another level of the recursive function with the left and then the right half of the formal argument list, and of returning the elements of the formal argument list in sorted order. The .leftSorted element's value, nil or non-nil, decides whether the return value from a simulated recursive call results from mergesorting .left or .right.

The differences between my "simulated stack frames on the heap" and "simulated stack frames on the real function call stack" variants lie in keeping track of simulated stack frames. The array variant has a ply variable that keeps track of the depth of simulated recursion. The heap-allocated-simulated-stack-frames variant has an explicit stack of in-use simulated stack frames, and an explicit stack of simulated stack frames that aren't in use.

Despite sorting lists of up to 17,601,000 nodes, both variants use at most 25 stack frames, because the simulated recursion is depth-first.

Benchmarks

Dell R530 benchmarks for simulated recursion

The purple plus-signs in the above graph should correspond to the blue "snowflakes" on the "Wikipedia Bottom up algorithm" graph on this page.

Qotom fanless server benchmarks for simulated recursion

The purple plus-signs in the above Qotom fanless server graph should correspond to the green "X" marks on the "Wikipedia Bottom up algorithm" graph on this page.

Conclusion

Wikipedia's bottom-up algorithm exhibits abrupt performance drops when sorting lists of 2^N+1 nodes, relative to sorting lists of 2^N nodes. A purely recursive mergesort does not exhibit these performance drops. When sorting the same list, the two algorithms would read and write the same nodes in the same order, meaning that something other than swapping around list node pointers causes the performance drops.

I hypothesized that the bottom-up algorithm had these problems because it read and wrote an array allocated in function call stack memory. To test that, I wrote two variants that simulate recursive calls with an explicit stack: one allocated on the function call stack, and one allocated from the process's heap. Neither variant exhibits the abrupt performance drops. Something else about the bottom-up algorithm, and my purely iterative algorithms, causes the performance drops.

Previous academic work

This effort is a bit of a throwback to Garbage Collection is Fast, but a Stack is Faster, by James S. Miller and Guillermo J. Rozas, March 1994.