Mergesort Investigation 12 - simulate recursive algorithm with explicit call stack
I wrote variants of recursive mergesort that simulate recursive function calls by iteration, with an explicit stack of activation records. This is an effort to understand the abrupt performance drops that the Wikipedia bottom up algorithm exhibits at some linked list lengths.
Incidentally, these two algorithms both satisfy the Daily Coding Problem problem statement. Their simulated call stacks could potentially overflow for extremely long lists, but those stacks wouldn’t have to be much bigger to accommodate such lists.
The Wikipedia bottom up algorithm reads and writes memory in patterns identical to the recursive algorithm, yet it still exhibits performance drops when sorting linked lists with 2^N + 1 nodes versus linked lists with 2^N nodes.
Some non-obvious part of the code causes the performance drops. One candidate is the array of lists the bottom up algorithm keeps.
The bottom up algorithm doesn’t recurse explicitly, but it does keep an array, `var array [32]*Node`. During execution, any non-nil array value is the head of a list of 2^i nodes, where i is that list’s index in the array. Setting a nil-valued `array[i]` to the head of a 2^i-node-long list is analogous to `left := recursiveMergesort(head)`. Merging a non-nil-valued `array[i]` with a newly created 2^i-node-long list is analogous to merging left and right sorted sublists. `var array [32]*Node` is the explicit embodiment of an implicit call stack.
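As a sketch of those analogies (not the benchmarked code; the `Node` field names and the `merge` helper here are my assumptions), the bottom up algorithm looks something like this:

```go
package main

import "fmt"

// Node is an illustrative linked list node type.
type Node struct {
	Data int
	Next *Node
}

// merge combines two sorted lists into one sorted list.
func merge(a, b *Node) *Node {
	var head *Node
	tail := &head
	for a != nil && b != nil {
		if a.Data <= b.Data {
			*tail, a = a, a.Next
		} else {
			*tail, b = b, b.Next
		}
		tail = &(*tail).Next
	}
	if a != nil {
		*tail = a
	} else {
		*tail = b
	}
	return head
}

// bottomUpMergesort keeps the invariant described above:
// array[i] is either nil or the head of a sorted list of 2^i nodes.
func bottomUpMergesort(head *Node) *Node {
	var array [32]*Node // enough slots for lists of up to 2^32 nodes
	for head != nil {
		next := head.Next
		head.Next = nil
		carry := head // a sorted list of 2^0 == 1 node
		i := 0
		for ; array[i] != nil; i++ {
			// Analogous to merging left and right sorted sublists.
			carry = merge(array[i], carry)
			array[i] = nil
		}
		array[i] = carry // analogous to left := recursiveMergesort(head)
		head = next
	}
	// Fold whatever sorted lists remain into a single sorted list.
	var result *Node
	for i := 0; i < len(array); i++ {
		if array[i] != nil {
			result = merge(array[i], result)
		}
	}
	return result
}

func main() {
	list := &Node{3, &Node{1, &Node{2, nil}}}
	for n := bottomUpMergesort(list); n != nil; n = n.Next {
		fmt.Print(n.Data, " ")
	}
	fmt.Println()
}
```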
It’s possible to replace the implicit recursive function call stack with an explicit stack frame data structure and an iterative algorithm. I’ve done this before, in my combinatory logic interpreter for a largish performance increase.
```go
type stackFrame2 struct {
	formalArgument *Node
	left           *Node
	right          *Node
	leftSorted     *Node
	next           *stackFrame2
}
```
Above, the explicit stack frame (or activation record) struct. Its fields have these meanings:
- `.formalArgument`, the list a simulated call sorts, which gets split into `.left` and `.right` elements.
- `.leftSorted`, which points to the sorted `.left` list while the `.right` list undergoes sorting.
- `.next`, which points to the previous stack frame, another `stackFrame2` struct, in one of the versions of an explicit-stack “recursive” mergesort.
I wrote two mergesort versions by rewriting a purely recursive version that relies on implicit call stack frames.
- Stack frame structs allocated on the function call stack, referenced by an index into an array. The explicit call stack is allocated like this:

```go
var stack [32]stackFrame2
```

I had hoped that the abrupt performance drops of the bottom up algorithm related to referencing an array on the stack, where the list nodes were all heap allocated. The `.next` pointer isn’t used by this variant.
- Stack frame structs allocated on the heap. This allocation looks like:
```go
var frames *stackFrame2
for i := 0; i < 32; i++ {
	frame := new(stackFrame2)
	frame.next = frames
	frames = frame
}
```
The `.next` field of such a struct points to the “calling” stack frame. The `.left` or `.right` list in the calling stack frame becomes the `.formalArgument` element of the current stack frame. Since I’m emulating the call stack in single-threaded code, a `returnValue` variable exists outside the stack frames in both variants.
In both variants the flow of control is a big for-loop that
runs while the stack of stackFrame2
structs has elements in it.
That’s analogous to “recursive function hasn’t returned from top level call”.
The for-loop has direct analogs to recursion bottoming out,
calling another level of recursive function
with left and then right halves of the formal argument list,
and returning the elements of the formal argument list in sorted order.
The `.leftSorted` element’s value, nil or non-nil, gets used to decide whether the return value from a simulated recursive call results from mergesorting `.left` or `.right`.
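That control flow might be sketched like this for the array variant (a sketch under my own assumptions: the `split` and `merge` helpers, the `Node` fields, and the case ordering are mine, not the benchmarked code):

```go
package main

import "fmt"

type Node struct {
	Data int
	Next *Node
}

type stackFrame2 struct {
	formalArgument *Node
	left           *Node
	right          *Node
	leftSorted     *Node
	next           *stackFrame2 // unused by the array variant
}

// merge combines two sorted lists into one sorted list.
func merge(a, b *Node) *Node {
	var head *Node
	tail := &head
	for a != nil && b != nil {
		if a.Data <= b.Data {
			*tail, a = a, a.Next
		} else {
			*tail, b = b, b.Next
		}
		tail = &(*tail).Next
	}
	if a != nil {
		*tail = a
	} else {
		*tail = b
	}
	return head
}

// split severs a list of two or more nodes into two halves.
func split(head *Node) (*Node, *Node) {
	slow, fast := head, head.Next
	for fast != nil && fast.Next != nil {
		slow, fast = slow.Next, fast.Next.Next
	}
	right := slow.Next
	slow.Next = nil
	return head, right
}

// mergesortExplicitStack runs the recursive algorithm iteratively,
// with simulated stack frames in an array indexed by ply.
func mergesortExplicitStack(head *Node) *Node {
	var stack [32]stackFrame2
	var returnValue *Node
	ply := 0
	stack[0] = stackFrame2{formalArgument: head}
	for ply >= 0 { // analogous to "hasn't returned from the top level call"
		frame := &stack[ply]
		switch {
		case frame.left == nil && (frame.formalArgument == nil || frame.formalArgument.Next == nil):
			// Recursion bottoms out: 0- or 1-node lists are already sorted.
			returnValue = frame.formalArgument
			ply--
		case frame.left == nil:
			// "Call" with the left half of the formal argument list.
			frame.left, frame.right = split(frame.formalArgument)
			ply++
			stack[ply] = stackFrame2{formalArgument: frame.left}
		case frame.leftSorted == nil:
			// Left half returned sorted; "call" with the right half.
			frame.leftSorted = returnValue
			ply++
			stack[ply] = stackFrame2{formalArgument: frame.right}
		default:
			// Right half returned sorted; merge and "return".
			returnValue = merge(frame.leftSorted, returnValue)
			ply--
		}
	}
	return returnValue
}

func main() {
	list := &Node{4, &Node{2, &Node{3, &Node{1, nil}}}}
	for n := mergesortExplicitStack(list); n != nil; n = n.Next {
		fmt.Print(n.Data, " ")
	}
	fmt.Println()
}
```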
The differences between my “simulated stack frames on the heap”
and “simulated stack frames on the real function call stack”
lie in keeping track of simulated stack frames.
The array variant has a ply
variable that keeps track of the depth
of simulated recursion.
The heap-allocated-simulated-stack-frames variant has an explicit stack of in-use simulated stack frames, and an explicit stack of simulated stack frames that aren’t in use.
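One way that two-stack bookkeeping might look (the `frameStacks` type and its method names are my own illustration, not the post’s code):

```go
package main

import "fmt"

type Node struct {
	Data int
	Next *Node
}

// stackFrame2 as in the post; only .next matters for the bookkeeping.
type stackFrame2 struct {
	formalArgument *Node
	left           *Node
	right          *Node
	leftSorted     *Node
	next           *stackFrame2
}

// frameStacks holds the two explicit stacks: frames currently in use
// (the simulated call stack) and preallocated frames that aren't.
type frameStacks struct {
	inUse *stackFrame2
	free  *stackFrame2
}

// newFrameStacks preallocates n heap frames on the free stack.
func newFrameStacks(n int) *frameStacks {
	s := &frameStacks{}
	for i := 0; i < n; i++ {
		frame := new(stackFrame2)
		frame.next = s.free
		s.free = frame
	}
	return s
}

// push simulates a function call: move a frame from free to in-use.
func (s *frameStacks) push(arg *Node) *stackFrame2 {
	frame := s.free
	s.free = frame.next
	*frame = stackFrame2{formalArgument: arg}
	frame.next = s.inUse
	s.inUse = frame
	return frame
}

// pop simulates a return: move the top in-use frame back to free.
func (s *frameStacks) pop() {
	frame := s.inUse
	s.inUse = frame.next
	frame.next = s.free
	s.free = frame
}

func main() {
	s := newFrameStacks(32)
	s.push(&Node{Data: 1})
	s.push(&Node{Data: 2})
	s.pop()
	fmt.Println(s.inUse.formalArgument.Data) // the remaining in-use frame
}
```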
Despite sorting lists of up to 17,601,000 nodes, both variants use at most 25 stack frames, because the simulated recursion is depth first.
Benchmarks
The purple plus-signs in the above graph should equate to blue “snowflakes” on the “Wikipedia Bottom up algorithm” graph on this page.
The purple plus-signs in the above Qotom fanless server graph should equate to green “X” marks on the “Wikipedia Bottom up algorithm” graph on this page.
Conclusion
Wikipedia’s bottom up algorithm exhibits abrupt performance drops when sorting lists of 2^N + 1 nodes, relative to sorting lists of 2^N nodes. A purely recursive mergesort does not exhibit these performance drops. When sorting the same list, the two algorithms would read and write the same nodes in the same order, meaning that something other than swapping around list node pointers causes the performance drops.
I hypothesized that the bottom up algorithm had these problems because it read and wrote an array allocated in function call stack memory. By simulating recursion with an explicit stack, I was able to write variants whose stack frames were (A) allocated on the function call stack, and (B) allocated from the process’ heap. Neither of these two variants exhibits the abrupt performance drops. Something else about the bottom up and my purely iterative algorithms causes the performance drops.
Previous academic work
This effort is a bit of a throwback to Garbage Collection is Fast, but a Stack is Faster, by James S. Miller and Guillermo J. Rozas, March 1994.