Mergesort Investigation 13 - recursive algorithm variations
I tried several minor variations of recursive mergesort to try to understand the abrupt performance drops that my iterative mergesort code exhibits at some linked list lengths.
Currently, I’m convinced that memory caching is the underlying cause of performance drops that occur when sorting linked lists with 2N+1 nodes. My recursive and iterative algorithms read and write memory in very different patterns, but the wikipedia bottom up algorithm reads and writes memory in patterns quite close to the recursive algorithm.
Some inobvious part of the code causes the performance drops. I’m feeling around in the dark, but I found two small tweaks to the basic recursive algorithm to try.
- flipped - doing the right hand list first
- alternating - recursing alternating left and right hand parts of list
- counted list splitting
- list merging as a function call
Flipped recursion
My original recursive mergesort recurses like this:
func recursiveMergeSort(head *Node) *Node {
// break list pointed to by variable head into left and right lists
sortedLeft := recursiveMergeSort(left)
sortedRight := recursiveMergeSort(right)
return merge(sortedLeft, sortedRight)
}
The “flipped” version recurses on the right hand list first:
func recursiveMergeSort(head *Node) *Node {
// break list pointed to by variable head into left and right lists
sortedRight := recursiveMergeSort(right)
sortedLeft := recursiveMergeSort(left)
return merge(sortedLeft, sortedRight)
}
The first unit-length sublist it finds will be the last node of the linked list, instead of the first node of the linked list.
Benchmarking this algorithm tests whether some predictive, read ahead, caching, either software or hardware, caused the abrupt performance drops. This algorithm sorts from tail to head, instead of head to tail.
Alternating recursion
The alternating recursion version alternates between recursing using the left list first, then using the right list first. There are two, nearly identical, mutually recursive functions:
func recursiveMergeSortLeft(head *Node) *Node {
// break list pointed to by variable head into left and right lists
sortedLeft := recursiveMergeSortRight(left) // call the other recursive function
sortedRight := recursiveMergeSortRight(right)
return merge(sortedLeft, sortedRight)
}
func recursiveMergeSortRight(head *Node) *Node {
// break list pointed to by variable head into left and right lists
sortedRight := recursiveMergeSortLeft(right) // call the other recursive function
sortedLeft := recursiveMergeSortLeft(left)
return merge(sortedLeft, sortedRight)
}
The point of benchmarking this algorithm is another attempt to find out if some predictive, read ahead, caching, either software or hardware, causes the abrupt performance drops. Sorting alternating sublists could be different than sorting the list from back to front.
Counted list splitting
My original recursive algorithm did a “tortoise and hare” type algorithm to find the middle of the list. It uses two pointers, a “hare” that advanced two nodes of the list for every node the “tortoise” pointer advances. When the hare pointer hits the end of the list, the tortoise pointer holds the middle node of the list.
While researching mergesort, I read someone claiming that an algorithm that doesn’t walk the list to split it is faster than an algorithm that does. I chose to test this. Since the code knows how long the input list is, it can advance a single pointer half of that length on each ply of the recursion. This spares reading half of the list nodes on each ply.
Merging in a function call
My original recursive function has left and right list merging
in line, in the recursive function.
The bottom up algorithm has a call to a merge
function.
It’s on the edge of possibility that breaking out the list merging into a function
causes the performance drops.
This algorithm tests that idea.
Dell R530 benchmarking
The purple + signs on this graph (“Dell”) correspond to the blue snowflakes/lines (“Dell R530 #1”) and golden hollow squares/lines (“Dell R530 #2”) on this graph.
Qotom fanless server benchmarking
The purple + signs on this graph (“Dell”) correspond to the purple plus signs/lines (“Qotom #1”) and blue X/lines (“Qotom #2”) on this graph.
Conclusion
None of these variations exhibits the performance drops. None of these little tweeks indicate reason for the problem.