# Mergesort Investigation 1 - The Problem

One coding problem I encountered was to merge sort a linked list. I wrote a merge sort in Go, but when I tried to verify it by timing sorts of large linked lists, I got some strange behavior.

Here’s the problem statement:

#### Daily Coding Problem: Problem #930 [Medium]

This problem was asked by Google.

Given a linked list, sort it in O(n log n) time and constant space.

For example, the linked list

`4 -> 1 -> -3 -> 99`

should become

`-3 -> 1 -> 4 -> 99`

A linked list is a computer data structure: a series of elements, each holding some data. One element is the head of the list, and every element except the tail has a “pointer”, information used to find the next element in the sequence.
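In Go, a linked list node can be as simple as a struct with a value and a next pointer. Here is a minimal sketch; the `node` type and `fromSlice` helper are illustrative names, not necessarily the exact code I benchmarked:

```go
package main

import "fmt"

// node is one element of a singly linked list: a value plus a
// pointer to the next element; nil marks the tail.
type node struct {
	value int
	next  *node
}

// fromSlice builds a linked list from a slice and returns the head.
func fromSlice(values []int) *node {
	var head, tail *node
	for _, v := range values {
		n := &node{value: v}
		if head == nil {
			head = n
		} else {
			tail.next = n
		}
		tail = n
	}
	return head
}

func main() {
	// Walk the example list from the problem statement.
	for n := fromSlice([]int{4, 1, -3, 99}); n != nil; n = n.next {
		fmt.Printf("%d ", n.value)
	}
	fmt.Println()
}
```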

“Sort it in O(n log n) time” is an exhortation not to write something time-consuming, like bubble sort or insertion sort. Merge sort, heap sort, and quick sort all have O(n log n) time complexity.

The “constant space” stipulation is a little odd. Sort algorithms are rated in Big-O notation on the basis of time complexity, which often comes down to counting how many numerical comparisons get made during a sort. Other considerations, like auxiliary space, are given only minor attention, if any at all. An unusual requirement or stipulation in one of the programming problems can sometimes lead to greater enlightenment. I took “constant space” as a means to that enlightenment.
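One way to meet the “constant space” stipulation is a bottom-up, iterative merge sort: the usual recursive version consumes O(log n) stack frames, while the bottom-up version merges runs of width 1, 2, 4, … using only a handful of pointers. Here is a sketch of that approach in Go (the names are illustrative, not my exact benchmarked code):

```go
package main

import "fmt"

// node is one element of a singly linked list.
type node struct {
	value int
	next  *node
}

// merge splices two already-sorted lists into one sorted list,
// reusing the existing nodes -- no allocation.
func merge(a, b *node) *node {
	var dummy node
	tail := &dummy
	for a != nil && b != nil {
		if a.value <= b.value {
			tail.next, a = a, a.next
		} else {
			tail.next, b = b, b.next
		}
		tail = tail.next
	}
	if a != nil {
		tail.next = a
	} else {
		tail.next = b
	}
	return dummy.next
}

// split cuts list after its first n nodes and returns the remainder.
func split(list *node, n int) *node {
	for ; n > 1 && list != nil; n-- {
		list = list.next
	}
	if list == nil {
		return nil
	}
	rest := list.next
	list.next = nil
	return rest
}

// mergeSort sorts bottom-up, merging runs of width 1, 2, 4, ...
// The only auxiliary storage is a few pointers: O(1) space, where
// the recursive version would use an O(log n) call stack.
func mergeSort(head *node) *node {
	length := 0
	for n := head; n != nil; n = n.next {
		length++
	}
	for width := 1; width < length; width *= 2 {
		var newHead, tail *node
		cur := head
		for cur != nil {
			a := cur
			b := split(a, width)
			cur = split(b, width)
			merged := merge(a, b)
			if newHead == nil {
				newHead = merged
			} else {
				tail.next = merged
			}
			for tail = merged; tail.next != nil; tail = tail.next {
			}
		}
		head = newHead
	}
	return head
}

func main() {
	list := &node{4, &node{1, &node{-3, &node{99, nil}}}}
	for n := mergeSort(list); n != nil; n = n.next {
		fmt.Printf("%d -> ", n.value)
	}
	fmt.Println("nil")
}
```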

#### What puzzles me

Above are the results of benchmarking the same code on two operating systems and three computers, circa July 2021. Why do the timings exhibit jumps at 2 million nodes and 4 million nodes?

I’ve now got a Qotom fanless server and an Apple silicon M2 MacBook, in addition to the Dell PowerEdge R530 server. I’m going to give the benchmarking another try. Maybe the M2 MacBook, with its ARM CPU, will help me out here.

It’s remarkably easy to recreate the timing jumps, and the results are consistent. Here are two runs on each of the three computers:

The above graph has more values along the X-axis than my July 2021 graph. Not only are there timing jumps at 2 million and 4 million list nodes, there are also jumps at 8 million and 16 million nodes. I ran the benchmark program twice on each machine to show that the abrupt performance drops are easily and precisely repeatable.
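The shape of such a benchmark run is simple: build a list of n pseudo-random nodes, time the work on it, increase n, repeat. A minimal sketch of that loop follows; a plain traversal stands in here for my actual merge sort, and the names are illustrative:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// node is one element of a singly linked list.
type node struct {
	value int
	next  *node
}

// randomList builds an n-node list of pseudo-random values,
// prepending each node so construction is O(n).
func randomList(n int) *node {
	var head *node
	for i := 0; i < n; i++ {
		head = &node{value: rand.Int(), next: head}
	}
	return head
}

func main() {
	// Time each list length; a real benchmark run would call the
	// merge sort where the traversal loop is.
	for n := 1000000; n <= 4000000; n *= 2 {
		list := randomList(n)
		start := time.Now()
		count := 0
		for p := list; p != nil; p = p.next {
			count++
		}
		fmt.Printf("n=%d: visited %d nodes in %v\n", n, count, time.Since(start))
	}
}
```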

I wondered exactly where the timing jumps happened, and how abrupt they were.

Those are timings from two of my computers, for linked lists from 8210000 elements to 8409000 elements, with an increment of 1000 nodes per timing.

The graph above demonstrates that the jumps are dramatic. The Qotom server takes 11.8 seconds on average to sort an 8388000-element linked list, and 14.9 seconds on average to sort an 8389000-element linked list. The Dell R530 jumps from 5.8 seconds for 8388000-element lists to 7.5 seconds for an 8389000-element list. The slowdown occurs at exactly, or nearly exactly, the same list lengths on both machines.

That’s a series of benchmark runs at ever-smaller list-length increments
on the Qotom fanless server.
The purple crosses on this graph are from the same benchmark as the purple crosses
on the graph above,
demonstrating again that benchmark runs are consistent.
There are 1000-node increments, 250-node increments, and finally 10-node increments.
An 8388600-node list sorts in 11.7 seconds, but an 8388610-node list sorts in 14.9 seconds.
It’s wild that the performance drops so much over a 10-node increment.
The performance drops at around 2^{23} nodes.

Above is further, finer-increment benchmarking
to find a shorter-list-length break point on my ol’ Dell R530.
The bottom red point is for lists of length 4194304,
which is 2^{22}.
The upper red point is for lists of length 4194305, 2^{22}+1.
The addition of a single extra node to the list causes the performance drop.
Presumably, a single extra node also causes the observable performance drops
at the 2^{21} and 2^{24} list lengths.
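The break points line up with powers of two, which is easy to confirm:

```go
package main

import "fmt"

func main() {
	// The observed slowdowns sit at (or one past) powers of two:
	// 2^22 = 4194304 and roughly 2^23 = 8388608 from the graphs above.
	for _, exp := range []uint{21, 22, 23, 24} {
		fmt.Printf("2^%d = %d\n", exp, 1<<exp)
	}
}
```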

I did a lot of futzing around trying to figure out
why the performance drops at 4194305 (2^{22}+1) nodes,
and at almost exactly 2^{23} (8388608) nodes.