Mergesort Investigation 15 - more count of comparisons
What happens if you compare the number of comparisons made by the recursive and Wikipedia bottom-up algorithms when the initial list is pre-sorted (lowest data value to highest) or reverse sorted (highest to lowest)?
Sorting a pre-sorted initial list and sorting a reverse-sorted initial list both appear to lack the abrupt performance drops that sorting a list of randomly-chosen data values exhibits.
I think there are two different explanations for these seemingly identical phenomena.
Already Sorted Initial List
The bottom-up algorithm makes far fewer comparisons when sorting pre-sorted or reverse-sorted lists, as shown above. The pre-sorted lists exhibit the same abrupt increase in count of comparisons at list lengths of 2^N+1, and at select other list lengths, for the same reason: the last node in the list, the "+1" node, doesn't get merged into any bigger list until the final "roll up" phase of the bottom-up algorithm.
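To make that concrete, here's a minimal sketch of a comparison-counting bottom-up merge sort. It's my own illustration, not the investigation's actual code: the names `countingMerge` and `bottomUpCount` are hypothetical, and it uses Go slices for brevity where the investigation sorts linked lists.

```go
package main

import "fmt"

// countingMerge merges two sorted runs a and b, incrementing *count
// once per element-to-element comparison.
func countingMerge(a, b []int, count *int) []int {
	out := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		*count++ // exactly one comparison per loop iteration
		if a[i] <= b[j] {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	// One run is exhausted: the rest is appended with zero comparisons.
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

// bottomUpCount sorts list bottom-up, doubling the run width each pass,
// and returns the total number of comparisons made. Needs Go 1.21+ for
// the built-in min.
func bottomUpCount(list []int) int {
	count := 0
	for width := 1; width < len(list); width *= 2 {
		var next []int
		for i := 0; i < len(list); i += 2 * width {
			mid := min(i+width, len(list))
			end := min(i+2*width, len(list))
			next = append(next, countingMerge(list[i:mid], list[mid:end], &count)...)
		}
		list = next
	}
	return count
}

func main() {
	for _, n := range []int{16, 17} { // 2^4 and 2^4 + 1
		sorted := make([]int, n)
		for i := range sorted {
			sorted[i] = i
		}
		fmt.Println(n, bottomUpCount(sorted))
	}
}
```

For a pre-sorted list of length 16 this sketch makes 32 comparisons; for length 17, the lone trailing node rides along unmerged through every pass, then costs a full 16 comparisons in the final roll-up merge (the 16-long run is exhausted first), for 48 total.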
Above, the ratio of comparison counts for the bottom-up algorithm: the count of comparisons for randomly-chosen data divided by the count for already-sorted data. Sorting randomly-chosen data values makes at least 1.72 times as many comparisons as sorting already-sorted data.
The additional comparisons don't cause visible performance drops. This is interesting, and I don't have a good explanation other than "there aren't enough of them".
Reverse Sorted Initial List
For both algorithms, there are far fewer comparisons when sorting a reverse-sorted initial list than when sorting a list of randomly-chosen data values. This happens because when two sublists are merged, the right sublist always has data values less than those of the left sublist. Merging the two consumes the entire right sublist without using a single node from the left sublist, which is then appended as a whole, without further comparisons, to the merged list.
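Continuing with the hypothetical `countingMerge` sketch from above, a merge in the reverse-sorted case costs only as many comparisons as the right sublist is long:

```go
count := 0
left := []int{5, 6, 7, 8}  // every left value is larger...
right := []int{1, 2, 3, 4} // ...than every right value
merged := countingMerge(left, right, &count)
fmt.Println(merged, count) // [1 2 3 4 5 6 7 8] 4
```

Four comparisons consume the right sublist, then the left sublist is appended for free, versus the up-to-seven comparisons a merge of two 4-long runs can take on randomly-chosen data.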
The pesky final node in lists of length 2^N+1 causes exactly one extra comparison in the bottom-up algorithm's "roll up" phase. The "+1" is the last node in the initial list, and it has the smallest data value. The roll-up merge makes one comparison to put the 1-long list's single element at the head of the merged list, then appends the 2^N-long list of every other node without further comparisons.
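A concrete check of that single comparison, again using the hypothetical `countingMerge` from the sketch above:

```go
count := 0
rolled := make([]int, 16) // the 2^4 nodes merged in earlier passes
for i := range rolled {
	rolled[i] = i + 1 // data values 1..16
}
single := []int{0} // the trailing "+1" node, smallest value in the list
merged := countingMerge(rolled, single, &count)
fmt.Println(len(merged), count) // 17 1
```

The first comparison sends 0 to the head of the merged list, the 1-long run is exhausted, and the 16 remaining nodes are appended without any further comparisons.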
Above, the difference in count of comparisons when the initial list is reverse sorted (largest to smallest data values) instead of having randomly-chosen data values. The difference is negative, or zero for lists of 2^N length.
The differences in counts of comparisons are an order of magnitude smaller than for randomly-chosen data values in the initial list, so any performance difference will be roughly 1/10 that of randomly-chosen-data lists. It's not that there are no performance drops or increases for reverse-sorted initial lists; it's that they're quite small. This isn't a consequence of having one node that doesn't get merged in until a final "roll up" phase, it's a consequence of the merge using up one whole sublist before touching the other.