Mergesort Investigation 11 - memory access patterns

I finally looked at the addresses of linked list nodes that my recursive and Wikipedia bottom-up algorithms read and write.

I used pieces of my mergetest program code to make a new program. The new program creates a linked list with randomly-chosen data values, and two pointer-enabled paths through the list:

type Node struct {
    Data  uint
    Reset *Node // another chain through the linked list
    Next  *Node
}

The sorting gets done by changing which other Node instance the .Next element of a linked list node points to. Initially, the .Reset element has the same value as .Next, and .Reset is not changed by either sort algorithm. Walking a sorted list by values of .Reset, and setting each node's .Next field to the value of its .Reset field, re-creates the original, unsorted list.

Here is the output of my new program, mergeaddresses, sorting with my recursive algorithm, resetting the list, then sorting with the Wikipedia bottom-up algorithm:

1540 % ./mergeaddresses -b 4
# 2024-11-25T21:07:32-07:00 on hazard
# List of 4 nodes
# nodes 24 bytes in size
# recursive sort
# 3 -> 0 -> 3 -> 1 -> 

merging <1,1> (0xc000010120, 0xc000010108)
merging <1,1> (0xc0000100f0, 0xc0000100d8)
merging <2,2> (0xc000010108, 0xc0000100d8)
# bottom up sort
# 3 -> 0 -> 3 -> 1 -> 

merging <1,1> (0xc000010120, 0xc000010108)
merging <1,1> (0xc0000100f0, 0xc0000100d8)
merging <2,2> (0xc000010108, 0xc0000100d8)
# ending at 2024-11-25T21:07:32-07:00 on hazard

The text similar to (0xc000010120, 0xc000010108) is a text representation of the memory addresses of the nodes at the heads of the two lists to be merged. The <1,1> part indicates the sizes of those two lists. Merging two 1-node lists gives one 2-node list; that happens twice. Merging two 2-node lists yields a 4-node list, which is what the code started with. There are functions in the code that check whether a putative sorted linked list has nodes with data values in increasing order, and has the same length as it did in its unsorted state.

My recursive algorithm, and the wikipedia bottom up algorithm, read and write the same memory locations in the same order.

This means that something other than the memory accesses caused by updating .Next fields and reading .Data fields for comparison causes the abrupt performance drops exhibited by the Wikipedia bottom-up algorithm.

[Chart: two-algorithm sorting comparison]

The Wikipedia bottom-up algorithm experiences abrupt increases in mean time of 10 list sorts between 4,001,000- and 4,401,000-node lists, between 8,001,000- and 8,401,000-node lists, and most obviously between 16,401,000- and 16,801,000-node lists. My previous experiments have shown that the performance drops occur at list lengths of 2^n + 1 nodes. Conveniently, 2^22 + 1 = 4194305, 2^23 + 1 = 8388609, and 2^24 + 1 = 16777217.

There’s also a little weirdness at lists of length 1.3×10^7. This shows up in my previous bottom-up benchmarking, and in my attempt to eliminate effects of garbage collection. It’s not nearly as large, but it exists.

The recursive and bottom-up algorithms benchmark at close to the same speed, instead of the recursive algorithm coming in a little faster, as it did in my previous bottom-up benchmarking; see the Qotom Bottom Up Mergesort Comparison chart. Between then and now, I changed the list-merging code for clarity and uniformity, but cost myself a little performance.

2024-12-03: I Drew An Incorrect Conclusion

The bottom-up algorithm keeps sorted lists of length 2^i in an array, where array[i] is a pointer to a list of length 2^i. At the end of the algorithm, any and all lists pointed to by array elements get merged into the result. I did not look at non-power-of-2 size lists, and I did not print out the addresses of the list heads left in array.

The merge of any and all lists code looks like this:

result = nil
for i := 0; i < 32; i++ {
    result = merge(array[i], result)
}

return result

Here’s what the algorithm has in array[i] for various unsorted linked list lengths:

list length  array[0]  array[1]  array[2]  array[3]  array[4]  array[5]  array[6]  array[7]
61           non-nil   nil       non-nil   non-nil   non-nil   non-nil   nil       nil
62           nil       non-nil   non-nil   non-nil   non-nil   non-nil   nil       nil
63           non-nil   non-nil   non-nil   non-nil   non-nil   non-nil   nil       nil
64           nil       nil       nil       nil       nil       nil       non-nil   nil
65           non-nil   nil       nil       nil       nil       nil       non-nil   nil

As an aside, if you treat non-nil as a 1 and nil as a 0, the contents of the array spell out the original list length in binary:

61₁₀ = 111101₂

Because I looked at lists of 4, 8, 16 and 64 nodes, I only saw the same addresses of to-be-merged lists that the recursive algorithm merges. I didn’t see the final part of the algorithm, rolling up all the sublists of length 2^N. I missed seeing some merges and some memory accesses.