Mergesort Investigation 11 - memory access patterns
I finally looked at the addresses of linked list nodes that my recursive and Wikipedia bottom up algorithms read and write.
I used pieces of my mergetest program code to make a new program. The new program creates a linked list with randomly-chosen data values, and two pointer chains through the list:
type Node struct {
    Data  uint  // randomly-chosen value the sorts compare
    Reset *Node // another chain through the linked list
    Next  *Node // the chain the sort algorithms rearrange
}
The sorting gets done by changing which other `Node` instance the `.Next` element of a linked list node points to. Initially, the `.Reset` element has the same value as `.Next`. `.Reset` is not changed by either sort algorithm. Walking a sorted list by values of `.Reset`, and setting each node’s `.Next` field to the value of `.Reset`, re-creates the original, unsorted list.
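Here’s a minimal sketch of that reset step, with a hypothetical name of my own choosing; the actual code in mergeaddresses may differ:

// restoreList is an illustrative sketch, not necessarily the code in
// mergeaddresses. head must be the original, pre-sort head of the list,
// since .Reset preserves the unsorted order.
func restoreList(head *Node) *Node {
    for node := head; node != nil; node = node.Reset {
        node.Next = node.Reset
    }
    return head
}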
Here is the output of my new program, `mergeaddresses`, sorting with my recursive algorithm, resetting the list, then sorting with the Wikipedia bottom up algorithm:
1540 % ./mergeaddresses -b 4
# 2024-11-25T21:07:32-07:00 on hazard
# List of 4 nodes
# nodes 24 bytes in size
# recursive sort
# 3 -> 0 -> 3 -> 1 ->
merging <1,1> (0xc000010120, 0xc000010108)
merging <1,1> (0xc0000100f0, 0xc0000100d8)
merging <2,2> (0xc000010108, 0xc0000100d8)
# bottom up sort
# 3 -> 0 -> 3 -> 1 ->
merging <1,1> (0xc000010120, 0xc000010108)
merging <1,1> (0xc0000100f0, 0xc0000100d8)
merging <2,2> (0xc000010108, 0xc0000100d8)
# ending at 2024-11-25T21:07:32-07:00 on hazard
The text similar to `(0xc000010120, 0xc000010108)` is a representation of the memory addresses of the nodes at the heads of the two lists to be merged. The `<1,1>` part indicates the sizes of the two lists to be merged. Merging two 1-node lists gives one 2-node list. That happens twice. Merging two 2-node lists yields a 4-node list, which is what the code started with.
There are functions in the code that check whether a putative sorted linked list has nodes with data values in non-decreasing order, and has the same length as it did in its unsorted state.
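Here’s a minimal sketch of those two checks, with hypothetical names of my own; the functions in the program may differ in detail:

// sorted reports whether .Data values along the .Next chain appear in
// non-decreasing order. An illustrative sketch, not the program’s code.
func sorted(head *Node) bool {
    for node := head; node != nil && node.Next != nil; node = node.Next {
        if node.Data > node.Next.Data {
            return false
        }
    }
    return true
}

// listLength counts nodes along the .Next chain, so a sorted list’s
// length can be compared to its unsorted length.
func listLength(head *Node) int {
    count := 0
    for node := head; node != nil; node = node.Next {
        count++
    }
    return count
}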
My recursive algorithm and the Wikipedia bottom up algorithm read and write the same memory locations in the same order. This means that something other than the memory accesses caused by updating `.Next` fields and reading `.Data` fields for comparison causes the abrupt performance drops exhibited by the Wikipedia bottom up algorithm.
The Wikipedia bottom up algorithm experiences abrupt increases in the mean time of 10 list sorts between 4001000 and 4401000 node lists, between 8001000 and 8401000 node lists, and most obviously between 16401000 and 16801000 node lists. My previous experiments have shown that the performance drops occur at list lengths of 2^n + 1 nodes. Conveniently, 2^22 + 1 = 4194305, 2^23 + 1 = 8388609, and 2^24 + 1 = 16777217.
There’s also a little weirdness at lists of length 1.3×10^7. This shows up in my previous bottom up benchmarking, and in attempting to eliminate effects of garbage collection. It’s not nearly as large, but it exists.
The recursive and bottom up algorithms benchmark at close to the same speed, instead of the recursive algorithm coming in a little faster, as it did in my previous bottom up benchmarking. See the Qotom Bottom Up Mergesort Comparison chart. Between then and now, I changed the list merging code for clarity and uniformity, but cost myself a little performance.
2024-12-03: I Drew An Incorrect Conclusion
The bottom up algorithm keeps sorted lists of length 2^i in an array, where `array[i]` is a pointer to a list of length 2^i. The end of the algorithm merges any and all lists pointed to by `array` elements into the result.
I did not look at non-power-of-2 size lists, and I did not print out the addresses of the list heads left in `array`. The code that merges any and all remaining lists looks like this:
result = nil
// Merge every remaining sublist, shortest first, into the final result.
for i := 0; i < 32; i++ {
    result = merge(array[i], result)
}

return result
Here’s what the algorithm has in `array[i]` for various unsorted linked list lengths:
list length | array[0] | array[1] | array[2] | array[3] | array[4] | array[5] | array[6] | … |
---|---|---|---|---|---|---|---|---|
61 | non-nil | nil | non-nil | non-nil | non-nil | non-nil | nil | nil |
62 | nil | non-nil | non-nil | non-nil | non-nil | non-nil | nil | nil |
63 | non-nil | non-nil | non-nil | non-nil | non-nil | non-nil | nil | nil |
64 | nil | nil | nil | nil | nil | nil | non-nil | nil |
65 | non-nil | nil | nil | nil | nil | nil | non-nil | nil |
As an aside, if you treat non-nil as a 1, and nil as a 0, the contents of the array are the original list length in binary:

61₁₀ = 111101₂
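Here’s a small sketch of that correspondence, assuming (as the table shows) that slot i ends up non-nil exactly when bit i of the list length is set. slotOccupancy is a name I made up for illustration; nothing like it appears in the program:

// slotOccupancy predicts which array slots hold a leftover sublist
// after the bottom up algorithm consumes an n-node list: slot i is
// occupied exactly when bit i of n is set. Illustrative only.
func slotOccupancy(n uint) []bool {
    occupied := make([]bool, 7) // array[0] through array[6], as in the table
    for i := range occupied {
        occupied[i] = n&(1<<i) != 0
    }
    return occupied
}

For 61₁₀ = 111101₂, this gives non-nil at slots 0, 2, 3, 4 and 5, and nil at slot 1, matching the first row of the table.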
Because I looked at lists of 4, 8, 16 and 64 nodes, I only saw the same addresses of to-be-merged lists that the recursive algorithm merges. I didn’t see the final part of the algorithm, rolling up all the sublists of 2^i lengths. I missed seeing some merges and some memory accesses.