Mergesort Investigation 7 - node size

I checked the effect of list element size on my implementation of mergesort of a linked list.

My linked list, which I piously believe is a typical linked list, comprises a series of Golang structures:

type Node struct {
    Data uint 
    Next *Node
}

A quick Go program reveals that struct to take up 16 bytes in memory, apparently 8 bytes for the uint type .Data field, and 8 bytes for the *Node .Next field.

Extrapolating from a superstitious belief that virtual memory, paging, page sizes and page tables, can make a difference, I ran a series of mergesort benchmarks with different Node structures:

type Node struct {
    Data uint 
    Next *Node
    Padding [16]byte
}

The .Padding element, at 16 bytes, changes the struct size to 32 bytes. I recompiled the mergesort program with 16, 112, 240, 368 and 496 byte Padding array element sizes to get 32, 128, 256, 384 and 512 byte struct Node sizes. You do have to use a Go array, not a slice. The storage backing a slice is not usually contiguous with the slice header, and so doesn’t directly contribute to the size of the Node.


I did all the node size benchmarking on a Macbook M2 ARM. I used the iterative mergesort algorithm, the algorithm with the abrupt performance drops at certain list lengths. I used the “idiomatic” in-memory list layout.

mergesort benchmark timings and list element size

Something really goes wrong at nodes 384 bytes and up. The abrupt drops in performance happen at different list lengths. Abrupt increases in performance happen at some list lengths.

mergesort benchmark timings and list element size

Above, a y-axis zoom-in on the first graph. The 512-byte node size causes long benchmark times, which compresses the graph vertically if all points are shown. Looks like the performance drops at 4, 8 and 16 million node list happen when nodes are 16 or 32 bytes. Performance drops happen at 4 and 8 million node list for 128 and 256 byte nodes, but list lengths about 4 million nodes cause garbage results for 256 byte nodes, and list lengths about 8 million nodes do the same for 128 byte nodes.

I’m calling this some kind of paging related issues complicating the iterative mergesort performance question. Just like address space layout (via .Next pointer) issues, this is interesting in some larger context, in that real linked lists might have nodes much larger than 256-bytes, or have very poor locality of reference when following .Next pointers. In the context of figuring out what’s wrong with my iterative mergesort, I think this is a distraction.