Updated Self Relocating Program

I updated my self-relocating program for the 21st Century.

GCC and Clang C compilers compile the new code with no warnings. The program no longer needs weird hacks like compiling with an executable stack.

I did have to change how the copies of the “recursive” function produce output. At first, only the original, un-relocated function produced printf output, although the return value of the copied, relocated code was correct, indicating that the relocated code executed.

I changed the program so that it uses “anonymous” memory mapped “files” for memory allocation. Apparently you can no longer make memory allocated via malloc(3) executable. That meant passing a pointer to the mmap(2) libc function into the self-relocating function, instead of a pointer to malloc.
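
A minimal sketch of that kind of allocation, assuming Linux. The helper name alloc_exec_copy is made up for illustration, and it calls mmap directly, where the real program goes through a function pointer passed in as an argument.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not taken from the real program: map an anonymous,
 * private region that is readable, writable and executable, then copy
 * len bytes of code into it. Returns NULL if the mapping fails.
 */
void *alloc_exec_copy(const void *code, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, code, len);
    return p;
}
```

A caller would cast the returned pointer to the right function pointer type before calling through it. That cast is outside ISO C, but it works on Linux with GCC and Clang.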

Running the program in the gdb debugger showed me that printf did get called from the relocated code, but the “format” string was zero-length. This resulted in no output. I’m not an x86_64 assembly language expert, so take the following under advisement.

Compiling with these arcane flags gets you code that sends the correct address to printf from copied code:

cc -static --no-pie -g -o xyz xyz.c

Compiling with -static --no-pie produces this call to printf:

 1	mov    -0x30(%rbp),%rax
 2	mov    -0x48(%rbp),%rdx
 3	mov    %rax,%rsi
 4	mov    $0x48513b,%edi
 5	call   *%rdx

The way I read this, lines 1 and 3 load register %rsi. Line 2 loads %rdx, and line 4 loads %edi. The $0x48513b constant in line 4 is the address of the printf format string, its first formal argument. The System V x86_64 calling convention that Linux compilers use says to put a function’s first argument in register %rdi, and the second argument in register %rsi. Looks to me like the compiler is using %edi because it only needs to load a 3-byte constant value, and %edi is the low 32 bits of %rdi. Writing a 32-bit register on x86_64 zeroes the upper 32 bits of the full register, so %rdi ends up holding exactly 0x48513b.

A plain, default, compile of the same C code gives somewhat different instructions.

1  mov    -0x30(%rbp),%rax
X  lea    0xd23(%rip),%rcx
2  mov    -0x48(%rbp),%rdx
3  mov    %rax,%rsi
4  mov    %rcx,%rdi
5  call   *%rdx

I’ve numbered this sequence similarly to the first, but note that line X is an additional instruction. Lines X and 4 load %rdi in this snippet, by way of %rcx.

I interpret this sequence to use an address that’s 0xd23 bytes off from the contents of %rip as the first argument. %rip holds the address of the next instruction to execute, so at line X, the address of line 2. That’s how “position independent” code gets written. The compiler knows that a printf format string appears 0xd23 (3363) bytes away from the address of the instruction after line X. It doesn’t matter to that kind of instruction where the format string sits in absolute terms.

When my code uses mmap(2) to get an allocation of executable memory, and copies those instructions to that new allocation, the printf format string is no longer 0xd23 bytes away. printf therefore gets some weird random address, which is fortuitously filled with zeros.
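
The arithmetic behind that failure fits in a few lines of C. The function name and the example addresses in the note below are invented for illustration; only the 0xd23 displacement comes from the disassembly above.

```c
#include <stdint.h>

/*
 * Address that "lea disp(%rip),%reg" computes: %rip holds the address of
 * the next instruction, and the signed displacement gets added to it.
 */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp)
{
    return next_insn_addr + (uint64_t)(int64_t)disp;
}
```

With a made-up next-instruction address of 0x1000 in the original code, the lea yields 0x1d23. Copy the same bytes so that instruction lands at 0x9000, and it yields 0x9d23. The computed address moved by exactly as much as the code did, but the format string stayed where it was.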

In both cases, printf gets called by call *%rdx at line 5, an indirect call through a function pointer held in %rdx.

I’m guessing the compiler references static strings via code like lea 0xd23(%rip),%rcx because C is now compiled as position independent code by default, which makes buffer overflows harder to exploit.

I didn’t want to rely on unusual, squirrely compiler flags, so I had the copied code invoke printf through another function, pstring, that takes 2 integer valued arguments. That limits what the copied function can communicate, but that’s the breaks. I probably could have passed in pointers to strings via arguments, or set up global pointers to strings (type char **) that might have let me call printf without an intervening function, but passing a pointer to function pstring seemed like less work, and easier to get correct.
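
A rough sketch of that pattern. Everything here is hypothetical: the names report and relocatable_body, the message numbering, and the argument types are guesses made for illustration, not the signatures in the real program.

```c
#include <stdio.h>

/* Hypothetical message ids; the real program's numbering may differ. */
enum { MSG_ENTER = 0, MSG_EXIT = 1 };

/*
 * Hypothetical stand-in for pstring. It never gets copied, so its
 * references to the format strings stay valid. Callers pick a message
 * by number instead of passing a string address.
 */
void report(int msg_id, unsigned long value)
{
    switch (msg_id) {
    case MSG_ENTER: printf("Enter function at 0x%lx\n", value); break;
    case MSG_EXIT:  printf("Exit function at 0x%lx\n", value);  break;
    default:        break;
    }
}

/*
 * Stand-in for the code that gets copied: it names no string literal and
 * calls no function directly, only through the pointer it was handed.
 */
int relocatable_body(void (*out)(int, unsigned long), unsigned long self)
{
    out(MSG_ENTER, self);
    out(MSG_EXIT, self);
    return 0;
}
```

Calling relocatable_body(report, 0x1234UL) prints an enter line and an exit line. The point of the design is that the copied code carries only integers and a function pointer, both of which survive being moved.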

GitHub repo for this program


This is the output of running cpup6 4:

 1	signal_string  0x55d30f88e1e0
 2	signal_handler 0x55d30f88e240
 3	main           0x55d30f88e290
 4	copyup         0x55d30f88e520
 5	pstring        0x55d30f88e650
 6	copying code 4 times
 7	copied function is 960 bytes in size
 8	Enter function at 0x55d30f88e520
 9	mmap for new function at 0x7f70678d9000
10	Enter function at 0x7f70678d9000
11	mmap for new function at 0x7f70678d8000
12	Enter function at 0x7f70678d8000
13	mmap for new function at 0x7f70678d7000
14	Enter function at 0x7f70678d7000
15	mmap for new function at 0x7f70678d6000
16	Enter function at 0x7f70678d6000
17	Function at 0x7f70678d6000 reached max depth
18	Return from function at 0x7f70678d6000
19	Exit function at 0x7f70678d7000
20	Return from function at 0x7f70678d7000
21	Exit function at 0x7f70678d8000
22	Return from function at 0x7f70678d8000
23	Exit function at 0x7f70678d9000
24	Return from function at 0x7f70678d9000
25	Exit function at 0x55d30f88e520
26	function returns 4

Lines 9, 11, 13, 15 show the addresses where the mmap(2) system call allocated a new 4096-byte (0x1000) page of memory. They get allocated from high addresses to low addresses, which strikes me as odd.

The program uses the difference between the address of main (0x55d30f88e290) on line 3, and the address of pstring (0x55d30f88e650) on line 5 as the size of the copyup function. That’s the number of bytes allocated and copied for the next invocation of the “relocated” function. That number is strictly too large, because it includes the size of the code for the main function. Originally I had a small function that just returned appearing after the end of the copyup function to get a more accurate size. That worked for GCC, but Clang apparently optimized that function away, or put it in the executable in a different place, because the difference between the two addresses was not correct. For the sake of portability, I used the difference between main and pstring addresses.
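
The size calculation amounts to one subtraction. In this sketch the helper name span_between is made up, and the addresses arrive as integers. On a real run the arguments would be something like (uintptr_t)main and (uintptr_t)pstring, a cast that ISO C doesn’t guarantee but that works on Linux.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Size estimate as the distance between two function addresses, taken as
 * integers. ISO C does not define subtraction of function pointers, and
 * nothing guarantees the compiler lays functions out in source order.
 */
size_t span_between(uintptr_t low_addr, uintptr_t high_addr)
{
    return (size_t)(high_addr - low_addr);
}
```

Plugging in the addresses from the output above, span_between(0x55d30f88e290, 0x55d30f88e650) gives 0x3c0, the 960 bytes reported on line 7. The distance from copyup (0x55d30f88e520) to pstring is only 0x130, or 304 bytes.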

Below are all the memory mappings (the contents of /proc/$PID/maps) from the same run of cpup6:

55d30f88d000-55d30f88e000 r--p 00000000 08:02 22809384                   /home/bediger/src/all_github_repos/bediger4000/nonrecursive-recursion/cpup6
55d30f88e000-55d30f88f000 r-xp 00001000 08:02 22809384                   /home/bediger/src/all_github_repos/bediger4000/nonrecursive-recursion/cpup6
55d30f88f000-55d30f890000 r--p 00002000 08:02 22809384                   /home/bediger/src/all_github_repos/bediger4000/nonrecursive-recursion/cpup6
55d30f890000-55d30f891000 r--p 00002000 08:02 22809384                   /home/bediger/src/all_github_repos/bediger4000/nonrecursive-recursion/cpup6
55d30f891000-55d30f892000 rw-p 00003000 08:02 22809384                   /home/bediger/src/all_github_repos/bediger4000/nonrecursive-recursion/cpup6
55d33128e000-55d3312af000 rw-p 00000000 00:00 0                          [heap]
7f7067600000-7f7067624000 r--p 00000000 08:02 27004378                   /usr/lib/libc.so.6
7f7067624000-7f7067795000 r-xp 00024000 08:02 27004378                   /usr/lib/libc.so.6
7f7067795000-7f7067804000 r--p 00195000 08:02 27004378                   /usr/lib/libc.so.6
7f7067804000-7f7067808000 r--p 00203000 08:02 27004378                   /usr/lib/libc.so.6
7f7067808000-7f706780a000 rw-p 00207000 08:02 27004378                   /usr/lib/libc.so.6
7f706780a000-7f7067812000 rw-p 00000000 00:00 0 
7f70678a5000-7f70678aa000 rw-p 00000000 00:00 0 
7f70678d6000-7f70678da000 rwxp 00000000 00:00 0 
7f70678da000-7f70678de000 r--p 00000000 00:00 0                          [vvar]
7f70678de000-7f70678e0000 r--p 00000000 00:00 0                          [vvar_vclock]
7f70678e0000-7f70678e2000 r-xp 00000000 00:00 0                          [vdso]
7f70678e2000-7f70678e3000 r--p 00000000 08:02 27004304                   /usr/lib/ld-linux-x86-64.so.2
7f70678e3000-7f706790d000 r-xp 00001000 08:02 27004304                   /usr/lib/ld-linux-x86-64.so.2
7f706790d000-7f706791b000 r--p 0002b000 08:02 27004304                   /usr/lib/ld-linux-x86-64.so.2
7f706791b000-7f706791d000 r--p 00039000 08:02 27004304                   /usr/lib/ld-linux-x86-64.so.2
7f706791d000-7f706791e000 rw-p 0003b000 08:02 27004304                   /usr/lib/ld-linux-x86-64.so.2
7f706791e000-7f706791f000 rw-p 00000000 00:00 0 
7fff7e2ff000-7fff7e320000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]

The addresses of functions main, copyup and pstring (0x55d30f88e290, 0x55d30f88e520, 0x55d30f88e650 respectively) all lie in the range of the second mapping, 55d30f88e000-55d30f88f000, which conveniently has “r-xp” protections, read, execute and private (not shared).

The copies of function copyup code are at addresses 0x7f70678d9000, 0x7f70678d8000, 0x7f70678d7000, 0x7f70678d6000. These addresses all fit in the 7f70678d6000-7f70678da000 mapping, which has “rwxp” permissions. cpup6.c invokes mmap(2) through a function pointer with PROT_EXEC|PROT_READ|PROT_WRITE permissions, and MAP_PRIVATE|MAP_ANONYMOUS.
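
That check can be done mechanically. This is a small sketch, with a made-up helper name, that parses one line in the /proc/$PID/maps format shown above and tests whether an address falls inside it with the expected permissions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if addr falls inside the range on one line of /proc/PID/maps
 * and the line's permission field equals perms (for example "rwxp").
 * Return 0 otherwise, including when the line does not parse.
 */
int maps_line_covers(const char *line, unsigned long addr, const char *perms)
{
    unsigned long start, end;
    char field[5];

    if (sscanf(line, "%lx-%lx %4s", &start, &end, field) != 3)
        return 0;
    /* The end address in a maps line is exclusive. */
    return addr >= start && addr < end && strcmp(field, perms) == 0;
}
```

Fed the line that starts 7f70678d6000-7f70678da000 and the address 0x7f70678d9000 with "rwxp", it returns 1. The end address itself, 0x7f70678da000, is outside the mapping and returns 0.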

I have invoked cpup6 50000 without a crash. That run should memory map 50,000 4096-byte pages, one per copy of the copyup code. The 64-bit address space, even though only the low 48 bits are used, is phenomenally large.