Memory allocation and addressing in Assembly

Question

I am trying to learn assembly and there are a couple of instructions whose purpose I do not fully understand.

C code

#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("Argument One - %s\n", argv[1]);
    return 0;
}

Assembly

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 14
    .intel_syntax noprefix
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
## %bb.0:
    push    rbp
    mov     rbp, rsp
    sub     rsp, 32
    lea     rax, [rip + L_.str]
    mov     dword ptr [rbp - 4], 0
    mov     dword ptr [rbp - 8], edi
    mov     qword ptr [rbp - 16], rsi
    mov     rsi, qword ptr [rbp - 16]
    mov     rsi, qword ptr [rsi + 8]
    mov     rdi, rax
    mov     al, 0
    call    _printf
    xor     ecx, ecx
    mov     dword ptr [rbp - 20], eax ## 4-byte Spill
    mov     eax, ecx
    add     rsp, 32
    pop     rbp
    ret
                                        ## -- End function
    .section    __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz  "Argument One - %s\n"

.subsections_via_symbols

Q1. sub rsp, 32

Why is space allocated for 32 bytes when there are no local variables? I believe argc and argv are passed in the registers edi and rsi respectively. If it's so that they can be moved onto the stack, wouldn't that require only 12 bytes?

Q2. lea rax, [rip + L_.str] and mov rdi, rax

Am I correct in understanding that L_.str has the address of the string "Argument One - %s\n"? From what I've understood, printf gets access to this string through the register rdi. So, why doesn't the instruction mov rdi, L_.str work instead?

Q3. mov dword ptr [rbp - 4], 0

Why is zero being pushed onto the stack?

Q4. mov dword ptr [rbp - 8], edi and mov qword ptr [rbp - 16], rsi

I believe these instructions are to get argc and argv onto the stack. Is it pure convention to use edi and rsi?

Q5. mov dword ptr [rbp - 20], eax

I haven't a clue what this does.

Most of that is noise and overhead from unoptimized code, e.g. copying args from registers to the stack for no reason, and (Q5) spilling the unused printf return value to stack space. Compile with -O3 or -O2 to get just the interesting part. See: How to remove "noise" from GCC/clang assembly output?
– Peter Cordes, Commented Jan 17, 2019 at 6:03
And yes, there is a standard that specifies how args are passed to functions, so compilers can make code that can call code from other compilers. In your case it's the x86-64 System V ABI. See the function-calling part of "What are the calling conventions for UNIX & Linux system calls on i386 and x86-64", and "What registers are preserved through a linux x86-64 function call". See also stackoverflow.com/tags/x86/info for more links to docs.
– Peter Cordes, Commented Jan 17, 2019 at 6:17
You are compiling without optimisations. This causes the compiler to generate a lot of useless instructions. Pass at least -O1, better -O2 so the compiler generates reasonable code.
– fuz, Commented Jan 17, 2019 at 11:03
@fuz Why would a compiler ever generate useless instructions to begin with? I really don't understand that. Just to adhere to calling conventions?
– puppydrum64, Commented Jul 1, 2022 at 18:06
@puppydrum64 How is the compiler supposed to know if an instruction is useless to begin with, if you tell it not to check if instructions are useless? These seemingly useless instructions may be needed in some cases and if you tell the compiler not to check if they really are, it'll just generate them anyway.
– fuz, Commented Jul 1, 2022 at 18:45

Peter Cordes · Accepted Answer · 2019-01-17 06:43:03Z

Q1. sub rsp, 32

This allocates stack space for local data. Although it reserves 32 bytes, the code only touches 20 of them: a qword at [rbp - 8] (0:edi), a qword at [rbp - 16] (rsi), and a dword at [rbp - 20] (the spilled printf return value, see Q5). The rest is padding.
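The figure of 32 is consistent with rounding the frame up to a multiple of 16: the listing stores to [rbp - 4], [rbp - 8], [rbp - 16] and [rbp - 20], so 20 bytes are in use, and the x86-64 System V ABI wants rsp 16-byte aligned at each call. Below is a minimal C sketch of that arithmetic. The helper name round_up is hypothetical, and this is only an illustration of the rounding, not clang's actual frame-layout code.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `alignment`.
   `alignment` must be a power of two (16 for the stack alignment discussed here). */
size_t round_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

With these definitions, round_up(20, 16) evaluates to 32, which matches sub rsp, 32. The 12 bytes guessed in the question would still round up to 16.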

Q2. lea rax, [rip + L_.str] and mov rdi, rax

The lea gets the address of the string, which is stored in the "code" segment (here the __TEXT,__cstring section). That address is then moved to rdi, which carries the first argument to printf.

Q3. mov dword ptr [rbp - 4], 0 ... mov dword ptr [rbp - 8], edi

Taken together, these two stores write a 64-bit little-endian value composed of 0:edi at [rbp - 8]. I'm not sure why it's doing this, since it never loads from that qword later on.

It's normal for un-optimized code to store its register arguments to memory, where debug info can tell debuggers where to look for and modify them, but it's unclear why clang would zero-extend argc in edi to 64 bits.

More likely, that 0 dword is something separate: if the compiler really wanted to store a zero-extended argc, it would zero-extend in a register with a 32-bit mov, like mov ecx, edi ; mov [rbp-8], rcx. Possibly this extra zero is a return-value temporary which it later decides not to use because of the explicit return 0; instead of the implicit one from falling off the end of main? (main is special, and I think clang does create an internal temporary variable for the return value.)
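To see why the two dword stores can be read as a single zero-extended qword in the first place, here is a small C sketch that mimics them with memcpy. The function name is hypothetical and the code assumes a little-endian host such as x86-64. It only shows what ends up in memory, not what clang intended.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mimic the two 32-bit stores from the listing:
     mov dword ptr [rbp - 8], edi   -> low half, lower address
     mov dword ptr [rbp - 4], 0     -> high half, higher address
   then read the same 8 bytes back as one 64-bit value. */
uint64_t qword_from_two_dwords(uint32_t low, uint32_t high)
{
    const uint16_t probe = 1;
    unsigned char slot[8];
    uint64_t value;

    /* Assumption: little-endian host, so the first byte of `probe` is 1. */
    assert(*(const unsigned char *)&probe == 1);

    memcpy(slot, &low, sizeof low);        /* bytes 0..3 */
    memcpy(slot + 4, &high, sizeof high);  /* bytes 4..7 */
    memcpy(&value, slot, sizeof value);
    return value;
}
```

Calling qword_from_two_dwords(argc, 0) returns argc unchanged as a 64-bit number, which is why the pair of stores looks like a zero-extension even if the zero belongs to a different variable.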

Q4. mov qword ptr [rbp - 16], rsi ... mov rsi, qword ptr [rbp - 16]

Optimization off? It stores rsi to [rbp - 16] and then immediately loads rsi back from the same location. rsi holds your argv function arg ( == &argv[0]). The x86-64 System V ABI passes integer/pointer args in RDI, RSI, RDX, RCX, R8, R9, then on the stack.
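The register order above can be written down as a small lookup. This is only an illustrative sketch in C: the function name sysv_int_arg_register is hypothetical, and it covers integer and pointer arguments only (floating-point arguments use a separate set of registers).

```c
#include <assert.h>

/* Register carrying the index-th integer or pointer argument (0-based)
   under the x86-64 System V ABI, or "stack" once the six registers are used up. */
const char *sysv_int_arg_register(int index)
{
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    const int count = (int)(sizeof regs / sizeof regs[0]);

    assert(index >= 0);
    return index < count ? regs[index] : "stack";
}
```

For main(argc, argv), index 0 maps to rdi (argc is a 32-bit int, so only the low half, edi, is used) and index 1 maps to rsi. The same mapping applies to the call printf(format, argv[1]).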

... mov rsi, qword ptr [rsi + 8]

This loads rsi with argv[1], the pointer stored 8 bytes past argv[0], as the 2nd arg for printf. (For the same reason that main's 2nd arg was in rsi.)

The x86-64 System V calling convention is also the reason for zeroing AL before calling a varargs function with no FP args: AL tells the callee how many vector registers hold arguments.
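The + 8 in that instruction is just the size of one pointer on x86-64. A minimal C sketch of the same load, with a hypothetical helper name and an explicit byte offset in place of array indexing:

```c
#include <assert.h>
#include <stddef.h>

/* Load the pointer stored `byte_offset` bytes past `argv`,
   the way `mov rsi, qword ptr [rsi + 8]` does with an offset of 8. */
char *pointer_at_byte_offset(char **argv, size_t byte_offset)
{
    assert(argv != NULL);
    assert(byte_offset % sizeof(char *) == 0);   /* must land on an element */
    return *(char **)((char *)argv + byte_offset);
}
```

On a target with 8-byte pointers, pointer_at_byte_offset(argv, 8) returns the same pointer as argv[1]. Writing the offset as sizeof(char *) keeps the sketch correct on other pointer sizes as well.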

Q5. mov dword ptr [rbp - 20], eax

Optimization off? It's storing the return value from printf, but never using it.

This is MacOS, not Windows x86-64 ABI. No shadow space in the 64-bit ABI for Linux or BSD.
I should point out that I assumed MacOS given this line in their output: .build_version macos, 10, 14
Yes, optimization has been turned off. Also, why not use mov rdi, L_.str instead to move the address of the string into rdi?
@DKar : because lea rax, [rip + L_.str] makes the code position independent.
@MichaelPetch I'm sorry but I am new to ASM. Could you elaborate on what you mean by position independent?
