I understand the usage of push rbp...pop rbp at the start and end of a function to preserve the rbp value of the calling function, since the rbp register is callee-preserved. And I understand the 'convention' of using rbp as the base of the stack frame for the procedure currently being executed. But related to this I have two questions:

  1. Is rbp just a convention? Could I just as easily use r11 (or any other register or even 8 bytes on the stack) as the base of the stack frame? Is there anything special about the rbp register, or is it just used as the frame pointer because of history and convention?
  2. Why is mov %rbp, %rsp used as a 'cleanup' method before leaving a function? For example, often the push/pop instructions will be symmetrical, so is mov %rbp, %rsp just a shorthand that lets someone 'skip' doing the symmetrical pops/adds and such? What would be an actual case where mov %rbp, %rsp is useful? Almost every time I see it in compiler output (with zero optimizations turned on), it seems either unnecessary or redundant, and I'm having trouble thinking of a scenario where it might actually be useful.

  • 1) You don't even need a frame pointer. You can address relative to rsp. You could not address relative to sp in 16-bit mode. 2) Actual usage is if you do dynamic stack allocation, e.g. via alloca, or, as you say, skip balancing the stack at each step. Commented Mar 28, 2021 at 21:20
  • Have you tried looking at compiler output to see what they actually do? godbolt.org. They don't use mov %rbp, %rsp or leave when it would be redundant (just pop %rbp), even with un-optimized code (or -fno-omit-frame-pointer in general.) Commented Mar 28, 2021 at 21:45

1 Answer

Optimized code doesn't use frame pointers at all, except for stuff like VLAs / alloca (variable-sized movement of RSP), or if you specifically use -fno-omit-frame-pointer (e.g. to make perf record stack sampling more efficient/reliable). Un-optimized code is usually not as interesting to look at; see How to remove "noise" from GCC/clang assembly output?
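As a rough illustration of the VLA case, here is a minimal C sketch. The function name sum_with_vla and the 4096-element bound are assumptions made up for this example. Because the array size is only known at run time, a compiler typically keeps RBP as a frame pointer in a function like this and restores RSP from it on exit (for example with leave), unless optimization removes the array altogether.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums 0 .. n-1 through a variable-length array.
 * The array size is a run-time value, so RSP moves by a variable amount
 * and a constant "add $imm, %rsp" cannot undo the allocation. */
long sum_with_vla(size_t n)
{
    assert(n > 0 && n <= 4096);   /* arbitrary bound to keep stack use small */

    long scratch[n];              /* VLA: space reserved by adjusting RSP */
    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)i;

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += scratch[i];
    return total;
}
```

Compiling this with and without optimization on godbolt.org and comparing the epilogues is a quick way to see when the frame pointer is actually kept.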

So there are plenty of duplicates for the part about when / why to use a frame pointer at all. The interesting part is whether a register other than RBP could have been chosen.


The only things special about RBP are that leave can compactly do RSP=RBP + pop RBP (the equivalent of mov %rbp, %rsp followed by pop %rbp), and that a (%rbp) addressing mode with no offset still has to be encoded with an explicit disp8 or disp32 of value 0.

So if you are going to use a frame pointer at all, you should pick RBP because it's at least as good as any other reg at being a frame pointer, but worse than other regs for some other uses. You never need 0(frame_pointer), only other offsets. (R13 has the same always-needs-a-disp8=0 effect, but then every stack access would need a REX prefix: add -12(%r13), %eax needs one, while add -12(%rbp), %eax doesn't.)
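To make the encoding differences concrete, here is a small C table of hand-assembled byte sequences for the addressing modes discussed above. The bytes were worked out by hand from the ModRM/SIB rules, not taken from assembler output, so treat them as a sketch and check them against objdump if they matter to you. The struct and the encoded_len helper are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One x86-64 instruction: AT&T text plus hand-assembled machine code. */
struct encoding {
    const char *text;
    unsigned char bytes[8];
    size_t len;
};

static const struct encoding encodings[] = {
    { "add (%rbx), %eax",    { 0x03, 0x03 },             2 }, /* no displacement needed   */
    { "add (%rbp), %eax",    { 0x03, 0x45, 0x00 },       3 }, /* forced disp8 = 0         */
    { "add -12(%rbp), %eax", { 0x03, 0x45, 0xF4 },       3 }, /* disp8 you needed anyway  */
    { "add -12(%r13), %eax", { 0x41, 0x03, 0x45, 0xF4 }, 4 }, /* extra REX prefix (0x41)  */
    { "add 12(%rsp), %eax",  { 0x03, 0x44, 0x24, 0x0C }, 4 }, /* extra SIB byte (0x24)    */
};

/* Returns the encoded length in bytes, or 0 if the text is not in the table. */
size_t encoded_len(const char *text)
{
    assert(text != NULL);
    for (size_t i = 0; i < sizeof encodings / sizeof encodings[0]; i++)
        if (strcmp(encodings[i].text, text) == 0)
            return encodings[i].len;
    return 0;
}
```

With these encodings, an RBP-relative access with a disp8 is one byte shorter than the same access through R13 or RSP.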

Also, all other "legacy" registers (that you can use without a REX, i.e. not R8-R15) have at least one implicit use in at least one instruction that compilers may actually generate, like cmpxchg16b, cpuid, shl %cl, %reg, rep movsb or whatever, so any other reg would be worse as a frame pointer. You can't do simple naive un-optimized (or toy-compiler) code-gen if you need to shuffle things around to free up RBX for some instruction that needs it for a different purpose. (Stack unwinding on exceptions may also rely on the frame pointer always being in a specific register, if your .cfi_* directives specified that.)

Consistency with previous x86 modes would have been sufficient reason to use RBP, to make it easier for puny human minds to remember, but there are still code-size and other reasons to pick RBP if you're going to use one. (In fact, since (%rsp) addressing modes always need a SIB byte, the instructions to set up a frame pointer can actually pay for themselves over a large function in terms of code size, although not in instructions / uops.)


Reasons that aren't still relevant:

An RBP base address implies the SS segment, like RSP. That was relevant in 16-bit mode, and theoretically in 32-bit mode (where non-flat memory models were possible), but not in 64-bit mode, where it only affects which exception you get from a non-canonical address. So that part of the reason is basically gone; pretty much nobody cares about #GP vs. #SS there.

enter is too slow to be usable, but leave is still worth using if RSP isn't already pointing at the saved RBP, only costing 1 extra uop vs. manual mov %rbp, %rsp / pop %rbp on Intel CPUs, so that's what GCC does. You claim to have seen useless mov %rbp, %rsp instructions, but that's not what compilers actually do.

Note that mov %rbp, %rsp (3 bytes) is smaller than add $imm8, %rsp (4 bytes), so if you're using a frame pointer, you might as well restore RSP that way if it's not pointing at the saved RBP. (The exception is when you saved other registers right below RBP instead of after a sub $imm, %rsp: then RSP has to point at those saved registers before you can pop them, although you can do the restoring with mov loads instead of pop.)
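The byte counts above can be tallied in a short C sketch. The encodings are hand-assembled (an assumption worth checking against an assembler), add $24, %rsp stands in for any add with an imm8, and the enum and function names are hypothetical. The trailing ret is the same in every variant, so it is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-assembled x86-64 encodings of the epilogue building blocks. */
const unsigned char mov_rbp_rsp[]  = { 0x48, 0x89, 0xEC };       /* mov %rbp, %rsp */
const unsigned char add_imm8_rsp[] = { 0x48, 0x83, 0xC4, 0x18 }; /* add $24, %rsp  */
const unsigned char pop_rbp[]      = { 0x5D };                   /* pop %rbp       */
const unsigned char leave_insn[]   = { 0xC9 };                   /* leave          */

enum epilogue { EPI_LEAVE, EPI_MOV_POP, EPI_ADD_POP };

/* Code size in bytes of each way to restore RSP and RBP before ret. */
size_t epilogue_bytes(enum epilogue kind)
{
    switch (kind) {
    case EPI_LEAVE:   return sizeof leave_insn;
    case EPI_MOV_POP: return sizeof mov_rbp_rsp + sizeof pop_rbp;
    case EPI_ADD_POP: return sizeof add_imm8_rsp + sizeof pop_rbp;
    }
    assert(0 && "unknown epilogue kind");
    return 0;
}
```

On size alone, leave wins at 1 byte, against 4 bytes for mov + pop and 5 bytes for add + pop.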

5 Comments

what about recursion? Does recursion have any unique requirements/advantages of using a frame pointer?
@carl.hiass: No, it doesn't, it's just a function call to a function that also has to work when called by functions that aren't itself. If it's proper recursion, you don't have any variables that are shared between invocations in the chain of recursive calls, only interaction via function args and return value(s). It is asm so you're not limited to only one return-value register.
@carl.hiass: You could imagine some "private" recursive function that's only ever called from itself or a public wrapper function, and in that case maybe you have only the public function set up a frame pointer. So you can have some shared state in stack memory instead of in registers that have to get saved/restored and copied around every call. But you can still use recursion as a stack, e.g. for traversing a tree. This is exactly equivalent to taking a pointer arg to a state struct; in fact it is that, with a custom calling convention where that arg register is call-preserved.
#SS is not the only consequence of implicitly referencing the stack, even in 64-bit mode. For instance, operand size (generally) defaults to 8 bytes, without need for REX.
@l.k: I'm talking about explicit reference to the stack, like mov eax, [rbp-4] or mov eax, [rsp+4]. That's the relevant issue with using a frame pointer or not for referencing local vars. You can sometimes use push to init local vars at the same time you reserve space for them, if you need the value in memory (although compilers fail to do that). And yes 64-bit is the default operand-size for push, but you generally wouldn't use push/pop for access to your own stack frame after the function prologue.
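The "private recursive helper" idea from the comments above has a plain C equivalent, the pointer-to-state-struct form that the comment itself mentions. Below is a minimal sketch of that form; all names (node, walk_state, walk, tree_sum) are hypothetical, and the hand-written asm variant would keep the state pointer in a call-preserved register instead of passing it on every call.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *left, *right;
};

/* State shared by every invocation within one traversal. */
struct walk_state {
    long sum;
    int count;
};

/* "Private" recursive helper: only called by itself and by tree_sum(). */
static void walk(const struct node *n, struct walk_state *st)
{
    assert(st != NULL);
    if (n == NULL)
        return;
    walk(n->left, st);
    st->sum += n->value;
    st->count++;
    walk(n->right, st);
}

/* Public wrapper: the shared state lives in this function's stack frame. */
long tree_sum(const struct node *root, int *count_out)
{
    struct walk_state st = { 0, 0 };
    walk(root, &st);
    if (count_out != NULL)
        *count_out = st.count;
    return st.sum;
}
```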