__attribute__((noipa)) fully blocks all inter-procedural analysis
If you just want to look at the asm for a call site, the easiest approach is to not define the function in the same compilation unit at all, and only declare it with a prototype. (That won't work if you're using -flto to allow cross-file inlining with link-time optimization. You should use LTO for production builds, especially if you have small functions in separate .c files rather than in .h files.)
__attribute__((noinline)) is sufficient for most use cases, especially ones where you actually plan to run the generated code and just want to stop the compiler from bloating your binary.
But sometimes you also want to look at the asm for a stand-alone definition of the function, so you need a way to get both the definition and the call site into one compile.
To fully block all inter-procedural analysis in GCC, use __attribute__((noipa)), either in addition to noinline or instead of it.
```c
__attribute__((noinline))
void empty_noinline(){}

int foo(){
    empty_noinline();
    return 1;
}
```
GCC and Clang (at -O1 or higher) both compile foo without a call to empty_noinline, since it does nothing. They aren't inlining it; they just see that it has no side effects and optimize away the call itself.
```asm
# x86-64 GCC -O3
empty_noinline:
        ret
foo:
        mov     eax, 1
        ret
```
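The same side-effect analysis can do more than delete calls to empty functions. Here is a minimal sketch with hypothetical function names. GCC's IPA may infer that a noinline function is `const` (its result depends only on its arguments) and merge two identical calls into one. Whether that actually happens depends on the GCC version and options, so check the asm. With noipa, GCC has to treat the body as unknown, so both calls should remain.

```c
/* Hypothetical example: twice() has no side effects, so GCC's IPA may
   infer it is const and merge the two identical calls in sum_twice()
   into a single call whose result is then doubled. */
__attribute__((noinline))
int twice(int x) { return x * 2; }

int sum_twice(int x) { return twice(x) + twice(x); }

/* With noipa, GCC must assume twice_noipa() might have side effects,
   so sum_twice_noipa() should keep both calls. */
__attribute__((noinline, noipa))
int twice_noipa(int x) { return x * 2; }

int sum_twice_noipa(int x) { return twice_noipa(x) + twice_noipa(x); }
```

Both versions compute the same result. Only the generated code for the callers is expected to differ.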
But with noipa, GCC acts as if the definition weren't visible, only a declaration. (Clang doesn't have an equivalent attribute that I know of, but it ignores attributes it doesn't understand. I use noinline as well as noipa so that at least noinline is there for Clang, for functions where that helps at all.)
```c
__attribute__((noinline,noipa))
void empty_noipa(){}

int bar(){
    empty_noipa();
    return 2;
}
```
```asm
empty_noipa:
        ret
bar:
        sub     rsp, 8            # align stack pointer by 16 before a call
        call    empty_noipa       #### Call not optimized away
        mov     eax, 2
        add     rsp, 8
        ret
```
See these examples on Godbolt with GCC 15.2 and Clang 20.1, where you can play with them.
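If you'd rather not spell out both attributes everywhere, one option is a small macro that checks `__has_attribute`, which GCC (5 and later) and Clang both provide. That way noipa is only used where it exists (GCC 8 and later), and other GNU-compatible compilers just get noinline. `NO_IPA` is a hypothetical name made up for this sketch, not something from GCC or Clang headers. Here it is applied to the bar() example above:

```c
/* Hypothetical NO_IPA macro: noinline + noipa where noipa is supported,
   plain noinline otherwise (e.g. Clang). The nested #if is needed because
   __has_attribute(x) can't be used on compilers that don't define it. */
#if defined(__has_attribute)
#  if __has_attribute(noipa)
#    define NO_IPA __attribute__((noinline, noipa))
#  endif
#endif
#ifndef NO_IPA
#  define NO_IPA __attribute__((noinline))
#endif

NO_IPA void empty_noipa(void) {}

int bar(void)
{
    empty_noipa();  /* with GCC's noipa, this call should survive at -O3 */
    return 2;
}
```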
Inter-procedural optimization other than inlining or optimizing away
Putting some work into empty_noinline(), like volatile int a=1; a*=2;, will convince GCC and Clang to actually call it.
But without noipa, GCC skips the stack alignment the ABI requires before a call, because inter-procedural analysis finds that the callee doesn't depend on the stack being aligned by 16. GCC may also make .constprop clones of functions, specialized for an argument that is a compile-time constant. noipa blocks that, too.
```c
__attribute__((noinline))
void empty_noinline(){
    volatile   // no side effects unless a is volatile
    int a=1;
    a*=2;
}

int foo(){
    empty_noinline();
    return 1;
}
```
(See the above Godbolt link; just uncomment volatile in empty_noinline.)
```asm
# x86-64 GCC -O3
empty_noinline:
        mov     DWORD PTR [rsp-4], 1
        mov     eax, DWORD PTR [rsp-4]   # GCC won't use memory-operand ALU ops on volatile objects
        add     eax, eax
        mov     DWORD PTR [rsp-4], eax
        ret
foo:
        # note lack of sub/add RSP, or dummy push like Clang does
        # so RSP%16 == 8 at this point
        call    empty_noinline
        mov     eax, 1
        ret
```
Clang 20 still fully follows the calling convention:
```asm
empty_noinline:
        mov     dword ptr [rsp - 4], 1
        shl     dword ptr [rsp - 4]      # volatile doesn't force clang to load/store with separate insns
        ret
foo:
        push    rax                      # align the stack with a dummy push
        call    empty_noinline
        mov     eax, 1
        pop     rcx                      # and undo it into a different call-clobbered reg
        ret
```
The above GCC optimizations aren't necessarily bad, depending on what you want. If you want an example of how to write correct asm by hand that follows the calling convention, then any IPA is bad. But if you want efficient, compact code, then opportunistically skipping stack alignment is a good thing when it doesn't hurt the callee.
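The .constprop cloning mentioned earlier can be sketched like this, with hypothetical names. Every call site here passes 3, so GCC at -O2/-O3 may emit a specialized copy (typically named something like `scale.constprop.0`) that hard-codes the constant, while keeping the original for any other callers. Whether it clones depends on GCC's cost heuristics, so look at the asm rather than assuming it. With noipa, the callee keeps its normal two-argument form and the caller passes 3 as an ordinary argument.

```c
/* Hypothetical example: the only call passes factor == 3, so GCC may
   clone scale() into a version specialized for that constant. */
__attribute__((noinline))
int scale(int x, int factor) { return x * factor; }

int scale_by_3(int x) { return scale(x, 3); }

/* noipa blocks the specialization: no clone is made, and the call
   follows the calling convention with both arguments passed. */
__attribute__((noinline, noipa))
int scale_noipa(int x, int factor) { return x * factor; }

int scale_by_3_noipa(int x) { return scale_noipa(x, 3); }
```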