You want function inlining, and most optimizing compilers do that.
Notice that inlining requires the called function to be known, and is effective only if that function is not too big, since conceptually it substitutes the call with a rewritten copy of the called function's body. So you generally cannot inline an unknown function, e.g. one called through a function pointer (that includes functions from dynamically linked shared libraries, and virtual methods dispatched through some vtable), although some compilers can sometimes optimize such calls through devirtualization techniques. Of course it is not always possible to inline recursive functions (some clever compilers might use partial evaluation and in some cases be able to inline recursive functions).
Notice also that inlining, even when it is easily possible, is not always effective: you (actually your compiler) could increase the code size so much that CPU caches (or the branch predictor) would work less efficiently, and that would make your program run slower.
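To make the "substitution" idea concrete, here is a small hand-written sketch in Python. The function names (square, sum_of_squares, sum_of_squares_inlined) are made up for illustration, and the "inlined" version is written by hand to show what an inlining compiler conceptually produces; it is not a claim about what any particular compiler or interpreter actually emits.

```python
def square(x):
    # Small, known callee: a typical inlining candidate.
    return x * x


def sum_of_squares(a, b):
    # Before inlining: two real calls to square.
    return square(a) + square(b)


def sum_of_squares_inlined(a, b):
    # After (hand-done) inlining: each call is replaced by the body of
    # square, with the parameter x substituted by the actual argument.
    return a * a + b * b


print(sum_of_squares(3, 4), sum_of_squares_inlined(3, 4))
```

Both versions compute the same value; the second one simply has no call left to perform, at the price of duplicating the callee's body at each call site.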
I am focusing a bit on functional programming style, since you tagged your question as such.
Notice that you don't need to have any call stack (at least in the machine sense of the "call stack" expression). You could use only the heap.
So, take a look at continuations and read more about continuation passing style (CPS) and the CPS transformation (intuitively, you could use continuation closures as reified "call frames" allocated in the heap, which sort of mimic a call stack; then you need an efficient garbage collector).
Andrew Appel wrote a book, "Compiling with Continuations", and an old paper, "Garbage Collection Can Be Faster Than Stack Allocation". See also A. Kennedy's paper (ICFP 2007), "Compiling with Continuations, Continued".
I also recommend reading Queinnec's "Lisp In Small Pieces" book, which has several chapters related to continuations and compilation.
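As a rough illustration of that intuition, here is a minimal sketch of a CPS-style factorial in Python. The names fact_cps and trampoline are my own, and the trampoline loop is an assumption I add so that the host language's stack stays at a constant depth; the pending work lives in a chain of heap-allocated closures (the continuations), not in stack frames.

```python
def fact_cps(n, k):
    # CPS factorial: instead of returning n!, pass it to the continuation k.
    # Every step returns a zero-argument thunk, so nothing recurses deeply.
    if n == 0:
        return lambda: k(1)
    # The new continuation closes over n and k: it plays the role of a
    # heap-allocated "call frame" remembering what remains to be done.
    return lambda: fact_cps(n - 1, lambda r: lambda: k(n * r))


def trampoline(step):
    # Run thunks one at a time in a loop; the machine stack stays shallow.
    while callable(step):
        step = step()
    return step


print(trampoline(fact_cps(5, lambda r: r)))  # prints 120

# Far deeper than the default recursion limit of CPython, yet no overflow:
big = trampoline(fact_cps(5000, lambda r: r))
```

The chain of closures built here is reclaimed by the garbage collector once the result is computed, which is why the efficiency of the collector matters for this style.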
Notice also that some languages (e.g. Brainfuck) or abstract machines (e.g. OISC, RAM) don't have any calling facilities but are still Turing-complete, so you don't (in theory) even need any function call mechanism, even if it is extremely convenient. BTW, some old instruction set architectures (e.g. IBM/370) don't even have a hardware call stack, or a call instruction that pushes the return address (the IBM/370 had only a Branch and Link machine instruction, which saves the return address in a register).
Finally, if your entire program (including all the needed libraries) does not have any recursion, you could store the return address (and the "local" variables, which actually become static) of each function in static locations. Old Fortran77 compilers did that in the early 1980s (so the compiled programs did not use any call stack at that time).
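Here is a toy simulation of that static-allocation scheme in Python. Everything in it (the SQUARE and SUM_SQ records, the string labels, the run function) is hypothetical and only mimics the idea: each function owns one fixed record holding its return "address" and its variables, and a call is just "store the return label, then jump". Python itself of course still uses its own stack to execute run; the point is only that the simulated program needs none.

```python
# One fixed record per function, allocated once (like static storage).
SQUARE = {"ret": None, "x": 0, "result": 0}
SUM_SQ = {"ret": None, "a": 0, "b": 0, "tmp": 0, "result": 0}


def run(a, b):
    # "Call" sum_sq(a, b) from the top level; its return label is "halt".
    SUM_SQ["a"], SUM_SQ["b"], SUM_SQ["ret"] = a, b, "halt"
    pc = "sum_sq"
    while pc != "halt":
        if pc == "sum_sq":
            # First call to square: fill its static slots, then jump.
            SQUARE["x"] = SUM_SQ["a"]
            SQUARE["ret"] = "sum_sq_1"
            pc = "square"
        elif pc == "square":
            SQUARE["result"] = SQUARE["x"] * SQUARE["x"]
            pc = SQUARE["ret"]  # "return": jump to the stored label
        elif pc == "sum_sq_1":
            # Save the first result, then call square a second time.
            SUM_SQ["tmp"] = SQUARE["result"]
            SQUARE["x"] = SUM_SQ["b"]
            SQUARE["ret"] = "sum_sq_2"
            pc = "square"
        elif pc == "sum_sq_2":
            SUM_SQ["result"] = SUM_SQ["tmp"] + SQUARE["result"]
            pc = SUM_SQ["ret"]
    return SUM_SQ["result"]


print(run(3, 4))  # prints 25
```

This works only because no function is active twice at the same time: a recursive call to square would overwrite SQUARE["ret"] and SQUARE["x"] before the outer activation had finished with them.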