
In this post, Guido van Rossum says that a function call may be expensive, but I do not understand why, or how expensive it can be.

How much delay does a simple function call add to your code, and why?

  • Note that 'expensive' can be relative. Commented Apr 6, 2014 at 11:05
  • He says in the post why; creating a stack frame is expensive. Commented Apr 6, 2014 at 11:06
  • @thefourtheye Not necessarily - static languages can often do most of the work at compile time, and sometimes compile out function calls altogether. Commented Apr 6, 2014 at 11:11
  • @thefourtheye: languages like C do very little validation at runtime; for example, no checks are made that the arguments match the function's expected parameters (the compiler is expected to check this during compilation). Commented Apr 6, 2014 at 11:11
  • @thejohnbackes Here's a link to the original article. web.archive.org/web/20180919031245/https://plus.google.com/… Commented Mar 12, 2020 at 14:46

3 Answers


A function call requires that the current execution frame be suspended and that a new frame be created and pushed onto the stack. This is relatively expensive compared to many other operations.

You can measure the exact time required with the timeit module:

>>> import timeit
>>> def f(): pass
...
>>> timeit.timeit(f)
0.15175890922546387

That's about 1/6th of a second for a million calls to an empty function. Compare that time with whatever you are thinking of putting in the function; the 0.15 seconds of call overhead needs to be taken into account if performance is an issue.
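To get a feel for the relative cost, here is a minimal sketch (the function names and loop sizes are illustrative only; actual timings depend on your interpreter and machine) comparing a loop that does its work inline against the same loop routed through a tiny helper function:

import timeit

def add_inline():
    total = 0
    for i in range(1000):
        total += i * i          # the work is done in place
    return total

def square(i):
    return i * i

def add_with_call():
    total = 0
    for i in range(1000):
        total += square(i)      # same work, plus a call and a new frame per iteration
    return total

print(timeit.timeit(add_inline, number=1000))
print(timeit.timeit(add_with_call, number=1000))

On CPython the second version is typically noticeably slower, and the entire difference is call overhead.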


4 Comments

When you say relatively expensive compared to many other operations, which kind of operations are you thinking of? E.g. creating a new variable in memory, assignments, statements, ...
That's way too broad; do your own timings for your specific problem instead.
It is not that I have a specific problem; I just want to understand why, and in which cases, it is preferable to avoid function calls. Some timings: timeit.timeit('"abc"+"def"') # 0.0361530780792 and timeit.timeit('a=[]') # 0.0726218223572
"abc" + "def" is a constant lookup; the compiler will already have folded the concatenation.

Python has a "relatively high" function call overhead; it is the cost we pay for some of Python's most useful functionality.

Monkey Patching:

You have so much power to monkey-patch/override behavior in Python that the interpreter can't guarantee that given

a, b = X(1), X(2)
return a.fn() + b.fn() + a.fn()

a.fn() and b.fn() are the same, or that a.fn() will be the same after b.fn() has been called.
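A contrived sketch (the class X and the sneaky function are made up for illustration) of why the interpreter cannot cache that first lookup:

class X(object):
    def __init__(self, n):
        self.n = n
    def fn(self):
        return self.n

def sneaky(self):
    # rebinds X.fn the first time it runs, so later lookups find a different method
    X.fn = lambda self: -self.n
    return self.n

a, b = X(1), X(2)
X.fn = sneaky
print(a.fn() + b.fn() + a.fn())   # 1 + (-2) + (-1): a.fn() changed between calls

The disassembly below shows the consequence: fn has to be looked up afresh at every call site.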

In [1]: def f(a, b):
   ...:     return a.fn() + b.fn() + c.fn()
   ...:

In [2]: dis.dis(f)
  1           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                0 (fn)
              6 CALL_FUNCTION            0
              9 LOAD_FAST                1 (b)
             12 LOAD_ATTR                0 (fn)
             15 CALL_FUNCTION            0
             18 BINARY_ADD
             19 LOAD_GLOBAL              1 (c)
             22 LOAD_ATTR                0 (fn)
             25 CALL_FUNCTION            0
             28 BINARY_ADD
             29 RETURN_VALUE

In the above, you can see that 'fn' is looked up at each location. The same applies to variables, but people seem to be more aware of that.

In [11]: def g(a):
    ...:     return a.i + a.i + a.i
    ...:

In [12]: dis.dis(g)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_ATTR                0 (i)
              6 LOAD_FAST                0 (a)
              9 LOAD_ATTR                0 (i)
             12 BINARY_ADD
             13 LOAD_FAST                0 (a)
             16 LOAD_ATTR                0 (i)
             19 BINARY_ADD
             20 RETURN_VALUE

Worse, because modules can monkey-patch/replace themselves and each other, if you are calling a global or module-level function, that global/module has to be looked up every time:

In [16]: def h():
    ...:     v = numpy.vector(numpy.vector.identity)
    ...:     for i in range(100):
    ...:         v = numpy.vector.add(v, numpy.vector.identity)
    ...:

In [17]: dis.dis(h)
  2           0 LOAD_GLOBAL              0 (numpy)
              3 LOAD_ATTR                1 (vector)
              6 LOAD_GLOBAL              0 (numpy)
              9 LOAD_ATTR                1 (vector)
             12 LOAD_ATTR                2 (identity)
             15 CALL_FUNCTION            1
             18 STORE_FAST               0 (v)

  3          21 SETUP_LOOP              47 (to 71)
             24 LOAD_GLOBAL              3 (range)
             27 LOAD_CONST               1 (100)
             30 CALL_FUNCTION            1
             33 GET_ITER
        >>   34 FOR_ITER                33 (to 70)
             37 STORE_FAST               1 (i)

  4          40 LOAD_GLOBAL              0 (numpy)
             43 LOAD_ATTR                1 (vector)
             46 LOAD_ATTR                4 (add)
             49 LOAD_FAST                0 (v)
             52 LOAD_GLOBAL              0 (numpy)
             55 LOAD_ATTR                1 (vector)
             58 LOAD_ATTR                2 (identity)
             61 CALL_FUNCTION            2
             64 STORE_FAST               0 (v)
             67 JUMP_ABSOLUTE           34
        >>   70 POP_BLOCK
        >>   71 LOAD_CONST               0 (None)
             74 RETURN_VALUE

WORKAROUND

Consider capturing or importing any values you expect not to mutate:

def f1(files):
    for filename in files:
        if os.path.exists(filename):
            yield filename

# vs

def f2(files):
    from os.path import exists
    for filename in files:
        if exists(filename):
            yield filename

# or

def f3(files, exists=os.path.exists):
    for filename in files:
        if exists(filename):
            yield filename

See also the "In the wild" section below.

It's not always possible to import, though; for example, you can import sys.stdin but you can't import sys.stdin.readline, and numpy types can have similar problems:

In [15]: def h():
    ...:     from numpy import vector
    ...:     add = vector.add
    ...:     idy = vector.identity
    ...:     v = vector(idy)
    ...:     for i in range(100):
    ...:         v = add(v, idy)
    ...:

In [16]: dis.dis(h)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               2 (('vector',))
              6 IMPORT_NAME              0 (numpy)
              9 IMPORT_FROM              1 (vector)
             12 STORE_FAST               0 (vector)
             15 POP_TOP

  3          16 LOAD_FAST                0 (vector)
             19 LOAD_ATTR                2 (add)
             22 STORE_FAST               1 (add)

  4          25 LOAD_FAST                0 (vector)
             28 LOAD_ATTR                3 (identity)
             31 STORE_FAST               2 (idy)

  5          34 LOAD_FAST                0 (vector)
             37 LOAD_FAST                2 (idy)
             40 CALL_FUNCTION            1
             43 STORE_FAST               3 (v)

  6          46 SETUP_LOOP              35 (to 84)
             49 LOAD_GLOBAL              4 (range)
             52 LOAD_CONST               3 (100)
             55 CALL_FUNCTION            1
             58 GET_ITER
        >>   59 FOR_ITER                21 (to 83)
             62 STORE_FAST               4 (i)

  7          65 LOAD_FAST                1 (add)
             68 LOAD_FAST                3 (v)
             71 LOAD_FAST                2 (idy)
             74 CALL_FUNCTION            2
             77 STORE_FAST               3 (v)
             80 JUMP_ABSOLUTE           59
        >>   83 POP_BLOCK
        >>   84 LOAD_CONST               0 (None)
             87 RETURN_VALUE
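The usual way around that (a small sketch, assuming you only need the bound method) is to capture the attribute in a local once, outside the loop:

import sys

def read_lines(count):
    readline = sys.stdin.readline   # one global + attribute lookup, then LOAD_FAST inside the loop
    return [readline() for _ in range(count)]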

CAVEAT EMPTOR:

  • Capturing variables is not a zero-cost operation, and it increases the frame size,
  • Only use this after identifying hot code paths.

Argument Passing

Python's argument passing mechanism looks trivial, but unlike in most languages it costs a lot. We're talking about separating arguments into args and kwargs:

f(1, 2, 3)
f(1, 2, c=3)
f(c=3)
f(1, 2)  # c is auto-injected

There's a lot of work that goes on in the CALL_FUNCTION operation, including potentially one transition from the C layer to the Python layer and back.
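A rough way to see that cost in isolation is to time the same trivial function called with different argument styles (a sketch; the exact figures vary by interpreter version, and each lambda wrapper adds the same fixed overhead to every row, so the comparison stays fair):

import timeit

def f(a, b, c=3):
    return a

# the body is identical in every case; any differences come from how the
# arguments are marshalled into the callee's frame
print(timeit.timeit(lambda: f(1, 2, 3)))
print(timeit.timeit(lambda: f(1, 2, c=3)))            # keyword argument
print(timeit.timeit(lambda: f(1, 2)))                 # c filled in from the default
print(timeit.timeit(lambda: f(*(1, 2), **{'c': 3})))  # unpacking costs extra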

In addition to this, the parameters often need to be looked up to be passed:

f(obj.x, obj.y, obj.z) 

Consider:

In [28]: def fn(obj):
    ...:     f = some.module.function
    ...:     for x in range(1000):
    ...:         for y in range(1000):
    ...:             f(x + obj.x, y + obj.y, obj.z)
    ...:

In [29]: dis.dis(fn)
  2           0 LOAD_GLOBAL              0 (some)
              3 LOAD_ATTR                1 (module)
              6 LOAD_ATTR                2 (function)
              9 STORE_FAST               1 (f)

  3          12 SETUP_LOOP              76 (to 91)
             15 LOAD_GLOBAL              3 (range)
             18 LOAD_CONST               1 (1000)
             21 CALL_FUNCTION            1
             24 GET_ITER
        >>   25 FOR_ITER                62 (to 90)
             28 STORE_FAST               2 (x)

  4          31 SETUP_LOOP              53 (to 87)
             34 LOAD_GLOBAL              3 (range)
             37 LOAD_CONST               1 (1000)
             40 CALL_FUNCTION            1
             43 GET_ITER
        >>   44 FOR_ITER                39 (to 86)
             47 STORE_FAST               3 (y)

  5          50 LOAD_FAST                1 (f)
             53 LOAD_FAST                2 (x)
             56 LOAD_FAST                0 (obj)
             59 LOAD_ATTR                4 (x)
             62 BINARY_ADD
             63 LOAD_FAST                3 (y)
             66 LOAD_FAST                0 (obj)
             69 LOAD_ATTR                5 (y)
             72 BINARY_ADD
             73 LOAD_FAST                0 (obj)
             76 LOAD_ATTR                6 (z)
             79 CALL_FUNCTION            3
             82 POP_TOP
             83 JUMP_ABSOLUTE           44
        >>   86 POP_BLOCK
        >>   87 JUMP_ABSOLUTE           25
        >>   90 POP_BLOCK
        >>   91 LOAD_CONST               0 (None)
             94 RETURN_VALUE

Where "LOAD_GLOBAL" requires that the name be hashed and then the globals table be queried for that hash value. This is an O(log N) operation.

But think about this: in our two simple 0-1000 loops, we are doing those lookups a million times...
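Here is a small sketch of just the lookup cost (a hypothetical micro-benchmark; math.sqrt stands in for any global/attribute chain, and the numbers will vary by machine):

import math
import timeit

def use_global():
    total = 0.0
    for i in range(10000):
        total += math.sqrt(i)   # LOAD_GLOBAL math + LOAD_ATTR sqrt, every iteration
    return total

def use_local(sqrt=math.sqrt):
    total = 0.0
    for i in range(10000):
        total += sqrt(i)        # LOAD_FAST: a plain index into the frame's locals
    return total

print(timeit.timeit(use_global, number=1000))
print(timeit.timeit(use_local, number=1000))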

Also note that in the fn example above we are calling a function a million times. Fortunately, it's a built-in function, and built-ins have a much less prohibitive overhead; but if this were really your perf hotspot, you might want to reduce the overhead of range by doing something like:

x, y = 0, 0
for i in range(1000 * 1000):
    ...
    y += 1
    if y > 1000:
        x, y = x + 1, 0

You could do some hacking with captured variables here, but it would likely have minimal performance impact on this code and would simply make it less maintainable.

The core pythonic fix to this problem is to use generators or iterables:

for i in obj.values():
    prepare(i)

# vs

prepare(obj.values())

and

for i in ("left", "right", "up", "down"): test_move(i) # vs test_move(("left", "right", "up", "down")) 

and

for x in range(-1000, 1000):
    for y in range(-1000, 1000):
        fn(x + obj.x, y + obj.y, obj.z)

# vs

def coordinates(obj):
    for x in range(obj.x - 1000, obj.x + 1000 + 1):
        for y in range(obj.y - 1000, obj.y + 1000 + 1):
            yield x, y, obj.z

fn(coordinates(obj))

In the wild

You'll see these Pythonic idioms in the wild in forms like this:

def some_fn(a, b, c, stdin=sys.stdin):
    ...

This has several advantages:

  • documents the default in the help() for this function (default input is stdin),
  • provides a hook for unit tests (see the sketch after this list),
  • promotes sys.stdin to a local (LOAD_FAST vs LOAD_GLOBAL+LOAD_ATTR).
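For example, here is a minimal sketch of how the default-argument hook plays with unit tests (count_words is a hypothetical function, not from the original post):

import io
import sys

def count_words(stdin=sys.stdin):
    # stdin is a local here (LOAD_FAST), and callers can substitute any file-like object
    return sum(len(line.split()) for line in stdin)

# in a test, no real console input is needed:
assert count_words(io.StringIO(u"one two\nthree\n")) == 3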

Most numpy calls either take or have a variant that takes a list, array, etc, and if you're not using those, you're probably missing out on 99% of numpy's benefits.

def distances(target, candidates):
    values = []
    for candidate in candidates:
        values.append(numpy.linalg.norm(candidate - target))
    return numpy.array(values)

# vs

def distances(target, candidates):
    return numpy.linalg.norm(candidates - target, axis=1)

(Note: this isn't necessarily the best way to get distances, especially if you are not going to forward the distance values elsewhere; e.g. if you're doing a range check, it is probably more efficient to use a more selective approach that avoids the sqrt operation.)
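As a sketch of that more selective approach (within_range is a made-up helper; it assumes candidates is an N x D array and target a length-D vector), compare squared distances so the square root is never taken:

import numpy

def within_range(target, candidates, radius):
    deltas = candidates - target
    sq_dist = (deltas ** 2).sum(axis=1)              # squared distance per candidate row
    return candidates[sq_dist <= radius * radius]    # no sqrt needed for a range check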

Optimizing for iterables doesn't just mean passing them, but also returning them:

def f4(files, exists=os.path.exists):
    return (filename for filename in files if exists(filename))
    #      ^- returns a generator expression

1 Comment

Wow. That is a complete essay. It’s great when SO answers go this deep.

Any statement of the form "X is expensive" doesn't take into account that performance is always relative to whatever else is going on, and relative to however else the task can be done.

There are many questions on SO that express concern about things that might be, but typically are not, performance problems.

As to whether function calls are expensive, there's a general two-part answer.

  1. For functions that do very little and do not call further sub-functions, and that in a particular application are responsible for more than 10% of total wall-clock time, it is worthwhile trying to in-line them or otherwise reduce the cost of invocation.

  2. In applications containing complex data structures and/or tall abstraction hierarchies, function calls are expensive not because of the time they take, but because they tempt you to make more of them than strictly necessary. When this occurs over multiple levels of abstraction, the inefficiencies multiply together, producing a compounded slowdown that is not easily localized.

The way to produce efficient code is a posteriori, not a priori. First write the code so it is clean and maintainable, including function calls as you like. Then while it is running with a realistic workload, let it tell you what can be done to speed it up. Here's an example.
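One generic way to do that "let it tell you" step (a sketch using the standard-library profiler; workload is a placeholder for your real entry point, not anything from the linked example) is to profile a realistic run and read off the most expensive call sites:

import cProfile
import pstats

def workload():
    # stand-in for your real entry point, run with a realistic input
    return sum(i * i for i in range(10 ** 6))

cProfile.run("workload()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)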

4 Comments

Python's "CALL_FUNCTION" operation is known to be over an order of magnitude more expensive than the equivalent "CALL" operation equivalent in most other languages. This is part because you must look up the function at every call, even in a loop; part because of Python's argument passing system; part because of decorators and part because of Python's monkey patching capabilities. That's why Python itself recommends you favor functions that take iterables over repeatedly calling a function.
@kfsone: Thanks. I didn't know that. Of course, if you're only doing it once or a small number of times, it won't matter. But if it's in your inner loop, it will make a huge difference. (Of course, my favorite technique, (modest cough) random pausing, will spot it.)
I was recently working on a project that used "esrapy" to parse files. It took 5 minutes to parse 1200 small files. I changed one function and one call site to x = call(iterable) instead of x = [call(i) for i in iterable] -> 2 minutes. A few name hoists (spacesRe = re.compile('\s+') -> spaces = re.compile('\s+').search, obj.info.array.add in a loop -> add = obj.info.array.add; for ...: add(...)) and it was down to 50s.
Downvoted because this answer mostly talks about when and how to optimize, while the question asked about why function calls are expensive in python (considering that in many languages function calls can be 0-cost, the question is understandable).
