
I prefer using nested functions instead of methods or global functions in Python whenever possible, so I decided to test their performance: it seems that when you define a function inside another function, there should be an overhead for redefining the inner function on each call of the outer function.

At best I was hoping for the global function to be just slightly faster, but surprisingly the nested function was faster. Does anyone have any idea why?

This is my code:

    from time import clock

    def a(n):
        return n + 1

    def b1(loopcount):
        return sum([a(n) for n in range(loopcount)])

    def b2(loopcount):
        def a(n):
            return n + 1
        return sum([a(n) for n in range(loopcount)])

    powers = [5, 6, 7]
    b1times = []
    b2times = []

    print " ", "".join(["{:^10d}".format(n) for n in powers])
    for i in range(5):
        for power in powers:
            t = clock()
            b1(10**power)
            b1times.append(clock() - t)
        for power in powers:
            t = clock()
            b2(10**power)
            b2times.append(clock() - t)
        print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
        print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
        print ""
        b1times = []
        b2times = []

And this is the result on my computer:

           5         6         7
    b1: 0.08200   0.82773   8.47946
    b2: 0.06914   0.79637   8.18571

    b1: 0.07332   0.82139   8.68262
    b2: 0.06547   0.82088   8.19606

    b1: 0.07963   0.82625   9.65037
    b2: 0.06617   0.82027   8.21412

    b1: 0.07630   0.82112   8.49082
    b2: 0.06541   0.80686   8.20532

    b1: 0.12328   0.87034   8.42964
    b2: 0.07059   0.79717   8.24620

UPDATE: following @Janne Karila's comment

Now that I'm calling b1 and b2 many more times, b1 becomes faster. So, as @Kos and @Pavel Anossov said in their answers, several factors affect the speed here, and you can't make a general statement.
Thanks everyone!

    from time import *

    def a1(n):
        return n + 1

    def b1(n):
        return a1(n)

    def b2(n):
        def a2():
            return n + 1
        return a2()

    powers = [4, 5, 6]
    b1times = []
    b2times = []

    print " ", "".join(["{:^10d}".format(n) for n in powers])
    for i in range(5):
        for power in powers:
            t = clock()
            sum([b1(n) for n in range(10**power)])
            b1times.append(clock() - t)
        for power in powers:
            t = clock()
            sum([b2(n) for n in range(10**power)])
            b2times.append(clock() - t)
        print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
        print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
        print ""
        b1times = []
        b2times = []
  • Rather than rolling your own timing stuff, you should use the timeit module. Commented Jan 2, 2013 at 12:39
  • Also, I think you should call your inner function something different from the outer function; calling them both a might be misleading somewhere. Commented Jan 2, 2013 at 12:40
  • Each of your timings is a single call to the outer function, which makes many calls to the inner function. You are not timing the overhead in the outer function this way. Commented Jan 2, 2013 at 12:43
  • @Janne Karila: Good point. I've updated. Commented Jan 2, 2013 at 13:36
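As the first comment recommends, the timeit module handles repetition and clock selection for you. A minimal sketch of the same b1/b2 comparison (the function bodies mirror the question's code; the loop count and repetition number are arbitrary choices, and print is written as a function for Python 3, unlike the Python 2 code in the question):

```python
# Sketch of the b1/b2 comparison using the timeit module instead of
# hand-rolled clock() timing.
import timeit

def a(n):
    return n + 1

def b1(loopcount):
    return sum([a(n) for n in range(loopcount)])

def b2(loopcount):
    def a(n):          # redefined on every call to b2
        return n + 1
    return sum([a(n) for n in range(loopcount)])

# timeit picks the best available clock and runs the callable many times.
t1 = timeit.timeit(lambda: b1(10 ** 4), number=100)
t2 = timeit.timeit(lambda: b2(10 ** 4), number=100)
print("b1: %.5f  b2: %.5f" % (t1, t2))
```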

4 Answers


There are several parts that have an effect here:

1) The time to define the function (create the function object),
2) The time to look up the function object by name,
3) The time to actually call the function.

Your global-function example is faster at 1) (no need to redefine a on each call to b1). However, it is slower at 2), because looking up a global name is slower than looking up a local one.

Why can't we have both then?

I've extended your benchmark with a solution that uses the global function, but avoids the global lookup using a local variable. It seems to be the fastest of the three on my machine:
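The b3 code itself isn't reproduced here. Under the assumption that it simply caches the global a in a local name (consistent with the follow-up comment about global a), a minimal sketch might look like this; the exact code the answerer benchmarked may differ:

```python
# Hypothetical reconstruction of the b3 variant: the same global function a,
# bound to a local name once per call so the loop uses fast local lookups.
def a(n):
    return n + 1

def b3(loopcount):
    local_a = a  # one global lookup here, LOAD_FAST inside the loop
    return sum([local_a(n) for n in range(loopcount)])
```

The comment below suggests the original also contained a global a statement, which is unnecessary since a is never rebound.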

           5         6         7
    b1: 0.04147   0.44421   4.46508
    b2: 0.03399   0.43321   4.41121
    b3: 0.03258   0.41821   4.25542

    b1: 0.03240   0.42998   4.39774
    b2: 0.03320   0.43465   4.42229
    b3: 0.03155   0.42109   4.23669

    b1: 0.03273   0.43321   4.37266
    b2: 0.03326   0.43551   4.42208
    b3: 0.03137   0.42356   4.25341

    b1: 0.03253   0.43104   4.40466
    b2: 0.03401   0.43719   4.42996
    b3: 0.03155   0.41681   4.24132

    b1: 0.03244   0.42965   4.37192
    b2: 0.03310   0.43629   4.42727
    b3: 0.03117   0.41701   4.23932

1 Comment

You don't need to use global a since you're not rebinding a (unless you did this to stress the fact that it's a global for readers).

LOAD_GLOBAL is much slower than LOAD_FAST.

b1

      5           0 LOAD_GLOBAL              0 (sum)
                  3 BUILD_LIST               0
                  6 LOAD_GLOBAL              1 (range)
                  9 LOAD_FAST                0 (loopcount)
                 12 CALL_FUNCTION            1
                 15 GET_ITER
            >>   16 FOR_ITER                18 (to 37)
                 19 STORE_FAST               1 (n)
                 22 LOAD_GLOBAL              2 (a)
                 25 LOAD_FAST                1 (n)
                 28 CALL_FUNCTION            1
                 31 LIST_APPEND              2
                 34 JUMP_ABSOLUTE           16
            >>   37 CALL_FUNCTION            1
                 40 RETURN_VALUE

b2

      8           0 LOAD_CONST               1 (<code object a at 01E57A40, ...>)
                  3 MAKE_FUNCTION            0
                  6 STORE_FAST               1 (a)

     10           9 LOAD_GLOBAL              0 (sum)
                 12 BUILD_LIST               0
                 15 LOAD_GLOBAL              1 (range)
                 18 LOAD_FAST                0 (loopcount)
                 21 CALL_FUNCTION            1
                 24 GET_ITER
            >>   25 FOR_ITER                18 (to 46)
                 28 STORE_FAST               2 (n)
                 31 LOAD_FAST                1 (a)
                 34 LOAD_FAST                2 (n)
                 37 CALL_FUNCTION            1
                 40 LIST_APPEND              2
                 43 JUMP_ABSOLUTE           25
            >>   46 CALL_FUNCTION            1
                 49 RETURN_VALUE

Whether the difference makes up for creating a function every time depends on the function (and how many times you use LOAD_GLOBAL).
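For reference, listings like the ones above come from the standard dis module. Note that the listings in this answer are from Python 2; opcode names, offsets, and where the list comprehension's code object lives differ between Python versions, but the LOAD_GLOBAL vs LOAD_FAST contrast remains.

```python
# Reproduce the kind of disassembly shown above with the dis module.
import dis

def a(n):
    return n + 1

def b1(loopcount):
    return sum([a(n) for n in range(loopcount)])

dis.dis(b1)  # prints the bytecode of b1 (and any nested code objects)
```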

 

Creating a local reference to a global function to use in an inner loop is a relatively common optimisation:

    def a(n):
        return n + 1

    def b1(loopcount):
        local_a = a
        return sum([local_a(n) for n in range(loopcount)])

This should still be faster than looking up a global and more maintainable than a nested function.

1 Comment

+1 for mentioning the common optimization. Another optimization in this case is to get rid of the function and use sum(range(1,loopcount+1)) ;-), but I suppose that's not really what OP is looking for

In general, no, as each time the outer function is run, the inner function must be redefined.

That said, performance really doesn't matter unless you can show that it does (for example, that a particular loop is too slow and is causing performance issues), so I'd recommend using whatever is most readable; premature optimization is a bad thing.

Still, I would argue that global functions are probably a better solution: they are easier to reuse across your code and, I'd say, more readable; after all, flat is better than nested.

4 Comments

But if you define function1 inside function2 when function1 is only used in function2, you will have a cleaner global namespace.
@nima Yes, but if you have that many functions it matters, you should probably be splitting them up into modules - which is far clearer than making lots of deeply nested functions in functions.
Then you'll have to import multiple modules when you want to use a simple feature. If you use nested functions, you can import * from a single module and only add a few important functions to your namespace. But that's only a personal preference, I guess ;)
import * is almost always a bad idea, as it pollutes your namespace. Importing a few more modules isn't a bad thing if it gives you useful structure.

I think it is caused by the order in which Python searches for name definitions.

Whenever the interpreter meets a name, it first searches the local namespace, which maps names to objects. If the name is not found there, it searches the global namespace, and then the builtins.

And everything in Python is an object, including functions.
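A small illustration of that lookup order, with names made up for the example:

```python
# Demonstration of Python's name lookup order: local scope first, then the
# enclosing function's scope, then the global namespace, then the builtins.
x = "global"

def outer():
    x = "enclosing"
    def inner():
        x = "local"
        return x  # found in the local scope first
    return inner()

def no_local():
    return x  # no local binding, so the global is found

print(outer())     # -> local
print(no_local())  # -> global
print(len)         # 'len' is found in the builtins namespace
```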

Comments
