
I prefer using nested functions instead of methods or global functions in Python whenever possible, so I decided to test their performance: it seems that when you define a function inside another function, there should be an overhead for redefining the inner function on each call of the outer function.

At best I was hoping for the global function to be just slightly faster, but surprisingly the nested function was faster. Does anyone have any idea why?

This is my code:

    from time import clock

    def a(n):
        return n + 1

    def b1(loopcount):
        return sum([a(n) for n in range(loopcount)])

    def b2(loopcount):
        def a(n):
            return n + 1
        return sum([a(n) for n in range(loopcount)])

    powers = [5, 6, 7]
    b1times = []
    b2times = []

    print " ", "".join(["{:^10d}".format(n) for n in powers])
    for i in range(5):
        for power in powers:
            t = clock()
            b1(10**power)
            b1times.append(clock() - t)
        for power in powers:
            t = clock()
            b2(10**power)
            b2times.append(clock() - t)
        print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
        print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
        print ""
        b1times = []
        b2times = []

And this is the result on my computer:

           5         6         7
    b1: 0.08200   0.82773   8.47946
    b2: 0.06914   0.79637   8.18571

    b1: 0.07332   0.82139   8.68262
    b2: 0.06547   0.82088   8.19606

    b1: 0.07963   0.82625   9.65037
    b2: 0.06617   0.82027   8.21412

    b1: 0.07630   0.82112   8.49082
    b2: 0.06541   0.80686   8.20532

    b1: 0.12328   0.87034   8.42964
    b2: 0.07059   0.79717   8.24620

UPDATE: following @Janne Karila's comment

Now that I'm calling b1 and b2 many more times, b1 becomes faster. So, as @Kos and @Pavel Anossov said in their answers, several factors affect the speed here, and you can't make a general statement.
Thanks everyone!

    from time import *

    def a1(n):
        return n + 1

    def b1(n):
        return a1(n)

    def b2(n):
        def a2():
            return n + 1
        return a2()

    powers = [4, 5, 6]
    b1times = []
    b2times = []

    print " ", "".join(["{:^10d}".format(n) for n in powers])
    for i in range(5):
        for power in powers:
            t = clock()
            sum([b1(n) for n in range(10**power)])
            b1times.append(clock() - t)
        for power in powers:
            t = clock()
            sum([b2(n) for n in range(10**power)])
            b2times.append(clock() - t)
        print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
        print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
        print ""
        b1times = []
        b2times = []
  • Rather than rolling your own timing stuff, you should use the timeit module. Commented Jan 2, 2013 at 12:39
  • Also, I think you should call your inner function something different from the outer function; calling them both a might be misleading somewhere. Commented Jan 2, 2013 at 12:40
  • Each of your timings is a single call to the outer function, which makes many calls to the inner function. You are not timing the overhead in the outer function this way. Commented Jan 2, 2013 at 12:43
  • @Janne Karila: Good point. I've updated. Commented Jan 2, 2013 at 13:36
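As the first comment recommends, the timeit module handles repetition and clock selection for you. A minimal sketch of the same b1/b2 comparison (the function bodies mirror the question's code; the loop count and repetition number are arbitrary choices, and print is written as a function for Python 3, unlike the Python 2 code in the question):

```python
# Sketch of the b1/b2 comparison using the timeit module instead of
# hand-rolled clock() timing.
import timeit

def a(n):
    return n + 1

def b1(loopcount):
    return sum([a(n) for n in range(loopcount)])

def b2(loopcount):
    def a(n):          # redefined on every call to b2
        return n + 1
    return sum([a(n) for n in range(loopcount)])

# timeit picks the best available clock and runs the callable many times.
t1 = timeit.timeit(lambda: b1(10 ** 4), number=100)
t2 = timeit.timeit(lambda: b2(10 ** 4), number=100)
print("b1: %.5f  b2: %.5f" % (t1, t2))
```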

4 Answers


There are several parts that have an effect here:

1) The time to define the function (create the function object),
2) The time to look up the function object by name,
3) The time to actually call the function.

Your global-function example is faster at 1) (no need to redefine a on each call to b1). However, it is slower at 2), because looking up a global name is slower than looking up a local one.

Why can't we have both then?

I've extended your benchmark with a solution that uses the global function, but avoids the global lookup using a local variable. It seems to be the fastest of the three on my machine:
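The b3 code itself isn't reproduced here. Under the assumption that it simply caches the global a in a local name (consistent with the follow-up comment about global a), a minimal sketch might look like this; the exact code the answerer benchmarked may differ:

```python
# Hypothetical reconstruction of the b3 variant: the same global function a,
# bound to a local name once per call so the loop uses fast local lookups.
def a(n):
    return n + 1

def b3(loopcount):
    local_a = a  # one global lookup here, LOAD_FAST inside the loop
    return sum([local_a(n) for n in range(loopcount)])
```

The comment below suggests the original also contained a global a statement, which is unnecessary since a is never rebound.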

           5         6         7
    b1: 0.04147   0.44421   4.46508
    b2: 0.03399   0.43321   4.41121
    b3: 0.03258   0.41821   4.25542

    b1: 0.03240   0.42998   4.39774
    b2: 0.03320   0.43465   4.42229
    b3: 0.03155   0.42109   4.23669

    b1: 0.03273   0.43321   4.37266
    b2: 0.03326   0.43551   4.42208
    b3: 0.03137   0.42356   4.25341

    b1: 0.03253   0.43104   4.40466
    b2: 0.03401   0.43719   4.42996
    b3: 0.03155   0.41681   4.24132

    b1: 0.03244   0.42965   4.37192
    b2: 0.03310   0.43629   4.42727
    b3: 0.03117   0.41701   4.23932

1 Comment

You don't need to use global a since you're not rebinding a (unless you did this to stress the fact that it's a global for readers).

LOAD_GLOBAL is much slower than LOAD_FAST.

b1

      5           0 LOAD_GLOBAL              0 (sum)
                  3 BUILD_LIST               0
                  6 LOAD_GLOBAL              1 (range)
                  9 LOAD_FAST                0 (loopcount)
                 12 CALL_FUNCTION            1
                 15 GET_ITER
            >>   16 FOR_ITER                18 (to 37)
                 19 STORE_FAST               1 (n)
                 22 LOAD_GLOBAL              2 (a)
                 25 LOAD_FAST                1 (n)
                 28 CALL_FUNCTION            1
                 31 LIST_APPEND              2
                 34 JUMP_ABSOLUTE           16
            >>   37 CALL_FUNCTION            1
                 40 RETURN_VALUE

b2

      8           0 LOAD_CONST               1 (<code object a at 01E57A40, ...>)
                  3 MAKE_FUNCTION            0
                  6 STORE_FAST               1 (a)

     10           9 LOAD_GLOBAL              0 (sum)
                 12 BUILD_LIST               0
                 15 LOAD_GLOBAL              1 (range)
                 18 LOAD_FAST                0 (loopcount)
                 21 CALL_FUNCTION            1
                 24 GET_ITER
            >>   25 FOR_ITER                18 (to 46)
                 28 STORE_FAST               2 (n)
                 31 LOAD_FAST                1 (a)
                 34 LOAD_FAST                2 (n)
                 37 CALL_FUNCTION            1
                 40 LIST_APPEND              2
                 43 JUMP_ABSOLUTE           25
            >>   46 CALL_FUNCTION            1
                 49 RETURN_VALUE

Whether the difference makes up for creating a function every time depends on the function (and how many times you use LOAD_GLOBAL).
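For reference, listings like the ones above come from the standard dis module. Note that the listings in this answer are from Python 2; opcode names, offsets, and where the list comprehension's code object lives differ between Python versions, but the LOAD_GLOBAL vs LOAD_FAST contrast remains.

```python
# Reproduce the kind of disassembly shown above with the dis module.
import dis

def a(n):
    return n + 1

def b1(loopcount):
    return sum([a(n) for n in range(loopcount)])

dis.dis(b1)  # prints the bytecode of b1 (and any nested code objects)
```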

 

Creating a local reference to a global function to use in an inner loop is a relatively common optimisation:

    def a(n):
        return n + 1

    def b1(loopcount):
        local_a = a
        return sum([local_a(n) for n in range(loopcount)])

This should still be faster than looking up a global and more maintainable than a nested function.

1 Comment

+1 for mentioning the common optimization. Another optimization in this case is to get rid of the function and use sum(range(1,loopcount+1)) ;-), but I suppose that's not really what OP is looking for

In general, no, as each time the outer function is run, the inner function must be redefined.

That said, performance really doesn't matter unless you can show that it does (for example, that a particular loop is too slow and is causing performance issues), so I'd recommend using whatever is most readable; premature optimization is a bad thing.

Still, I would argue that global functions are probably a better solution: they are easier to reuse across your code and, I'd say, more readable; after all, flat is better than nested.

4 Comments

But if you define function1 inside function2 when function1 is only used in function2, you will have a cleaner global namespace.
@nima Yes, but if you have that many functions it matters, you should probably be splitting them up into modules - which is far clearer than making lots of deeply nested functions in functions.
Then you'll have to import multiple modules when you want to use a simple feature. If you use nested functions, you can import * from a single module and only add a few important functions to your namespace. But that's only a personal preference, I guess ;)
import * is almost always a bad idea, as it pollutes your namespace. Importing a few more modules isn't a bad thing if it gives you useful structure.

I think it is caused by the order in which Python searches for name definitions.

Whenever the interpreter meets a name, it first searches the local namespace, which maps names to objects. If the name is not found there, it searches the global namespace, and then the builtins.

And everything in Python is an object, including functions.
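A small illustration of that lookup order, with names made up for the example:

```python
# Demonstration of Python's name lookup order: local scope first, then the
# enclosing function's scope, then the global namespace, then the builtins.
x = "global"

def outer():
    x = "enclosing"
    def inner():
        x = "local"
        return x  # found in the local scope first
    return inner()

def no_local():
    return x  # no local binding, so the global is found

print(outer())     # -> local
print(no_local())  # -> global
print(len)         # 'len' is found in the builtins namespace
```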

Comments
