38

I am wondering about the %timeit command in IPython

From the docs:

%timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] statement

Options:

-n<N>: execute the given statement <N> times in a loop. If this value is not given, a fitting value is chosen.

-r<R>: repeat the loop iteration <R> times and take the best result. Default: 3

For example, if I write:

%timeit -n 250 -r 2 [i+1 for i in range(5000)] 

So, does -n 250 execute [i+1 for i in range(5000)] 250 times? Then what does -r 2 do?

  • It does two runs of 250. Commented Sep 5, 2017 at 0:29
  • Why run the 250 loops twice? I didn't understand the logic behind why these options are provided. Commented Sep 5, 2017 at 0:33
  • What is unclear? Commented Sep 5, 2017 at 0:45
  • @bner341 After reading this for a while (and MSeifert's link, which is very detailed), I think the most straightforward answer is that you need r for the std dev. If r is 1, you only get the average run time (total time / n), and the std dev is 0. If r > 1, you still get the average run time (now total time / (n*r)), but you also get the std dev of r1, r2, r3, r4, where r1 = average run time of run 1 = total time of run 1 / n; r2 is defined the same way, etc. Commented Jan 28, 2023 at 3:08

3 Answers

31

It specifies the number of repeats. The repeats are the data points used to compute the reported average (and its standard deviation). For example:

%timeit -n 250 a = 2
# 61.9 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 250 loops each)

%timeit -n 250 -r 2 a = 2
# 62.6 ns ± 0 ns per loop (mean ± std. dev. of 2 runs, 250 loops each)

The statement is executed n * r times in total. The statistics are computed over the repeats (r), and each repeat consists of n "loops".

Basically, you need a large enough n so that the per-loop time measured in each repeat is accurate "enough" to represent the fastest possible execution time, but you also need a large enough r to get accurate "statistics" on how trustworthy that "fastest possible execution time" measurement is (especially if you suspect that some caching could be happening).

For superficial timings you should always use an r of 3, 5 or 7 (in most cases that's large enough) and choose n as high as possible - but not too high, you probably want it to finish in a reasonable time :-)
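The relationship between n and r can be sketched with the standard-library timeit module, used here as a stand-in for the magic rather than as IPython's actual implementation. The helper name summarize and the choice of a population standard deviation are assumptions made for illustration.

```python
import statistics
import timeit


def summarize(stmt, n, r):
    """Time `stmt` in r repeats of n loops each and summarize per-loop times."""
    # timeit.repeat returns r totals; each total covers n executions.
    totals = timeit.repeat(stmt, repeat=r, number=n)
    # Divide each total by n to get one per-loop time per repeat.
    per_loop = [t / n for t in totals]
    return {
        "runs": len(per_loop),
        "mean": statistics.mean(per_loop),
        "std": statistics.pstdev(per_loop),  # 0.0 when r == 1
        "best": min(per_loop),
    }


result = summarize("[i + 1 for i in range(5000)]", n=250, r=2)
print(result)
```

Each repeat contributes exactly one data point, so r controls how many values go into the mean and standard deviation, while n controls how much timer noise is averaged out inside each data point.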


3 Comments

I come back to this answer every few months and I still have no idea what r is for, that's too vague.
@bwdm I answered a similar question in more detail here. Let me know if that's less vague. :)
But is the actual output of the magic the time it takes to do a loop or the time it takes to run the function? E.g. the output of line 1 is 61.9 ns ± SD; is 61.9 = (time for loop1 + time for loop2 + ... + time for loop7)/7?

10
timeit -n 250 <statement> 

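The statement will get executed 3 * 250 = 750 times (-r has a default value of 3 in the docs quoted in the question; newer IPython versions default to 7 repeats, which gives 7 * 250 = 1750 executions)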

timeit -n 250 -r 4 <statement> 

The statement will get executed 4 * 250 = 1000 times

-r: how many times to repeat the timer (in the examples above, each time the timer is called with -n 250, which means 250 executions per repeat)
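The n * r arithmetic can be checked with a small sketch that counts calls. It uses the standard-library timeit.repeat as an assumed analogue of the magic (repeat plays the role of -r, number the role of -n); the names counter and statement are hypothetical.

```python
import timeit

# Mutable counter so the timed callable can record each execution.
counter = {"calls": 0}


def statement():
    counter["calls"] += 1


# repeat=4 corresponds to -r 4, number=250 corresponds to -n 250.
totals = timeit.repeat(statement, repeat=4, number=250)

print(len(totals))       # one total time per repeat
print(counter["calls"])  # total number of executions
```

The list of totals has one entry per repeat, and the counter ends at 4 * 250 executions.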

0

A more statistical way of explaining it is as a bootstrapping-style estimation of the distribution of some statistic (specifically, its mean and standard deviation). In that context, "r" can be seen as the number of samples and "n" as the size of each sample.
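A minimal simulation of this "r samples of size n" view, assuming made-up, normally distributed loop times (60 ns mean, 5 ns noise) rather than real measurements. It does no resampling, so it illustrates the repeated-sampling idea only, and the function name std_of_sample_means is hypothetical.

```python
import random
import statistics


def std_of_sample_means(n, r, rng):
    """Std. dev. across r sample means, each averaged over n simulated loop times."""
    means = []
    for _ in range(r):
        # Synthetic per-loop timings in ns; purely illustrative values.
        sample = [rng.gauss(60.0, 5.0) for _ in range(n)]
        means.append(statistics.mean(sample))
    return statistics.pstdev(means)


rng = random.Random(0)  # fixed seed so the run is reproducible
spread_n1 = std_of_sample_means(n=1, r=200, rng=rng)
spread_n100 = std_of_sample_means(n=100, r=200, rng=rng)
print(spread_n1, spread_n100)
```

Under these assumptions the spread of the per-sample means shrinks as n grows, which is why the reported standard deviation describes the repeats (samples) and not the individual loop executions.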

1 Comment

Are you implying that taking a standard deviation of the samples (each consisting of n runs) would yield a more accurate estimate of the standard deviation than just taking a sample standard deviation over all (n*r) of the runs? Otherwise I don't see why one would want to split the results into r samples, rather than just basing the inference on all n*r runs. It seems to me that the real reason is the resolution of the timer itself, but let me know if there's another, statistical reason.
