All these results have been obtained with CPython 3.5.2.
I noticed strange performance for some operations of the set class.
I have measured the time needed to perform the union of two sets containing only integers. This time depends on the sizes of the sets, of course. Surprisingly, it also depends on the “density” of the integers. Here is a plot:
The x axis is the sum of the sizes of the two sets (which were chosen randomly and independently of each other, for each experiment). The y axis is the time, in seconds (on a log scale).
A density d means that the sets were instantiated by sampling N integers from a total of N/d integers. In other words, for a density of 0.5, we take one half of the integers of some interval, whereas for a density of 0.1 we take one tenth of the integers of some (larger) interval.
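To make the definition concrete, here is a small sketch (using a helper equivalent to the one in the benchmark code below) showing what a given density means in practice:

```python
import random

def get_values(size, density):
    # Sample `size` distinct integers from a pool of size/density integers.
    return set(random.sample(range(int(size / density)), size))

dense = get_values(500, 0.5)   # 500 integers drawn from range(1000)
sparse = get_values(500, 0.1)  # 500 integers drawn from range(5000)
print(len(dense), max(dense) < 1000)    # 500 True
print(len(sparse), max(sparse) < 5000)  # 500 True
```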
Here is minimal code to reproduce the results (if needed I can post the full code I used for the plot, but it is longer):
```python
import time
import random

import numpy


def get_values(size, density):
    # Sample `size` distinct integers from a pool of size/density integers.
    return set(random.sample(range(int(size / density)), size))


def perform_op(size, density):
    values1 = get_values(size, density)
    values2 = get_values(size, density)
    t = time.time()
    result = values1 | values2
    return time.time() - t


size = 10000000
for density in [0.05, 0.1, 0.5, 0.99]:
    times = [perform_op(size, density) for _ in range(10)]
    print('density: %.2f, mean time: %.4f, standard deviation: %.4f'
          % (density, numpy.mean(times), numpy.std(times)))
```

Union:
```
density: 0.05, time: 0.9846, standard deviation: 0.0440
density: 0.10, time: 1.0141, standard deviation: 0.0204
density: 0.50, time: 0.5477, standard deviation: 0.0059
density: 0.99, time: 0.3440, standard deviation: 0.0020
```

There is roughly a factor of 3 between the fastest and the slowest computation times, for sets of the same size. There is also much more variability at low densities.
A funny thing is that for the intersection (replace values1 | values2 with values1 & values2 in the perform_op function), performance is also non-constant, but the pattern is different:
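For reference, the intersection variant only changes the measured operation (the function below mirrors perform_op from the benchmark; the name perform_op_intersection is mine, for clarity):

```python
import time
import random

def get_values(size, density):
    # Same sampling helper as in the benchmark above.
    return set(random.sample(range(int(size / density)), size))

def perform_op_intersection(size, density):
    values1 = get_values(size, density)
    values2 = get_values(size, density)
    t = time.time()
    result = values1 & values2  # the only line that differs from the union version
    return time.time() - t

print(perform_op_intersection(100000, 0.5))
```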
```
density: 0.05, time: 0.3928, standard deviation: 0.0046
density: 0.10, time: 0.4876, standard deviation: 0.0041
density: 0.50, time: 0.5975, standard deviation: 0.0127
density: 0.99, time: 0.3806, standard deviation: 0.0015
```

I did not test other set operations.
I do not understand why there are such differences. From what I know, Python sets are implemented with hash tables, so the density of the integers should not matter as long as their hashes are well spread.
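One detail that may be relevant to the premise about hashes (a fact about CPython, not something I measured here): small non-negative integers hash to themselves, so the hash values of a dense set of integers are exactly as dense as the integers themselves, not uniformly spread:

```python
# CPython fact: a small non-negative int is its own hash, so sampling a
# dense interval of integers also produces a dense set of hash values.
print([hash(n) for n in range(5)])  # [0, 1, 2, 3, 4]
print(hash(123456789))              # 123456789
```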
What is the origin of these different performances?
