
I'm not a maths major or a CS major, I just fool around with Python (usually making scripts for simulations/theorycrafting on video games), and I discovered just how bad random.randint is performance-wise. It's got me wondering why random.randint and random.randrange are made the way they are. I made a function that produces (for all intents and actual purposes) identical results to random.randint:

    import random

    big_bleeping_float = (2**64 - 2) / (2**64 - 2)

    def fastrandint(start, stop):
        return start + int(random.random() * (stop - start + big_bleeping_float))

There is a massive 180% speed boost using that to generate an integer in the inclusive range 0-65, compared to random.randrange(0, 66), the next fastest method.

    >>> timeit.timeit('random.randint(0, 66)', setup='from numpy import random', number=10000)
    0.03165552873121058
    >>> timeit.timeit('random.randint(0, 65)', setup='import random', number=10000)
    0.022374771118336412
    >>> timeit.timeit('random.randrange(0, 66)', setup='import random', number=10000)
    0.01937231027605435
    >>> timeit.timeit('fastrandint(0, 65)', setup='import random; from fasterthanrandomrandom import fastrandint', number=10000)
    0.0067909916844523755

Furthermore, adapting this function as an alternative to random.choice is 75% faster (a rough sketch of that adaptation is included below), and I'm sure ranges with a step larger than one would be faster too (although I didn't test that). For almost double the speed boost of the fastrandint function you can simply write it inline:

    >>> timeit.timeit('int(random.random() * (65 + big_bleeping_float))', setup='import random; big_bleeping_float = (2**64 - 2)/(2**64 - 2)', number=10000)
    0.0037642723021917845
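For reference, the post doesn't show the random.choice adaptation; a minimal sketch of what it might look like (the name fastchoice is made up for illustration) is:

    import random

    big_bleeping_float = (2**64 - 2) / (2**64 - 2)   # i.e. 1.0

    def fastchoice(seq):
        # the same trick as fastrandint, applied to the index range 0 .. len(seq) - 1
        return seq[int(random.random() * (len(seq) - 1 + big_bleeping_float))]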

So in summary: why am I wrong that my function is better, why is it faster if it is better, and is there an even faster way to do what I'm doing?

  • Isn't (2 ** 64 - 2)/(2 ** 64 - 2) just 1? Why are you adding it? Commented May 31, 2016 at 4:36
  • The +1 masquerading as big_bleeping_float is actually required, since random() * N generates values in [0, N) and applying int to this rounds down, giving ints in 0..N-1. Commented May 31, 2016 at 8:05
  • @Michael Anderson: yes, but why not simply add 1? Commented May 31, 2016 at 8:08
  • Oh, I have no idea why it's not just written as +1, or why it's called big_bleeping_float - just that it is actually required for the values to be correct. Commented May 31, 2016 at 8:14

3 Answers


randint calls randrange, which does a bunch of range/type checks and conversions and then uses _randbelow to generate a random int. _randbelow again does some range checks and finally uses random.

So if you remove all the checks for edge cases and some function call overhead, it's no surprise your fastrandint is quicker.
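One rough way to see where the time goes is to time each layer on its own (a sketch only; the exact numbers depend on your machine and Python version):

    import timeit

    # Each step down the call chain strips away a layer of Python-level
    # argument checking and function-call overhead.
    print(timeit.timeit('random.randint(0, 65)', setup='import random', number=10000))
    print(timeit.timeit('random.randrange(66)', setup='import random', number=10000))
    print(timeit.timeit('random.getrandbits(7)', setup='import random', number=10000))
    print(timeit.timeit('random.random()', setup='import random', number=10000))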


1 Comment

Actually _randbelow doesn't (usually) call random, it calls getrandbits() and loops to remove (reduce) bias.

random.randint() and others call into random.getrandbits(), which may be less efficient than direct calls to random(), but for good reason.

It is actually more correct to use a randint that calls into random.getrandbits(), as it can be done in an unbiased manner.

You can see that using random.random to generate values in a range ends up being biased, since there are only M floating point values between 0 and 1 (for M pretty large). Take an N such that N + 1 doesn't divide M, and write M = k(N + 1) + r with 0 < r < N + 1. Then at best, using int(random.random() * (N + 1)) to produce integers 0..N, r of those values come out with probability (k + 1)/M and the other N + 1 - r with probability k/M. (This is at best, by the pigeonhole principle - in practice I'd expect the bias to be even worse.)
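Here is a toy illustration of that pigeonhole argument, shrinking the roughly 2**53 doubles in [0, 1) down to a made-up grid of 16 equally spaced values so the imbalance becomes visible (the numbers are purely illustrative):

    # Toy pigeonhole demo: M equally likely "floats" mapped onto N buckets.
    M, N = 16, 6                  # 16 = 2*6 + 4, so k = 2 and r = 4
    buckets = [0] * N
    for i in range(M):
        u = i / M                 # a stand-in for random.random() in [0, 1)
        buckets[int(u * N)] += 1  # the int(random.random() * N) trick
    print(buckets)                # [3, 3, 2, 3, 3, 2] -- four buckets are more likely than the other two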

Note that this bias is only noticeable when

  • you draw a large number of samples, and
  • N is a large fraction of M, the number of floats in [0, 1)

So it probably won't matter to you, unless you know you need unbiased values - such as for scientific computing etc.

In contrast, a value from randint(0, N) can be made unbiased by using rejection sampling over repeated calls to random.getrandbits(). Of course, managing this introduces additional overhead.
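A minimal sketch of that rejection-sampling idea (a simplified illustration of the technique, not the module's actual code):

    import random

    def unbiased_randint(a, b):
        # Uniform int in [a, b], built from rejection sampling on getrandbits().
        n = b - a + 1                  # number of possible outcomes
        k = n.bit_length()             # bits needed to cover the range [0, n)
        r = random.getrandbits(k)
        while r >= n:                  # discard out-of-range draws instead of
            r = random.getrandbits(k)  # folding them back in, which would add bias
        return a + r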

Aside

If you end up using a custom random implementation, the following is worth knowing.

From the Python 3 docs:

Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0).

This suggests that randint and others may be implemented using random.random. If this is the case, I would expect them to be slower, incurring at least one additional function call overhead per call.

Looking at the code referenced in https://stackoverflow.com/a/37540577/221955 you can see that this will happen if the random implementation doesn't provide a getrandbits() function.
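For example, here is a toy subclass that supplies only random() (the class name and LCG constants are made up for illustration; don't use this generator for anything real). Because it defines no getrandbits(), randint()/randrange() on it fall back to the slower, random()-based path:

    import random

    class LCGRandom(random.Random):
        # A toy generator that only overrides random().
        def __init__(self, seed_value=1):
            super().__init__()
            self._state = seed_value

        def random(self):
            # 64-bit linear congruential step, scaled into [0.0, 1.0)
            self._state = (6364136223846793005 * self._state + 1442695040888963407) % 2**64
            return self._state / 2**64
        # No getrandbits() here, so integer helpers must derive ints from random().

    rng = LCGRandom(42)
    print(rng.randint(0, 65))   # works, via the random()-based fallback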



This is probably rarely a problem, but randint(0, 10**1000) works while fastrandint(0, 10**1000) crashes. The slower time is probably the price you need to pay for a function that works in all possible cases...
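A quick way to see why (randint handles arbitrary-precision ints, while fastrandint's arithmetic has to coerce the bound to a float, which overflows):

    import random

    random.randint(0, 10**1000)                # fine: Python ints are arbitrary precision

    big_bleeping_float = (2**64 - 2) / (2**64 - 2)
    try:
        random.random() * (10**1000 + big_bleeping_float)   # the first step inside fastrandint
    except OverflowError as e:
        print(e)                               # int too large to convert to float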

