
I am trying to generate a synthetic earthquake database where the number of events ($N$) with magnitude ($M$) in the range $[M, M+\delta_M]$ follows:
$\log_{10}(N) = a - bM$
where $a$ and $b$ are constants.
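This relationship can be sampled directly: since the bin counts fall off as $10^{-bM}$, the magnitudes follow an exponential distribution with rate $b\ln 10$. A minimal sketch (the value of `b` and the sample size are illustrative assumptions, not from the question):

```python
import math
import random

# If log10(N) = a - b*M, magnitudes are exponentially distributed with
# rate b*ln(10); the constant a only sets the total event count.
# b = 1.0 and the sample size are assumptions for illustration.
b = 1.0
rate = b * math.log(10)

# Inverse-transform sampling; 1.0 - random() keeps the log argument in (0, 1]
mags = [-math.log(1.0 - random.random()) / rate for _ in range(100_000)]

# The stdlib offers the same distribution directly:
mags2 = [random.expovariate(rate) for _ in range(100_000)]
```

Both lists have mean $1/(b\ln 10) \approx 0.434$ for $b = 1$, as expected for an exponential distribution.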

I am trying to do this in Python using the random module. I know I can (or at least I think I can, as I haven't tried it) use random.expovariate, but I thought I could use random.random with a transformation like:

-math.log10(random.random())

I ran this for 2,000,000 samples which I then binned into 0.1 bins and plotted on a log scale.
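A sketch of that generate / transform / bin pipeline (the histogram call is an assumption for illustration; the OP's actual binning code is not shown):

```python
import math
import random
import numpy as np

n = 2_000_000
# 1.0 - random() keeps the log argument in (0, 1], avoiding log10(0)
samples = [-math.log10(1.0 - random.random()) for _ in range(n)]

edges = np.arange(0.0, 8.0, 0.1)       # 0.1-wide magnitude bins
counts, _ = np.histogram(samples, bins=edges)
# On a log10 axis the counts should fall on a straight line of slope -1
```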

[figure: synthetic magnitudes binned at 0.1 and plotted on a log scale]

The red line shows the theoretical distribution used to generate the synthetic samples.

I'm not worried about the variation above x=4.5. This is due to the small number of points and natural randomness. What I am asking about is the very small (at this scale) variation for the points near x=0. I plotted the deviation of the synthetic points from the theoretical (blue dots):

[figure: difference between observed and theoretical counts]

As x decreases the number of events increases exponentially, so the variation from the theoretical should decrease, not increase. And the point at x=0 deviates in the opposite sense.

To try to work out where my problem lies, I wrote code that generated numbers from 0 to 1 with a very fine step. Each number then went through the function noted above. The result (the blue dots in the above figure) is purely linear and exactly matches the theoretical values. This indicates that my transformation function and code are fine.
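The deterministic check described above might look something like this (a reconstruction, not the original code):

```python
import numpy as np

# Deterministic check: evenly spaced values in (0, 1) pushed through the
# same transformation and binned identically.  This isolates the
# transformation and binning code from the random number generator.
n = 2_000_000
u = (np.arange(n) + 0.5) / n           # midpoints, never exactly 0 or 1
mags = -np.log10(u)

edges = np.arange(0.0, 8.0, 0.1)
counts, _ = np.histogram(mags, bins=edges)
# counts[0] should be n * (1 - 10**-0.1) to within rounding, ~411,344
```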

So the only difference between the two sets of points in the above figure is that the blue ones are generated by 2,000,000 calls to the random function (results are then transformed into magnitudes and binned), while for the red ones I've taken 2,000,000 uniform steps between 0 and 1 (results are then transformed into magnitudes and binned using the same code).

So I'm thinking it's somehow something to do with the random number generator?

Would be grateful for any pointers. Thanks.

[added] Changed the call from random.random to random.uniform(0,1) as suggested by @Arty, and the errors are now symmetrically distributed and of the expected magnitude. I have added ±1 standard deviation to the plot.

[figure: deviation from expected with ±1 standard deviation bounds]

Clearly random() and uniform(0,1) are doing something slightly differently.
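For reference, in CPython random.uniform(a, b) is simply defined as a + (b - a) * random(), so uniform(0, 1) draws from exactly the same underlying generator. A small sketch demonstrating the equivalence:

```python
import random

# random.uniform(a, b) is a + (b - a) * random.random() in CPython,
# consuming one call to the underlying generator per value.
def uniform_sketch(a, b):
    return a + (b - a) * random.random()

# With the same seed the two produce an identical stream:
random.seed(123)
x = random.random()
random.seed(123)
y = random.uniform(0, 1)
assert x == y
```

This is why, all else being equal, swapping one for the other should not change the results.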


I cut down my code and calculated synthetic data using random.random, random.uniform(0,1), np.random.random and np.random.uniform(0, 1) for 2,000,000 points.

Binned the results and plotted the difference between the observed and expected numbers (below).

[figure: observed minus expected counts for each generator]

Also added the ±1 standard deviation limits. The numbers are all symmetrically distributed and of the correct magnitude, indicating that ALL the random generators are working fine.
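A self-contained sketch of such a four-way comparison (the sample size, bin range, and binomial error bars are my choices, not necessarily the OP's):

```python
import math
import random
import numpy as np

n = 200_000
edges = np.arange(0.0, 4.0, 0.1)
p = 10.0 ** (-edges[:-1]) - 10.0 ** (-edges[1:])   # per-bin probability
expected = n * p
sigma = np.sqrt(n * p * (1.0 - p))                 # binomial std deviation

sources = {
    "random.random":           random.random,
    "random.uniform(0, 1)":    lambda: random.uniform(0, 1),
    "np.random.random":        np.random.random,
    "np.random.uniform(0, 1)": lambda: np.random.uniform(0, 1),
}

all_z = []
for name, draw in sources.items():
    mags = [-math.log10(1.0 - draw()) for _ in range(n)]
    counts, _ = np.histogram(mags, bins=edges)
    z = (counts - expected) / sigma                # deviation in std units
    all_z.extend(z.tolist())
    # A healthy generator keeps most |z| below ~2
```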

My conclusion is that somewhere along the line of changing/refining code I introduced a problem which has now been lost. I would dearly like to find that error so I don't make it again!

I am surprised that my original, incorrect code could perform correctly to the extent that it generated a realistic-looking synthetic dataset with only minor anomalies that were difficult to detect.

Thanks for everyone's help and apologies to those I disagreed with that said the problem did not lie with the random number generators!

  • Can you provide a minimal reproducible example that shows how you performed the generation, accumulation, and binning? I figure the two most probable causes would be denormal numbers (providing a greater range of possible values near 0; I haven't thought it through enough to be sure that's a real possibility) or a bug in the binning, but it would help to see the code. Commented Nov 4, 2020 at 7:53
  • Can you possibly also test the randomness of np.random.uniform(0, 1) with your code? It uses a different algorithm and code for generation. It would be interesting to see whether it gives the same results as random.random(). It would also be nice if you provided the whole code that produced these graphs and measurements. Commented Nov 4, 2020 at 7:54
  • @RustyC Okay. But please be aware that this is very, very likely a statistical problem – including the features of the various random distributions. Even by taking your comparison as the uniform sample you already introduce a bias, which might not be what you actually want. Your binning is biased towards uniform, but the data is biased towards small values (that is, data in the bins is not centered and not uniform). Stack Overflow is appropriate to figure out why random.random deviates from the uniform distribution, but not to warn you if a uniform distribution is not suitable for the problem. Commented Nov 4, 2020 at 9:41
  • I don't see how random.uniform(0,1) can affect things, it'll just evaluate 0 + 1 * random.random() so should just be a bit slower Commented Nov 4, 2020 at 10:01
  • @RustyC you say "Clearly random() and uniform(0,1) are doing something slightly differently." but I've linked to the CPython code and we can see it doesn't do anything differently Commented Nov 4, 2020 at 10:24

1 Answer


Initially I thought you might be having some numerical analysis problem. Trying a million samples in Python, however, I get the following observed results:

>>> import math, random
>>> T = int(1e6)
>>> xs = [-math.log10(random.random()) for i in range(T)]
>>> len([x for x in xs if 0 <= x < 0.1])
205614
>>> len([x for x in xs if 0.1 <= x < 0.2])
163736
>>> len([x for x in xs if 0.2 <= x < 0.3])
129627
>>> len([x for x in xs if 0.3 <= x < 0.4])
103413
>>> len([x for x in xs if 0.4 <= x < 0.5])
81734

If $X = -\log_{10}(x)$ with $x$ uniformly distributed on $[0, 1)$, then we should have

$P(M \le X < M + d) = P(-M - d < \log_{10}(x) \le -M) = 10^{-M} - 10^{-M-d}$

and the numbers above are basically perfectly in line with these probabilities, e.g.

$1 - 10^{-0.1} = 0.205672$

which matches up nicely with our observed 205614 out of a million trials above.
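The remaining bins line up the same way; a quick sketch computing the expected counts per million draws for the first five bins:

```python
# Expected counts per million draws for the first five 0.1-wide bins,
# using P(M <= X < M + 0.1) = 10**(-M) - 10**(-M - 0.1):
probs = [10 ** (-m * 0.1) - 10 ** (-(m + 1) * 0.1) for m in range(5)]
expected = [round(1_000_000 * p) for p in probs]
# expected -> [205672, 163371, 129770, 103080, 81879]
```

These sit comfortably within sampling error of the observed counts 205614, 163736, 129627, 103413, 81734 above.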

Do you get different results than I do for the python code above?


5 Comments

So what is your conclusion? Is random.random() OK or not? Also, can you do the same measurements for np.random.uniform(0., 1.), if not too difficult? These two functions should have different implementations of random generators.
@Arty: I believe that random.random is working as documented on my machine, and that moreover the documented behavior makes it a perfectly good approximation of true randomness for OP's purposes. Since the OP hasn't specified in detail what the graphs in the question depict, I can't say any more about that. As for the numpy random generators, I imagine they're no worse than the built-in Python ones.
Incidentally, if one "Changed the call from random.random to random.uniform(0,1)" the result basically does not change at all – not surprising seeing how random.uniform just scales the result of random.random.
@MisterMiyagi - But it has changed the results!
@Daniel McLaury - Ran your code and I get: 205777, 163508, 129735, 103058, 82295. Essentially the same as you, within expected random variation.
