
I am trying to generate a synthetic earthquake database where the number of events ($N$) with magnitude ($M$) in the range $[M, M+\delta_M]$ follows:
$\log_{10}(N) = a - bM$
where $a$ and $b$ are constants.
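This relationship can be sampled directly: since the bin counts fall off as $10^{-bM}$, the magnitudes follow an exponential distribution with rate $b\ln 10$. A minimal sketch (the value of `b` and the sample size are illustrative assumptions, not from the question):

```python
import math
import random

# If log10(N) = a - b*M, magnitudes are exponentially distributed with
# rate b*ln(10); the constant a only sets the total event count.
# b = 1.0 and the sample size are assumptions for illustration.
b = 1.0
rate = b * math.log(10)

# Inverse-transform sampling; 1.0 - random() keeps the log argument in (0, 1]
mags = [-math.log(1.0 - random.random()) / rate for _ in range(100_000)]

# The stdlib offers the same distribution directly:
mags2 = [random.expovariate(rate) for _ in range(100_000)]
```

Both lists have mean $1/(b\ln 10) \approx 0.434$ for $b = 1$, as expected for an exponential distribution.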

I am trying to do this in Python using the random module. I know I can (or at least I think I can, as I haven't tried it) use random.expovariate, but I thought I could use random.random with a transformation like:

-math.log10(random.random())

I ran this for 2,000,000 samples which I then binned into 0.1 bins and plotted on a log scale.
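A sketch of that generate / transform / bin pipeline (the histogram call is an assumption for illustration; the OP's actual binning code is not shown):

```python
import math
import random
import numpy as np

n = 2_000_000
# 1.0 - random() keeps the log argument in (0, 1], avoiding log10(0)
samples = [-math.log10(1.0 - random.random()) for _ in range(n)]

edges = np.arange(0.0, 8.0, 0.1)       # 0.1-wide magnitude bins
counts, _ = np.histogram(samples, bins=edges)
# On a log10 axis the counts should fall on a straight line of slope -1
```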

[figure: synthetic magnitudes binned at 0.1 and plotted on a log scale]

The red line shows the theoretical distribution used to generate the synthetic samples.

I'm not worried about the variation above x=4.5. This is due to the small number of points and natural randomness. What I am asking about is the very small (at this scale) variation for the points near x=0. I plotted the deviation of the synthetic points from the theoretical (blue dots):

[figure: difference between observed and theoretical counts]

As x decreases the number of events increases exponentially, so the variation from the theoretical should decrease, not increase. And the point at x=0 deviates in the opposite sense.

To try to work out where my problem lies, I wrote code that generated numbers from 0 to 1 with a very fine step. Each number then went through the function noted above. The result (the blue dots in the above figure) is purely linear and exactly matches the theoretical values. This indicates that my transformation function and code are fine.
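The deterministic check described above might look something like this (a reconstruction, not the original code):

```python
import numpy as np

# Deterministic check: evenly spaced values in (0, 1) pushed through the
# same transformation and binned identically.  This isolates the
# transformation and binning code from the random number generator.
n = 2_000_000
u = (np.arange(n) + 0.5) / n           # midpoints, never exactly 0 or 1
mags = -np.log10(u)

edges = np.arange(0.0, 8.0, 0.1)
counts, _ = np.histogram(mags, bins=edges)
# counts[0] should be n * (1 - 10**-0.1) to within rounding, ~411,344
```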

So the only difference between the two sets of points in the above figure is that the blue ones are generated by 2,000,000 calls to the random function (results are then transformed into magnitudes and binned), while for the red ones I've taken 2,000,000 uniform steps between 0 and 1 (results are then transformed into magnitudes and binned using the same code).

So I'm thinking it's somehow something to do with the random number generator?

Would be grateful for any pointers. Thanks.

[added] Changed the call from random.random to random.uniform(0,1) as suggested by @Arty, and the errors are now symmetrically distributed and of the expected magnitude. I have added ±1 standard deviation to the plot.

[figure: deviation from expected with ±1 standard deviation bounds]

Clearly random() and uniform(0,1) are doing something slightly differently.
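For reference, in CPython random.uniform(a, b) is simply defined as a + (b - a) * random(), so uniform(0, 1) draws from exactly the same underlying generator. A small sketch demonstrating the equivalence:

```python
import random

# random.uniform(a, b) is a + (b - a) * random.random() in CPython,
# consuming one call to the underlying generator per value.
def uniform_sketch(a, b):
    return a + (b - a) * random.random()

# With the same seed the two produce an identical stream:
random.seed(123)
x = random.random()
random.seed(123)
y = random.uniform(0, 1)
assert x == y
```

This is why, all else being equal, swapping one for the other should not change the results.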


I cut down my code and calculated synthetic data using random.random, random.uniform(0,1), np.random.random and np.random.uniform(0, 1) for 2,000,000 points.

Binned the results and plotted the difference between the observed and expected numbers (below).

[figure: observed minus expected counts for each generator]

Also added the ±1 standard deviation limits. The numbers are all symmetrically distributed and of the correct magnitude, indicating that ALL the random generators are working fine.
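A self-contained sketch of such a four-way comparison (the sample size, bin range, and binomial error bars are my choices, not necessarily the OP's):

```python
import math
import random
import numpy as np

n = 200_000
edges = np.arange(0.0, 4.0, 0.1)
p = 10.0 ** (-edges[:-1]) - 10.0 ** (-edges[1:])   # per-bin probability
expected = n * p
sigma = np.sqrt(n * p * (1.0 - p))                 # binomial std deviation

sources = {
    "random.random":           random.random,
    "random.uniform(0, 1)":    lambda: random.uniform(0, 1),
    "np.random.random":        np.random.random,
    "np.random.uniform(0, 1)": lambda: np.random.uniform(0, 1),
}

all_z = []
for name, draw in sources.items():
    mags = [-math.log10(1.0 - draw()) for _ in range(n)]
    counts, _ = np.histogram(mags, bins=edges)
    z = (counts - expected) / sigma                # deviation in std units
    all_z.extend(z.tolist())
    # A healthy generator keeps most |z| below ~2
```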

My conclusion is that somewhere along the line of changing/refining code I introduced a problem which has now been lost. I would dearly like to find that error so I don't make it again!

I am surprised that my original, incorrect code could perform correctly to the extent that it generated a realistic-looking synthetic dataset with only minor anomalies that were difficult to detect.

Thanks for everyone's help and apologies to those I disagreed with that said the problem did not lie with the random number generators!

  • Can you provide a minimal reproducible example that shows how you performed the generation, accumulation, and binning? I figure the two most probable causes would be denormal numbers (providing a greater range of possible values near 0; I haven't thought it through enough to be sure that's a real possibility) or a bug in the binning, but it would help to see the code. Commented Nov 4, 2020 at 7:53
  • Can you possibly also test the randomness of np.random.uniform(0, 1) with your code? It uses a different algorithm and code for generation. It would be interesting to see whether it gives the same results as random.random(). It would also be nice if you provided the whole code that produced these graphs and measurements. Commented Nov 4, 2020 at 7:54
  • @RustyC Okay. But please be aware that this is very, very likely a statistical problem – including the features of the various random distributions. Even by taking your comparison as the uniform sample you already introduce a bias, which might not be what you actually want. Your binning is biased towards uniform, but the data is biased towards small values (that is, data in the bins is not centered and not uniform). Stack Overflow is appropriate to figure out why random.random deviates from the uniform distribution, but not to warn you if a uniform distribution is not suitable for the problem. Commented Nov 4, 2020 at 9:41
  • I don't see how random.uniform(0,1) can affect things, it'll just evaluate 0 + 1 * random.random() so should just be a bit slower Commented Nov 4, 2020 at 10:01
  • @RustyC you say "Clearly random() and uniform(0,1) are doing something slightly differently." but I've linked to the CPython code and we can see it doesn't do anything differently Commented Nov 4, 2020 at 10:24

1 Answer


Initially I thought you might be having some numerical analysis problem. Trying a million samples in Python, however, I get the following observed results:

>>> import math, random
>>> T = int(1e6)
>>> xs = [-math.log10(random.random()) for i in range(T)]
>>> len([x for x in xs if 0 <= x < 0.1])
205614
>>> len([x for x in xs if 0.1 <= x < 0.2])
163736
>>> len([x for x in xs if 0.2 <= x < 0.3])
129627
>>> len([x for x in xs if 0.3 <= x < 0.4])
103413
>>> len([x for x in xs if 0.4 <= x < 0.5])
81734

If $X = -\log_{10}(x)$ with $x$ uniformly distributed on $[0, 1)$, then we should have

$P(M \le X < M + d) = P(-M - d < \log_{10}(x) \le -M) = 10^{-M} - 10^{-M-d}$

and the numbers above are basically perfectly in line with these probabilities, e.g.

$1 - 10^{-0.1} = 0.205672$

which matches up nicely with our observed 205614 out of a million trials above.
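The remaining bins line up the same way; a quick sketch computing the expected counts per million draws for the first five bins:

```python
# Expected counts per million draws for the first five 0.1-wide bins,
# using P(M <= X < M + 0.1) = 10**(-M) - 10**(-M - 0.1):
probs = [10 ** (-m * 0.1) - 10 ** (-(m + 1) * 0.1) for m in range(5)]
expected = [round(1_000_000 * p) for p in probs]
# expected -> [205672, 163371, 129770, 103080, 81879]
```

These sit comfortably within sampling error of the observed counts 205614, 163736, 129627, 103413, 81734 above.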

Do you get different results than I do for the python code above?


5 Comments

So what is your conclusion? Is random.random() OK or not? Also, can you do the same measurements for np.random.uniform(0., 1.), if not too difficult? These two functions should have different implementations of random generators.
@Arty: I believe that random.random is working as documented on my machine, and that moreover the documented behavior makes it a perfectly good approximation of true randomness for OP's purposes. Since the OP hasn't specified in detail what the graphs in the question depict, I can't say any more about that. As for the numpy random generators, I imagine they're no worse than the built-in Python ones.
Incidentally, if one "Changed the call from random.random to random.uniform(0,1)" the result basically does not change at all – not surprising seeing how random.uniform just scales the result of random.random.
@MisterMiyagi - But it has changed the results!
@Daniel McLaury - Ran your code and I get: 205777, 163508, 129735, 103058, 82295. Essentially the same as you, within expected random variation.
