49

How to generate a random integer as with np.random.randint(), but with a normal distribution around 0.

np.random.randint(-10, 10) returns integers with a discrete uniform distribution np.random.normal(0, 0.1, 1) returns floats with a normal distribution

What I want is a kind of combination between the two functions.

1
  • 19
    The normal distribution is continuous by definition, so the answer to this question depends on how you want to discretise it. One possible solution is to sample from np.random.normal and round the result to an integer. Commented May 24, 2016 at 11:07

7 Answers 7

44

One other way to get a discrete distribution that looks like the normal distribution is to draw from a multinomial distribution where the probabilities are calculated from a normal distribution.

import scipy.stats as ss import numpy as np import matplotlib.pyplot as plt x = np.arange(-10, 11) xU, xL = x + 0.5, x - 0.5 prob = ss.norm.cdf(xU, scale = 3) - ss.norm.cdf(xL, scale = 3) prob = prob / prob.sum() # normalize the probabilities so their sum is 1 nums = np.random.choice(x, size = 10000, p = prob) plt.hist(nums, bins = len(x)) 

Here, np.random.choice picks an integer from [-10, 10]. The probability for selecting an element, say 0, is calculated by p(-0.5 < x < 0.5) where x is a normal random variable with mean zero and standard deviation 3. I chose a std. dev. of 3 because this way p(-10 < x < 10) is almost 1.

The result looks like this:

enter image description here

Sign up to request clarification or add additional context in comments.

4 Comments

Why did you add and subtract 0.5 in xL and xU?
For a continuous distribution, P(x=0)=0 (this is true for any other number). The probabiltiy is defined for intervals. Here, in order to associate a probability to 0 (and to other integers) I used the interval (-0.5, 0.5) It was basically because the question asked for integers. For 1, it is (0.5, 1.5).
Why didn't you take ss.norm.pdf(x, scale=3)?
It would work fine with these parameters but with a smaller standard deviation you would end up pdf's greater than 1, for example. It would be fine as long as you do the same normalization (dividing by the sum) afterwards but I didn't want to confuse potential readers since pdf is actually not probability (that's why it can be greater than 1) so I wanted to use actual probabilities.
16

It may be possible to generate a similar distribution from a Truncated Normal Distribution that is rounded up to integers. Here's an example with scipy's truncnorm().

import numpy as np from scipy.stats import truncnorm import matplotlib.pyplot as plt scale = 3. range = 10 size = 100000 X = truncnorm(a=-range/scale, b=+range/scale, scale=scale).rvs(size=size) X = X.round().astype(int) 

Let's see what it looks like

bins = 2 * range + 1 plt.hist(X, bins) 

enter image description here

5 Comments

I appreciate the responses from both @ayhan and bakkal. Please, I only ask this for my knowledge; I do not wish to insult either answer. Just looking at the plot, Bakkal's is more symmetric. They both seem sufficient and from the code seem equally valid. But my understanding is weak. Is there a objectively better method?
@RobertLugg the relatively higher symmetry may be due to sample size being larger. That said, I think the code in this answer is simpler.
Be aware that you overwrite the python range function with this code. Try to use neutral variable names.
While the code is simpler, it is also slower. In my test, about 100 times slower than ayhan's solution. Still, it takes 3 seconds to generate 10.000 numbers 50 times. So in many cases this will be ok.
This answer is slightly wrong, as the first and last buckets will have lower probabilities than other buckets. This is because X.round() will do natural rounding with 0.5 as the cutoff-point, while all generated real values are in [a,b). To fix this, use np.floor(X).astype(int) instead, which will yield integer values in [a,b) with equal probabilities.
11

The accepted answer here works, but I tried Will Vousden's solution and it works well too:

import numpy as np # Generate Distribution: randomNums = np.random.normal(scale=3, size=100000) # Round to the nearest integer and convert to integer (NB the latter is not done automatically otherwise!) randomInts = np.round(randomNums).astype(int) # Plot: axis = np.arange(start=min(randomInts), stop = max(randomInts) + 1) plt.hist(randomInts, bins = axis) 

Looks good no?

1 Comment

Instead of generating randomNums and rounding them to "ints" (actually, reals ending in .0), what about randomInts = np.random.normal(loc=10, scale=3, size=10000).astype(int)-10, which returns actual ints? Note, however, that it's necessary to generate values with a loc other than 0 (and return it to 0 by subtracting the loc), or you'll end up with far too many results at exactly 0.
3

Old question, new answer:

For a bell-shaped distribution on the integers {-10, -9, ..., 9, 10}, you can use the binomial distribution with n=20 and p=0.5, and subtract 10 from the samples.

For example,

In [167]: import numpy as np In [168]: import matplotlib.pyplot as plt In [169]: rng = np.random.default_rng() In [170]: N = 5000000 # Number of samples to generate In [171]: samples = rng.binomial(n=20, p=0.5, size=N) - 10 In [172]: samples.min(), samples.max() Out[172]: (-10, 10) 

Note that the probability of -10 or 10 is pretty low, so you won't necessarily see them in any given sample, especially if you use a smaller N.

np.bincount() is an efficient way to generate a histogram for a collection of small nonnegative integers:

In [173]: counts = np.bincount(samples + 10, minlength=20) In [174]: counts Out[174]: array([ 4, 104, 889, 5517, 22861, 73805, 184473, 369441, 599945, 800265, 881140, 801904, 600813, 370368, 185082, 73635, 23325, 5399, 931, 95, 4]) In [175]: plt.bar(np.arange(-10, 11), counts) Out[175]: <BarContainer object of 21 artists> 

histogram

2 Comments

I couldn't find rng imported, but it must be this: rng = np.random.default_rng(), which presumably stands for a "default random number generator". devdocs.io/numpy~1.20/reference/random/index#random-quick-start
Thanks for pointing that out, @Stefan_EOX. I updated the code to include the creation of rng.
1

This version is mathematically not correct (because you crop the bell) but will do the job quick and easily understandable if preciseness is not needed that much:

def draw_random_normal_int(low:int, high:int): # generate a random normal number (float) normal = np.random.normal(loc=0, scale=1, size=1) # clip to -3, 3 (where the bell with mean 0 and std 1 is very close to zero normal = -3 if normal < -3 else normal normal = 3 if normal > 3 else normal # scale range of 6 (-3..3) to range of low-high scaling_factor = (high-low) / 6 normal_scaled = normal * scaling_factor # center around mean of range of low high normal_scaled += low + (high-low)/2 # then round and return return np.round(normal_scaled) 

Drawing 100000 numbers results in this histogramm: enter image description here

Comments

1

I'm not sure if there (in scipy generator) is an option of var-type choice to be generated, but common generation can be such with scipy.stats

# Generate pseudodata from a single normal distribution import scipy from scipy import stats import numpy as np import matplotlib.pyplot as plt dist_mean = 0.0 dist_std = 0.5 n_events = 500 toy_data = scipy.stats.norm.rvs(dist_mean, dist_std, size=n_events) toy_data2 = [[i, j] for i, j in enumerate(toy_data )] arr = np.array(toy_data2) print("sample:\n", arr[1:500, 0]) print("bin:\n",arr[1:500, 1]) plt.scatter(arr[1:501, 1], arr[1:501, 0]) plt.xlabel("bin") plt.ylabel("sample") plt.show() 

or in such a way (also option of dtype choice is absent):

import matplotlib.pyplot as plt mu, sigma = 0, 0.1 # mean and standard deviation s = np.random.normal(mu, sigma, 500) count, bins, ignored = plt.hist(s, 30, density=True) plt.show() print(bins) # <<<<<<<<<< plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (bins - mu)**2 / (2 * sigma**2) ), linewidth=2, color='r') plt.show() 

without visualization the most common way (also no possibility to point out var-type)

bins = np.random.normal(3, 2.5, size=(10, 1)) 

a wrapper class could be done to instantiate the container with a given vars-dtype (e.g. by rounding floats to integers, as mentioned above)...

Comments

0

Here we start by getting values from the bell curve.

CODE:

#--------*---------*---------*---------*---------*---------*---------*---------* # Desc: Discretize a normal distribution centered at 0 #--------*---------*---------*---------*---------*---------*---------*---------* import sys import random from math import sqrt, pi import numpy as np import matplotlib.pyplot as plt def gaussian(x, var): k1 = np.power(x, 2) k2 = -k1/(2*var) return (1./(sqrt(2. * pi * var))) * np.exp(k2) #--------*---------*---------*---------*---------*---------*---------*---------# while 1:# M A I N L I N E # #--------*---------*---------*---------*---------*---------*---------*---------# # # probability density function # # for discrete normal RV pdf_DGV = [] pdf_DGW = [] var = 9 tot = 0 # # create 'rough' gaussian for i in range(-var - 1, var + 2): if i == -var - 1: r_pdf = + gaussian(i, 9) + gaussian(i - 1, 9) + gaussian(i - 2, 9) elif i == var + 1: r_pdf = + gaussian(i, 9) + gaussian(i + 1, 9) + gaussian(i + 2, 9) else: r_pdf = gaussian(i, 9) tot = tot + r_pdf pdf_DGV.append(i) pdf_DGW.append(r_pdf) print(i, r_pdf) # # amusing how close tot is to 1! print('\nRough total = ', tot) # # no need to normalize with Python 3.6, # # but can't help ourselves for i in range(0,len(pdf_DGW)): pdf_DGW[i] = pdf_DGW[i]/tot # # print out pdf weights # # for out discrte gaussian print('\npdf:\n') print(pdf_DGW) # # plot random variable action rv_samples = random.choices(pdf_DGV, pdf_DGW, k=10000) plt.hist(rv_samples, bins = 100) plt.show() sys.exit() 

OUTPUT:

-10 0.0007187932912256041 -9 0.001477282803979336 -8 0.003798662007932481 -7 0.008740629697903166 -6 0.017996988837729353 -5 0.03315904626424957 -4 0.05467002489199788 -3 0.0806569081730478 -2 0.10648266850745075 -1 0.12579440923099774 0 0.1329807601338109 1 0.12579440923099774 2 0.10648266850745075 3 0.0806569081730478 4 0.05467002489199788 5 0.03315904626424957 6 0.017996988837729353 7 0.008740629697903166 8 0.003798662007932481 9 0.001477282803979336 10 0.0007187932912256041 Rough total = 0.9999715875468381 pdf: [0.000718813714486599, 0.0014773247784004072, 0.003798769940305483, 0.008740878047691289, 0.017997500190860556, 0.033159988420867426, 0.05467157824565407, 0.08065919989878699, 0.10648569402724471, 0.12579798346031068, 0.13298453855078374, 0.12579798346031068, 0.10648569402724471, 0.08065919989878699, 0.05467157824565407, 0.033159988420867426, 0.017997500190860556, 0.008740878047691289, 0.003798769940305483, 0.0014773247784004072, 0.000718813714486599] 

enter image description here

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.