How to generate a random normal distribution of integers

Question

How to generate a random integer as with np.random.randint(), but with a normal distribution around 0.

np.random.randint(-10, 10) returns integers with a discrete uniform distribution np.random.normal(0, 0.1, 1) returns floats with a normal distribution

What I want is a kind of combination between the two functions.

The normal distribution is continuous by definition, so the answer to this question depends on how you want to discretise it. One possible solution is to sample from np.random.normal and round the result to an integer. — Will Vousden
– Will Vousden, Commented May 24, 2016 at 11:07

Moot · Accepted Answer · 2021-02-06 09:25:20Z

One other way to get a discrete distribution that looks like the normal distribution is to draw from a multinomial distribution where the probabilities are calculated from a normal distribution.

import scipy.stats as ss import numpy as np import matplotlib.pyplot as plt x = np.arange(-10, 11) xU, xL = x + 0.5, x - 0.5 prob = ss.norm.cdf(xU, scale = 3) - ss.norm.cdf(xL, scale = 3) prob = prob / prob.sum() # normalize the probabilities so their sum is 1 nums = np.random.choice(x, size = 10000, p = prob) plt.hist(nums, bins = len(x))

Here, np.random.choice picks an integer from [-10, 10]. The probability for selecting an element, say 0, is calculated by p(-0.5 < x < 0.5) where x is a normal random variable with mean zero and standard deviation 3. I chose a std. dev. of 3 because this way p(-10 < x < 10) is almost 1.

The result looks like this:

For a continuous distribution, P(x=0)=0 (this is true for any other number). The probabiltiy is defined for intervals. Here, in order to associate a probability to 0 (and to other integers) I used the interval (-0.5, 0.5) It was basically because the question asked for integers. For 1, it is (0.5, 1.5).
It would work fine with these parameters but with a smaller standard deviation you would end up pdf's greater than 1, for example. It would be fine as long as you do the same normalization (dividing by the sum) afterwards but I didn't want to confuse potential readers since pdf is actually not probability (that's why it can be greater than 1) so I wanted to use actual probabilities.

Community · Accepted Answer · 2017-05-23 11:54:40Z

16

It may be possible to generate a similar distribution from a Truncated Normal Distribution that is rounded up to integers. Here's an example with scipy's truncnorm().

import numpy as np from scipy.stats import truncnorm import matplotlib.pyplot as plt scale = 3. range = 10 size = 100000 X = truncnorm(a=-range/scale, b=+range/scale, scale=scale).rvs(size=size) X = X.round().astype(int)

Let's see what it looks like

bins = 2 * range + 1 plt.hist(X, bins)

edited May 23, 2017 at 11:54

CommunityBot

11 silver badge

answered May 24, 2016 at 11:03

bakkal

55.6k12 gold badges136 silver badges113 bronze badges

5 Comments

Robert Lugg Over a year ago

I appreciate the responses from both @ayhan and bakkal. Please, I only ask this for my knowledge; I do not wish to insult either answer. Just looking at the plot, Bakkal's is more symmetric. They both seem sufficient and from the code seem equally valid. But my understanding is weak. Is there a objectively better method?

bakkal Over a year ago

@RobertLugg the relatively higher symmetry may be due to sample size being larger. That said, I think the code in this answer is simpler.

Rriskit Over a year ago

Be aware that you overwrite the python range function with this code. Try to use neutral variable names.

Pablo Over a year ago

While the code is simpler, it is also slower. In my test, about 100 times slower than ayhan's solution. Still, it takes 3 seconds to generate 10.000 numbers 50 times. So in many cases this will be ok.

bricklore Over a year ago

This answer is slightly wrong, as the first and last buckets will have lower probabilities than other buckets. This is because X.round() will do natural rounding with 0.5 as the cutoff-point, while all generated real values are in [a,b). To fix this, use np.floor(X).astype(int) instead, which will yield integer values in [a,b) with equal probabilities.

stellaria · Accepted Answer · 2024-10-16 13:35:23Z

The accepted answer here works, but I tried Will Vousden's solution and it works well too:

import numpy as np # Generate Distribution: randomNums = np.random.normal(scale=3, size=100000) # Round to the nearest integer and convert to integer (NB the latter is not done automatically otherwise!) randomInts = np.round(randomNums).astype(int) # Plot: axis = np.arange(start=min(randomInts), stop = max(randomInts) + 1) plt.hist(randomInts, bins = axis)

Instead of generating randomNums and rounding them to "ints" (actually, reals ending in .0), what about randomInts = np.random.normal(loc=10, scale=3, size=10000).astype(int)-10, which returns actual ints? Note, however, that it's necessary to generate values with a loc other than 0 (and return it to 0 by subtracting the loc), or you'll end up with far too many results at exactly 0.

Warren Weckesser · Accepted Answer · 2022-02-18 13:27:43Z

Old question, new answer:

For a bell-shaped distribution on the integers {-10, -9, ..., 9, 10}, you can use the binomial distribution with n=20 and p=0.5, and subtract 10 from the samples.

For example,

In [167]: import numpy as np In [168]: import matplotlib.pyplot as plt In [169]: rng = np.random.default_rng() In [170]: N = 5000000 # Number of samples to generate In [171]: samples = rng.binomial(n=20, p=0.5, size=N) - 10 In [172]: samples.min(), samples.max() Out[172]: (-10, 10)

Note that the probability of -10 or 10 is pretty low, so you won't necessarily see them in any given sample, especially if you use a smaller N.

np.bincount() is an efficient way to generate a histogram for a collection of small nonnegative integers:

In [173]: counts = np.bincount(samples + 10, minlength=20) In [174]: counts Out[174]: array([ 4, 104, 889, 5517, 22861, 73805, 184473, 369441, 599945, 800265, 881140, 801904, 600813, 370368, 185082, 73635, 23325, 5399, 931, 95, 4]) In [175]: plt.bar(np.arange(-10, 11), counts) Out[175]: <BarContainer object of 21 artists>

I couldn't find rng imported, but it must be this: rng = np.random.default_rng(), which presumably stands for a "default random number generator". devdocs.io/numpy~1.20/reference/random/index#random-quick-start
Thanks for pointing that out, @Stefan_EOX. I updated the code to include the creation of rng.

Maikefer · Accepted Answer · 2021-10-27 11:43:51Z

This version is mathematically not correct (because you crop the bell) but will do the job quick and easily understandable if preciseness is not needed that much:

def draw_random_normal_int(low:int, high:int): # generate a random normal number (float) normal = np.random.normal(loc=0, scale=1, size=1) # clip to -3, 3 (where the bell with mean 0 and std 1 is very close to zero normal = -3 if normal < -3 else normal normal = 3 if normal > 3 else normal # scale range of 6 (-3..3) to range of low-high scaling_factor = (high-low) / 6 normal_scaled = normal * scaling_factor # center around mean of range of low high normal_scaled += low + (high-low)/2 # then round and return return np.round(normal_scaled)

Drawing 100000 numbers results in this histogramm:

JeeyCi · Accepted Answer · 2022-03-18 13:49:11Z

I'm not sure if there (in scipy generator) is an option of var-type choice to be generated, but common generation can be such with scipy.stats

# Generate pseudodata from a single normal distribution import scipy from scipy import stats import numpy as np import matplotlib.pyplot as plt dist_mean = 0.0 dist_std = 0.5 n_events = 500 toy_data = scipy.stats.norm.rvs(dist_mean, dist_std, size=n_events) toy_data2 = [[i, j] for i, j in enumerate(toy_data )] arr = np.array(toy_data2) print("sample:\n", arr[1:500, 0]) print("bin:\n",arr[1:500, 1]) plt.scatter(arr[1:501, 1], arr[1:501, 0]) plt.xlabel("bin") plt.ylabel("sample") plt.show()

or in such a way (also option of dtype choice is absent):

import matplotlib.pyplot as plt mu, sigma = 0, 0.1 # mean and standard deviation s = np.random.normal(mu, sigma, 500) count, bins, ignored = plt.hist(s, 30, density=True) plt.show() print(bins) # <<<<<<<<<< plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (bins - mu)**2 / (2 * sigma**2) ), linewidth=2, color='r') plt.show()

without visualization the most common way (also no possibility to point out var-type)

bins = np.random.normal(3, 2.5, size=(10, 1))

a wrapper class could be done to instantiate the container with a given vars-dtype (e.g. by rounding floats to integers, as mentioned above)...

CopyPasteIt · Accepted Answer · 2018-04-02 14:31:49Z

Here we start by getting values from the bell curve.

CODE:

#--------*---------*---------*---------*---------*---------*---------*---------* # Desc: Discretize a normal distribution centered at 0 #--------*---------*---------*---------*---------*---------*---------*---------* import sys import random from math import sqrt, pi import numpy as np import matplotlib.pyplot as plt def gaussian(x, var): k1 = np.power(x, 2) k2 = -k1/(2*var) return (1./(sqrt(2. * pi * var))) * np.exp(k2) #--------*---------*---------*---------*---------*---------*---------*---------# while 1:# M A I N L I N E # #--------*---------*---------*---------*---------*---------*---------*---------# # # probability density function # # for discrete normal RV pdf_DGV = [] pdf_DGW = [] var = 9 tot = 0 # # create 'rough' gaussian for i in range(-var - 1, var + 2): if i == -var - 1: r_pdf = + gaussian(i, 9) + gaussian(i - 1, 9) + gaussian(i - 2, 9) elif i == var + 1: r_pdf = + gaussian(i, 9) + gaussian(i + 1, 9) + gaussian(i + 2, 9) else: r_pdf = gaussian(i, 9) tot = tot + r_pdf pdf_DGV.append(i) pdf_DGW.append(r_pdf) print(i, r_pdf) # # amusing how close tot is to 1! print('\nRough total = ', tot) # # no need to normalize with Python 3.6, # # but can't help ourselves for i in range(0,len(pdf_DGW)): pdf_DGW[i] = pdf_DGW[i]/tot # # print out pdf weights # # for out discrte gaussian print('\npdf:\n') print(pdf_DGW) # # plot random variable action rv_samples = random.choices(pdf_DGV, pdf_DGW, k=10000) plt.hist(rv_samples, bins = 100) plt.show() sys.exit()

OUTPUT:

-10 0.0007187932912256041 -9 0.001477282803979336 -8 0.003798662007932481 -7 0.008740629697903166 -6 0.017996988837729353 -5 0.03315904626424957 -4 0.05467002489199788 -3 0.0806569081730478 -2 0.10648266850745075 -1 0.12579440923099774 0 0.1329807601338109 1 0.12579440923099774 2 0.10648266850745075 3 0.0806569081730478 4 0.05467002489199788 5 0.03315904626424957 6 0.017996988837729353 7 0.008740629697903166 8 0.003798662007932481 9 0.001477282803979336 10 0.0007187932912256041 Rough total = 0.9999715875468381 pdf: [0.000718813714486599, 0.0014773247784004072, 0.003798769940305483, 0.008740878047691289, 0.017997500190860556, 0.033159988420867426, 0.05467157824565407, 0.08065919989878699, 0.10648569402724471, 0.12579798346031068, 0.13298453855078374, 0.12579798346031068, 0.10648569402724471, 0.08065919989878699, 0.05467157824565407, 0.033159988420867426, 0.017997500190860556, 0.008740878047691289, 0.003798769940305483, 0.0014773247784004072, 0.000718813714486599]

Collectives™ on Stack Overflow

How to generate a random normal distribution of integers

7 Answers 7

4 Comments

5 Comments

1 Comment

2 Comments

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

4 Comments

5 Comments

1 Comment

2 Comments

Comments

Comments

Comments

Linked

Related