How do I sample few numbers with different distributions?

Question

Let say I have my own custom distribution numpy vector called p.

Then p satisfies the following:

np.ndim(p) == 1 & np.sum(p) == 1 & np.all(p >= 0)

With that vector I can easily sample a number in [0, p.shape) with np.random.choice(np.arange(len(p)), p=p)

In a case I have many such ps, I have a matrix (with dim 2) P that satisfies:

np.sum(P[:,i]) == 1 # for all i in P.shape[1] np.all(P >= 0)

Then I wish to sample P.shape[1] numbers in the range 0 to P.shape[0] with probability P.

For example the next code:

P = np.array([[0.2, 0.3], [0.5, 0.7], [0.3, 0]]) x = np.random.choice(np.arange(P.shape[0], P[:,0])) y = np.random.choice(np.arange(P.shape[0], P[:,1]))

will produce my will (x=0 in 0.2, x=1 in 0.5 and x=2 in 0.3 and y=0 in 0.3, y=1 in 0.7).

In my case P has many columns and I wish to sample all in one shot.

Of course I can do it in a for loop, for example:

random_values = np.empty(P.shape[1]) arange_arr = np.arange(P.shape[0]) for i in range(P.shape[1]): random_values[i] = np.random.choice(arange_arr, p=P[:,i])

Trying to find some nupmy-scipy elegant way to do it.

Mad Physicist · Accepted Answer · 2021-01-26 18:43:27Z

You could do something like this:

P = np.array([[0.2, 0.3], [0.5, 0.7], [0.3, 0]]) P_upper = np.cumsum(P, axis=0) P_lower = np.concatenate((np.zeros((1, P.shape[1])), P_upper[:-1, :]), axis=0)

This creates a set of bins that you can digitize into. Now generate random numbers between 0 and 1:

r = np.random.rand(10, P.shape[1])

There are a couple of ways to assign the data to the right bins. The quick and relatively inefficient way is to use a boolean mask:

mask = (r[None, ...] >= P_lower[:, None, :]) & (r[None, ...] < P_upper[:, None, :]) result = np.argmax(mask, axis=0)

A more efficient, but more complicated, way is to add an offset to each column, and apply np.digitize or np.searchsorted to the result:

offset = np.arange(P.shape[1]) ind = np.searchsorted((P_upper + offset).ravel('F'), (r + offset).ravel('F')).reshape(r.shape, order='F') result = ind - offset * P.shape[0]

TL;DR

def multi_sample(p, n): ps = np.cumsum(p, axis=0) r = np.random.rand(n, ps.shape[1]) offset = np.arange(P.shape[1]) ind = np.searchsorted((P_upper + offset).ravel('F'), (r + offset).ravel('F')).reshape(r.shape, order='F') return ind - offset * P.shape[0]

Thank you, I hoped for some built-in function in mupy/scipy.. =\
@Shaq. You can make your own utility function. This should not be an issue. I'll add a TL;DR
thanks. I think TL;DR might be better in the start of the answer, doesn't it?

Collectives™ on Stack Overflow

How do I sample few numbers with different distributions?

1 Answer 1

3 Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Related