1

Let say I have my own custom distribution numpy vector called p.

Then p satisfies the following:

np.ndim(p) == 1 & np.sum(p) == 1 & np.all(p >= 0) 

With that vector I can easily sample a number in [0, p.shape) with np.random.choice(np.arange(len(p)), p=p)

In a case I have many such ps, I have a matrix (with dim 2) P that satisfies:

np.sum(P[:,i]) == 1 # for all i in P.shape[1] np.all(P >= 0) 

Then I wish to sample P.shape[1] numbers in the range 0 to P.shape[0] with probability P.

For example the next code:

P = np.array([[0.2, 0.3], [0.5, 0.7], [0.3, 0]]) x = np.random.choice(np.arange(P.shape[0], P[:,0])) y = np.random.choice(np.arange(P.shape[0], P[:,1])) 

will produce my will (x=0 in 0.2, x=1 in 0.5 and x=2 in 0.3 and y=0 in 0.3, y=1 in 0.7).

In my case P has many columns and I wish to sample all in one shot.

Of course I can do it in a for loop, for example:

random_values = np.empty(P.shape[1]) arange_arr = np.arange(P.shape[0]) for i in range(P.shape[1]): random_values[i] = np.random.choice(arange_arr, p=P[:,i]) 

Trying to find some nupmy-scipy elegant way to do it.

1 Answer 1

1

You could do something like this:

P = np.array([[0.2, 0.3], [0.5, 0.7], [0.3, 0]]) P_upper = np.cumsum(P, axis=0) P_lower = np.concatenate((np.zeros((1, P.shape[1])), P_upper[:-1, :]), axis=0) 

This creates a set of bins that you can digitize into. Now generate random numbers between 0 and 1:

r = np.random.rand(10, P.shape[1]) 

There are a couple of ways to assign the data to the right bins. The quick and relatively inefficient way is to use a boolean mask:

mask = (r[None, ...] >= P_lower[:, None, :]) & (r[None, ...] < P_upper[:, None, :]) result = np.argmax(mask, axis=0) 

A more efficient, but more complicated, way is to add an offset to each column, and apply np.digitize or np.searchsorted to the result:

offset = np.arange(P.shape[1]) ind = np.searchsorted((P_upper + offset).ravel('F'), (r + offset).ravel('F')).reshape(r.shape, order='F') result = ind - offset * P.shape[0] 

TL;DR

def multi_sample(p, n): ps = np.cumsum(p, axis=0) r = np.random.rand(n, ps.shape[1]) offset = np.arange(P.shape[1]) ind = np.searchsorted((P_upper + offset).ravel('F'), (r + offset).ravel('F')).reshape(r.shape, order='F') return ind - offset * P.shape[0] 
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you, I hoped for some built-in function in mupy/scipy.. =\
@Shaq. You can make your own utility function. This should not be an issue. I'll add a TL;DR
thanks. I think TL;DR might be better in the start of the answer, doesn't it?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.