418

I needed to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with:

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    for key in sorted(space.keys() + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions on improving it or alternate ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.


29 Answers

434

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

from numpy.random import choice

draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)

Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.
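For illustration, here is a minimal concrete sketch of that call (the candidates and probabilities below are made up):

from numpy.random import choice

# hypothetical example data, just to make the call above concrete
list_of_candidates = ['win', 'lose', 'draw']
probability_distribution = [0.5, 0.3, 0.2]   # same order as list_of_candidates
number_of_items_to_pick = 10

draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)
print(draw)   # e.g. ['win' 'draw' 'win' 'lose' ...] -- a numpy array of 10 picks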


5 Comments

By my testing, this is an order of magnitude slower than random.choices for individual calls. If you need a lot of random results, it's really important to pick them all at once by adjusting number_of_items_to_pick. If you do so, it's an order of magnitude faster.
This doesn't work with tuples etc ("ValueError: a must be 1-dimensional"), so in that case one can ask numpy to pick the index into the list, i.e. len(list_of_candidates), and then do list_of_candidates[draw]
There is now a choices method in the random module.
The documentation says choices() uses floating-point arithmetic for speed and choice() uses integer arithmetic to reduce bias. This might be the reason choices() is faster than choice().
How does it behave if weights are given and replace=False? I'm not sure that this problem is well-defined, i.e. what the relative probabilities of results ought to be.
415

Since Python 3.6 there is a method choices from the random module.

In [1]: import random

In [2]: random.choices(
   ...:     population=[['a','b'], ['b','a'], ['c','b']],
   ...:     weights=[0.2, 0.2, 0.6],
   ...:     k=10
   ...: )
Out[2]:
[['c', 'b'], ['c', 'b'], ['b', 'a'], ['c', 'b'], ['c', 'b'],
 ['b', 'a'], ['c', 'b'], ['b', 'a'], ['c', 'b'], ['c', 'b']]

Note that random.choices will sample with replacement, per the docs:

Return a k sized list of elements chosen from the population with replacement.

Note for completeness of answer:

When a sampling unit is drawn from a finite population and is returned to that population, after its characteristic(s) have been recorded, before the next unit is drawn, the sampling is said to be "with replacement". It basically means each element may be chosen more than once.

If you need to sample without replacement, then as @ronan-paixão's brilliant answer states, you can use numpy.random.choice, whose replace argument controls such behaviour.
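For reference, a small sketch of that numpy call with replace=False (example data made up; the probabilities still have to sum to 1):

import numpy as np

candidates = ['a', 'b', 'c', 'd']
p = [0.4, 0.3, 0.2, 0.1]

# three distinct items; no candidate can be drawn twice
print(np.random.choice(candidates, size=3, replace=False, p=p))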

4 Comments

This is so much faster than numpy.random.choice. Picking from a list of 8 weighted items 10,000 times, numpy.random.choice took 0.3286 sec whereas random.choices took 0.0416 sec, about 8x faster.
@AntonCodes This example is cherry picked. numpy is going to have some constant-time overhead that random.choices doesn't, so of course it's slower on a minuscule list of 8 items, and if you're choosing 10k times from such a list, you're right. But for cases when the list is larger (depending on how you're testing, I see break points between 100-300 elements), np.random.choice begins outperforming random.choices by a fairly wide gap. For example, including the normalization step along with the numpy call, I get a nearly 4x speedup over random.choices for a list of 10k elements.
This should be the new answer based on the performance improvement that @AntonCodes reported.
It would probably improve this answer not to have the population be a list of lists, which briefly confused me. A simple list of strings or ints would be a fine illustration and a better MWE, imo.
146
def weighted_choice(choices):
    total = sum(w for c, w in choices)
    r = random.uniform(0, total)
    upto = 0
    for c, w in choices:
        if upto + w >= r:
            return c
        upto += w
    assert False, "Shouldn't get here"

9 Comments

You can drop an operation and save a sliver of time by reversing the statements inside the for loop: upto +=w; if upto > r
save a variable by deleting upto and just decrementing r by the weight each time. The comparison is then if r < 0 (see the sketch after these comments)
@JnBrymn You need to check r <= 0. Consider an input set of 1 items, and a roll of 1.0. The assertion will fail then. I corrected that error in the answer.
@Sardathrion you could use a pragma to mark the for loop as partial: # pragma: no branch
@mLstudent33 I don't use Udacity.
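For what it's worth, a small sketch of the variant suggested in the comments above (decrementing r instead of accumulating upto); this is only an illustration, not the answer's original code:

import random

def weighted_choice(choices):
    # choices is a sequence of (item, weight) pairs, as in the answer above
    r = random.uniform(0, sum(w for c, w in choices))
    for c, w in choices:
        r -= w
        if r <= 0:
            return c
    assert False, "Shouldn't get here"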
90

Updated answer

The Python standard library now has random.choices() which directly supports weighted selections.

Here's an example from the docs:

>>> # Six roulette wheel spins (weighted sampling with replacement)
>>> choices(['red', 'black', 'green'], [18, 18, 2], k=6)
['red', 'green', 'black', 'black', 'red', 'black']

Algorithm overview:

  1. Arrange the weights into cumulative distribution.
  2. Use random.random() to pick a random float 0.0 <= x < total.
  3. Search the distribution using bisect.bisect() as shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.

Simplified code:

from random import random
from math import floor as floor
from bisect import bisect as bisect
from itertools import accumulate, repeat

def choices(population, weights=None, *, cum_weights=None, k=1):
    """Return a k sized list of population elements chosen with replacement.

    If the relative weights or cumulative weights are not specified,
    the selections are made with equal probability.
    """
    n = len(population)
    if cum_weights is None:
        if weights is None:
            return [population[floor(random() * n)] for i in repeat(None, k)]
        cum_weights = list(accumulate(weights))
    elif weights is not None:
        raise TypeError('Cannot specify both weights and cumulative weights')
    total = cum_weights[-1] + 0.0
    hi = n - 1
    return [population[bisect(cum_weights, random() * total, 0, hi)]
            for i in repeat(None, k)]

6 Comments

This is more efficient than Ned's answer. Basically, instead of doing a linear (O(n)) search through the choices, he's doing a binary search (O(log n)). +1!
tuple index out of range if random() happens to return 1.0
This still runs in O(n) because of the cumulative distribution calculation.
This solution is better in the case where multiple calls to weighted_choice are needed for the same set of choices. In that case you can create the cumulative sum once and do a binary search on each call.
@JonVaughan random() can't return 1.0. Per the docs, it returns a result in the half-open interval [0.0, 1.0), which is to say that it can return exactly 0.0, but can't return exactly 1.0. The largest value it can return is 0.99999999999999988897769753748434595763683319091796875 (which Python prints as 0.9999999999999999, and is the largest 64-bit float less than 1).
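A small sketch of the reuse idea from the last comment (build the cumulative sum once, then binary-search on every call); the names here are illustrative:

import random
from bisect import bisect
from itertools import accumulate

choices_list = [('WHITE', 90), ('RED', 8), ('GREEN', 2)]   # made-up example data

# one-time O(n) preprocessing
items = [c for c, w in choices_list]
cum_weights = list(accumulate(w for c, w in choices_list))
total = cum_weights[-1]

def weighted_choice():
    # O(log n) per call
    return items[bisect(cum_weights, random.random() * total)]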
23

If you don't mind using numpy, you can use numpy.random.choice.

For example:

import numpy

items  = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
elems  = [i[0] for i in items]
probs  = [i[1] for i in items]
trials = 1000

results = [0] * len(items)
for i in range(trials):
    res = numpy.random.choice(elems, p=probs)  # this is where the item is selected!
    results[elems.index(res)] += 1

results = [r / float(trials) for r in results]

print "item\texpected\tactual"
for i in range(len(probs)):
    print "%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i])

If you know how many selections you need to make in advance, you can do it without a loop like this:

numpy.random.choice(elems, trials, p=probs)

Comments

20

As of Python v3.6, random.choices could be used to return a list of elements of specified size from the given population with optional weights.

random.choices(population, weights=None, *, cum_weights=None, k=1)

  • population : list containing unique observations. (If empty, raises IndexError)

  • weights : the relative weights used to make the selections.

  • cum_weights : cumulative weights required to make selections.

  • k : size (length) of the list to be output. (Default: k=1)


A few caveats:

1) It uses weighted sampling with replacement, so drawn items are returned to the population before the next draw. The absolute values in the weights sequence do not matter in themselves; only their relative ratios do.

Unlike np.random.choice, which only accepts probabilities as weights and requires the individual probabilities to sum to 1, there are no such restrictions here. As long as the weights are numeric types (int/float/Fraction, but not Decimal), they will work.

>>> import random

# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']

# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']

# weights being fractions
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']

2) If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.

>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']

3) cum_weights is typically the result of the itertools.accumulate function, which is really handy in such situations.

From the documentation linked:

Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

So, supplying either weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome, and the latter is slightly faster / more efficient.
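A small sketch of that equivalence, using the same contrived weights (the seed is only there so the two calls can be compared):

import random
from itertools import accumulate

population = ["white", "green", "red"]
weights = [12, 12, 4]
cum_weights = list(accumulate(weights))   # [12, 24, 28]

random.seed(42)
a = random.choices(population, weights=weights, k=10)
random.seed(42)
b = random.choices(population, cum_weights=cum_weights, k=10)
assert a == b   # same picks either way; the cumulative form just skips the accumulation step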

Comments

17

Crude, but may be sufficient:

import random

weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))

Does it work?

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

# initialize tally dict, keyed by the item names
tally = dict.fromkeys((c for c, wt in choices), 0)

# tally up 1000 weighted choices
for i in xrange(1000):
    tally[weighted_choice(choices)] += 1

print tally.items()

Prints:

[('WHITE', 904), ('GREEN', 22), ('RED', 74)] 

Assumes that all weights are integers. They don't have to add up to 100, I just did that to make the test results easier to interpret. (If weights are floating point numbers, multiply them all by 10 repeatedly until all weights >= 1.)

weights = [.6, .2, .001, .199]
while any(w < 1.0 for w in weights):
    weights = [w*10 for w in weights]
weights = map(int, weights)

4 Comments

Nice, I'm not sure I can assume all weights are integers, though.
Seems like your objects would be duplicated in this example. That'd be inefficient (and so is the function for converting weights to integers). Nevertheless, this solution is a good one-liner if the integer weights are small.
Primitives will be duplicated, but objects will only have references duplicated, not the objects themselves. (This is why you can't create a list of lists using [[]]*10 - all the elements in the outer list point to the same list.)
@PaulMcG No; nothing but references will ever be duplicated. Python's type system has no concept of primitives. You can confirm that even with e.g. an int you're still getting lots of references to the same object by doing something like [id(x) for x in ([99**99] * 100)] and observe that id returns the same memory address on every call.
16

If you have a weighted dictionary instead of a list you can write this

items = { "a": 10, "b": 5, "c": 1 }
random.choice([k for k in items for dummy in range(items[k])])

Note that [k for k in items for dummy in range(items[k])] produces this list ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b']

2 Comments

This works for small total population values, but not for large datasets (e.g. US population by state would end up creating a working list with 300 million items in it).
@Ryan Indeed. It also doesn't work for non-integer weights, which are another realistic scenario (e.g. if you have your weights expressed as probabilities of selection).
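A hedged alternative for the dictionary case that sidesteps both comments above (no expanded working list, and non-integer weights are fine), assuming Python 3.6+:

import random

items = {"a": 10, "b": 5, "c": 1}

# random.choices takes the keys as the population and the values as weights,
# so no intermediate list proportional to the total weight is ever built
pick = random.choices(list(items.keys()), weights=list(items.values()), k=1)[0]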
14

Here is the version that is included in the standard library for Python 3.6:

import random
import itertools as _itertools
import bisect as _bisect

class Random36(random.Random):
    "Show the code included in the Python 3.6 version of the Random class"

    def choices(self, population, weights=None, *, cum_weights=None, k=1):
        """Return a k sized list of population elements chosen with replacement.

        If the relative weights or cumulative weights are not specified,
        the selections are made with equal probability.
        """
        random = self.random
        if cum_weights is None:
            if weights is None:
                _int = int
                total = len(population)
                return [population[_int(random() * total)] for i in range(k)]
            cum_weights = list(_itertools.accumulate(weights))
        elif weights is not None:
            raise TypeError('Cannot specify both weights and cumulative weights')
        if len(cum_weights) != len(population):
            raise ValueError('The number of weights does not match the population')
        bisect = _bisect.bisect
        total = cum_weights[-1]
        return [population[bisect(cum_weights, random() * total)] for i in range(k)]

Source: https://hg.python.org/cpython/file/tip/Lib/random.py#l340
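A minimal usage sketch (assumes the Random36 class above has been defined; output will vary from run to run):

rng = Random36()
print(rng.choices(['red', 'black', 'green'], weights=[18, 18, 2], k=6))
# e.g. ['black', 'red', 'red', 'black', 'red', 'green']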

Comments

10

A very basic and easy approach for a weighted choice is the following:

import numpy as np
np.random.choice(['A', 'B', 'C'], p=[0.3, 0.4, 0.3])

Comments

5
import numpy as np

w = np.array([0.4, 0.8, 1.6, 0.8, 0.4])
np.random.choice(w, p=w/sum(w))

Comments

4

I'm probably too late to contribute anything useful, but here's a simple, short, and very efficient snippet:

def choose_index(probabilies):
    cmf = probabilies[0]
    choice = random.random()
    for k in xrange(len(probabilies)):
        if choice <= cmf:
            return k
        else:
            cmf += probabilies[k+1]

No need to sort your probabilities or create a vector with your cmf, and it terminates once it finds its choice. Memory: O(1), time: O(N), with average running time ~ N/2.

If you have weights, simply add one line:

def choose_index(weights):
    probabilities = weights / sum(weights)
    cmf = probabilies[0]
    choice = random.random()
    for k in xrange(len(probabilies)):
        if choice <= cmf:
            return k
        else:
            cmf += probabilies[k+1]

1 Comment

Several things are wrong with this. Superficially, there are some typoed variable names and there's no rationale given for using this over, say, np.random.choice. But more interestingly, there's a failure mode where this raises an exception. Doing probabilities = weights / sum(weights) doesn't guarantee that probabilities will sum to 1; for instance, if weights is [1,1,1,1,1,1,1] then probabilities will only sum to 0.9999999999999998, smaller than the largest possible return value of random.random (which is 0.9999999999999999). Then choice <= cmf is never satisfied.
4

If your list of weighted choices is relatively static, and you want frequent sampling, you can do one O(N) preprocessing step, and then do the selection in O(1), using the functions in this related answer.

# run only when `choices` changes.
preprocessed_data = prep(weight for _,weight in choices)

# O(1) selection
value = choices[sample(preprocessed_data)][0]

Comments

3

If you happen to have Python 3, and are afraid of installing numpy or writing your own loops, you could do:

import itertools, bisect, random

def weighted_choice(choices):
    weights = list(zip(*choices))[1]
    return choices[bisect.bisect(list(itertools.accumulate(weights)),
                                 random.uniform(0, sum(weights)))][0]

Because you can build anything out of a bag of plumbing adaptors! Although... I must admit that Ned's answer, while slightly longer, is easier to understand.

Comments

2

I looked at the other thread that was pointed to and came up with this variation in my coding style. It returns the index of the choice for tallying purposes, but it is simple to return the string instead (see the commented return alternative):

import random
import bisect

try:
    range = xrange
except:
    pass

def weighted_choice(choices):
    total, cumulative = 0, []
    for c,w in choices:
        total += w
        cumulative.append((total, c))
    r = random.uniform(0, total)
    # return index
    return bisect.bisect(cumulative, (r,))
    # return item string
    #return choices[bisect.bisect(cumulative, (r,))][0]

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

tally = [0 for item in choices]
n = 100000

# tally up n weighted choices
for i in range(n):
    tally[weighted_choice(choices)] += 1

print([t/sum(tally)*100 for t in tally])

Comments

2

A general solution:

import random

def weighted_choice(choices, weights):
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]

Comments

2

Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it will return an array of 0's containing a 1 indicating which bin was chosen. The code defaults to just making a single draw but you can pass in the number of draws to be made and the counts per bin drawn will be returned.

If the weights vector does not sum to 1, it will be normalized so that it does.

import numpy as np

def weighted_choice(weights, n=1):
    if np.sum(weights) != 1:
        weights = weights/np.sum(weights)
    draws = np.random.random_sample(size=n)
    weights = np.cumsum(weights)
    weights = np.insert(weights, 0, 0.0)
    counts = np.histogram(draws, bins=weights)
    return(counts[0])

Comments

2

There is a lecture on this by Sebastian Thrun in the free Udacity course AI for Robotics. Basically, he makes a circular array of the indexed weights using the mod operator %, sets a variable beta to 0, randomly chooses an index, then loops N times (where N is the number of indices). In the for loop he first increments beta by the formula:

beta = beta + uniform sample from {0 ... 2 * Weight_max}

and then, nested in the for loop, runs the while loop below:

while w[index] < beta:
    beta = beta - w[index]
    index = index + 1

select p[index]

Then on to the next index to resample based on the probabilities (or normalized probability in the case presented in the course).

On Udacity find Lesson 8, video number 21 of Artificial Intelligence for Robotics where he is lecturing on particle filters.
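For concreteness, a hedged Python sketch of that resampling wheel (the names are mine, not from the lecture):

import random

def resampling_wheel(items, weights, n):
    """Draw n items, each with probability proportional to its weight."""
    index = random.randrange(len(items))
    beta = 0.0
    max_w = max(weights)
    picks = []
    for _ in range(n):
        beta += random.uniform(0, 2 * max_w)
        while weights[index] < beta:
            beta -= weights[index]
            index = (index + 1) % len(weights)   # circular walk over the indices
        picks.append(items[index])
    return picks

# e.g. resampling_wheel(['a', 'b', 'c'], [0.1, 0.3, 0.6], 5)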

Comments

1

It depends on how many times you want to sample the distribution.

Suppose you want to sample the distribution K times. Then the time complexity of using np.random.choice() each time is O(K(n + log(n))), where n is the number of items in the distribution.

In my case, I needed to sample the same distribution multiple times of the order of 10^3 where n is of the order of 10^6. I used the below code, which precomputes the cumulative distribution and samples it in O(log(n)). Overall time complexity is O(n+K*log(n)).

import numpy as np

n, k = 10**6, 10**3

# Create dummy distribution
a = np.array([i+1 for i in range(n)])
p = np.array([1.0/n]*n)

cfd = p.cumsum()
for _ in range(k):
    x = np.random.uniform()
    idx = cfd.searchsorted(x, side='right')
    sampled_element = a[idx]

Comments

1

Another way of doing this, assuming we have weights at the same index as the elements in the element array.

import numpy as np

weights = [0.1, 0.3, 0.5]  # weights for the items at index 0, 1, 2
# sum of weights should be <= 1; you can also divide each weight by the sum of
# all weights to standardise it to the <= 1 constraint
trials = 1     # number of trials
num_item = 1   # number of items that can be picked in each trial

selected_item_arr = np.random.multinomial(num_item, weights, trials)
# gives the number of times an item was selected at a particular index
# this assumes selection with replacement

# one possible output
# selected_item_arr
# array([[0, 0, 1]])

# say if trials = 5, then the possible output could be
# selected_item_arr
# array([[1, 0, 0],
#        [0, 0, 1],
#        [0, 0, 1],
#        [0, 1, 0],
#        [0, 0, 1]])

Now let's assume we have to sample 3 items in 1 trial. You can assume that there are three balls R, G, B present in large quantity in the ratio of their weights given by the weight array; the following could be a possible outcome:

num_item = 3
trials = 1
selected_item_arr = np.random.multinomial(num_item, weights, trials)

# selected_item_arr can give output like:
# array([[1, 0, 2]])

You can also think of the number of items to be selected as the number of binomial/multinomial trials within a set. So, the above example can still work as:

num_binomial_trial = 5
weights = [0.1, 0.9]  # say an unfair coin, weights for H/T
num_experiment_set = 1
selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)

# possible output
# selected_item_arr
# array([[1, 4]])
# i.e. H came 1 time and T came 4 times in 5 binomial trials, and one set contains 5 binomial trials

Comments

1

Let's say you have

items = [11, 23, 43, 91] probability = [0.2, 0.3, 0.4, 0.1] 

and you have a function which generates a random number in [0, 1) (we can use random.random() here). So now take the prefix sum of probability:

prefix_probability=[0.2,0.5,0.9,1] 

Now we can just take a random number between 0 and 1 and use binary search to find where that number belongs in prefix_probability. That index will be your answer.

The code will look something like this:

return items[bisect.bisect(prefix_probability,random.random())] 
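Put together, a minimal self-contained sketch of the idea (same items and probabilities as above):

import bisect
import random
from itertools import accumulate

items = [11, 23, 43, 91]
probability = [0.2, 0.3, 0.4, 0.1]
prefix_probability = list(accumulate(probability))   # ~[0.2, 0.5, 0.9, 1.0]

def weighted_choice():
    idx = bisect.bisect(prefix_probability, random.random())
    return items[min(idx, len(items) - 1)]   # clamp guards against float round-off at the top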

Comments

0

One way is to randomize on the total of all the weights and then use the values as the limit points for each var. Here is a crude implementation as a generator.

def rand_weighted(weights):
    """
    Generator which uses the weights to generate a
    weighted random values
    """
    sum_weights = sum(weights.values())
    cum_weights = {}
    current_weight = 0
    for key, value in sorted(weights.iteritems()):
        current_weight += value
        cum_weights[key] = current_weight
    while True:
        sel = int(random.uniform(0, 1) * sum_weights)
        for key, value in sorted(cum_weights.iteritems()):
            if sel < value:
                break
        yield key

Comments

0

Using numpy

import numpy as np

def choice(items, weights):
    return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]

1 Comment

NumPy already has np.random.choice, as mentioned in the accepted answer that's been here since 2014. What's the point of rolling your own?
0

I needed to do something like this really fast and really simple; from searching for ideas I finally built this template. The idea is to receive the weighted values in the form of JSON from an API, which here is simulated by the dict.

Then translate it into a list in which each value repeats proportionally to its weight, and just use random.choice to select a value from the list.

I tried running it with 10, 100 and 1000 iterations. The distribution seems pretty solid.

import random

def weighted_choice(weighted_dict):
    """Input example: dict(apples=60, oranges=30, pineapples=10)"""
    weight_list = []
    for key in weighted_dict.keys():
        weight_list += [key] * weighted_dict[key]
    return random.choice(weight_list)

Comments

0

I didn't love the syntax of any of those. I really wanted to just specify what the items were and what the weighting of each was. I realize I could have used random.choices but instead I quickly wrote the class below.

import random, string
from numpy import cumsum

class randomChoiceWithProportions:
    '''
    Accepts a dictionary of choices as keys and weights as values. Example if you want an unfair dice:

    choiceWeightDic = {"1": 0.16666666666666666, "2": 0.16666666666666666, "3": 0.16666666666666666,
                       "4": 0.16666666666666666, "5": .06666666666666666, "6": 0.26666666666666666}
    dice = randomChoiceWithProportions(choiceWeightDic)

    samples = []
    for i in range(100000):
        samples.append(dice.sample())

    # Should be close to .26666
    samples.count("6")/len(samples)

    # Should be close to .16666
    samples.count("1")/len(samples)
    '''
    def __init__(self, choiceWeightDic):
        self.choiceWeightDic = choiceWeightDic
        weightSum = sum(self.choiceWeightDic.values())
        assert weightSum == 1, 'Weights sum to ' + str(weightSum) + ', not 1.'
        self.valWeightDict = self._compute_valWeights()

    def _compute_valWeights(self):
        valWeights = list(cumsum(list(self.choiceWeightDic.values())))
        valWeightDict = dict(zip(list(self.choiceWeightDic.keys()), valWeights))
        return valWeightDict

    def sample(self):
        num = random.uniform(0, 1)
        for key, val in self.valWeightDict.items():
            if val >= num:
                return key

Comments

0

Provide random.choice() with a pre-weighted list:

Solution & Test:

import random

options = ['a', 'b', 'c', 'd']
weights = [1, 2, 5, 2]

weighted_options = [[opt]*wgt for opt, wgt in zip(options, weights)]
weighted_options = [opt for sublist in weighted_options for opt in sublist]
print(weighted_options)

# test
counts = {c: 0 for c in options}
for x in range(10000):
    counts[random.choice(weighted_options)] += 1

for opt, wgt in zip(options, weights):
    wgt_r = counts[opt] / 10000 * sum(weights)
    print(opt, counts[opt], wgt, wgt_r)

Output:

['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
a 1025 1 1.025
b 1948 2 1.948
c 5019 5 5.019
d 2008 2 2.008

Comments

0

In case you don't define in advance how many items you want to pick (so you don't do something like k=10) and you just have probabilities, you can do the below. Note that your probabilities do not need to add up to 1; they can be independent of each other:

import random

soup_items = ['pepper', 'onion', 'tomato', 'celery']
items_probability = [0.2, 0.3, 0.9, 0.1]

selected_items = [item for item, p in zip(soup_items, items_probability) if random.random() < p]
print(selected_items)
>>> ['pepper', 'tomato']

Comments

0

In machine learning, I needed to not only randomly select an item from an array, but also to make sure that each item is selected a stable number of times over a full round.

The idea is to duplicate each index N times according to its chance of occurrence. The minimum chance is 0.001, so if there is an item with a chance of 0.001, an index with a chance of 1.0 will be duplicated 1000 times.

So I made a Choicer class. It also supports nested Choicer.

from __future__ import annotations
from typing import Any, Sequence
import numpy as np

class Choicer:
    def __init__(self, items : Sequence[ Any|Choicer ], probs : Sequence[int|float]|np.ndarray ):
        """ probs [ 0.001 .. 1.0 ] """
        self._items = items
        self._probs = probs = np.array(probs, np.float32).clip(0.001, 1.0)
        if len(probs) != len(items):
            raise ValueError('must len(probs) == len(items)')

        # how often each item will occur
        rates = (probs/probs.min()).astype(np.int32)

        # base idx sequence, for example Choicer(['a', 'b', 'c'], [1,1,0.5]) , idxs_base == [0,0,1,1,2]
        self._idxs_base = np.concatenate([np.full( (x,), i, dtype=np.uint32) for i,x in enumerate(rates)], 0)

        self._idxs = None
        self._idx_counter = 0

    @property
    def items(self) -> Sequence[ Any|Choicer ]:
        return self._items

    @property
    def probs(self) -> np.ndarray:
        return self._probs

    def pick(self, count : int) -> Sequence[Any]:
        """pick `count` items"""
        out = []
        if len(self._items) != 0:
            while len(out) < count:
                if self._idx_counter == 0:
                    self._idxs = self._idxs_base.copy()
                    np.random.shuffle(self._idxs)
                    self._idx_counter = len(self._idxs)

                self._idx_counter -= 1
                idx = self._idxs[self._idx_counter]

                item = self._items[idx]
                if isinstance(item, Choicer):
                    item = item.pick(1)[0]

                out.append(item)
        return out

Example:

c = Choicer(['a', 'b', 'c'], [1,1,0.5])

print( c.pick(5) )   # ['c', 'a', 'b', 'a', 'b']
print( c.pick(5) )   # ['a', 'a', 'b', 'b', 'c']
print( c.pick(5) )   # ['a', 'c', 'a', 'b', 'b']

Example of nested Choicer:

c = Choicer(['a', 'b', Choicer(['c0','c1','c2'], [1,1,1]), ], [1,1,0.5])

print( c.pick(15) )
# ['b', 'a', 'b', 'c0', 'a', 'b', 'a', 'b', 'c1', 'a', 'b', 'c2', 'a', 'a', 'b']

Comments

-1

Step 1: Generate the CDF F that you're interested in

Step 2: Generate a uniform random variate u

Step 3: Evaluate z = F^{-1}(u)

This modelling is described in courses on probability theory or stochastic processes. It is applicable here just because you have an easy CDF.
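A hedged sketch of those three steps for a discrete distribution (the data is made up; for a discrete CDF, evaluating F^{-1} reduces to a search over the cumulative weights):

import random
from bisect import bisect
from itertools import accumulate

items = ['a', 'b', 'c']
weights = [5, 3, 2]

# Step 1: build the (discrete) CDF from the weights
cdf = list(accumulate(weights))            # [5, 8, 10]
total = cdf[-1]

# Step 2: generate a uniform random variate u in [0, total)
u = random.random() * total

# Step 3: evaluate the inverse CDF, i.e. find the first index with cdf[i] > u
print(items[bisect(cdf, u)])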

Comments
