I'm looking for a reasonable definition of a function weighted_sample that does not return just one random index for a list of given weights (which would be something like
def weighted_choice(weights, random=random): """ Given a list of weights [w_0, w_1, ..., w_n-1], return an index i in range(n) with probability proportional to w_i. """ rnd = random.random() * sum(weights) for i, w in enumerate(weights): if w<0: raise ValueError("Negative weight encountered.") rnd -= w if rnd < 0: return i raise ValueError("Sum of weights is not positive") to give a categorical distribution with constant weights) but a random sample of k of those, without replacement, just as random.sample behaves compared to random.choice.
Just as weighted_choice can be written as
lambda weights: random.choice([val for val, cnt in enumerate(weights) for i in range(cnt)]) weighted_sample could be written as
lambda weights, k: random.sample([val for val, cnt in enumerate(weights) for i in range(cnt)], k) but I would like a solution that does not require me to unravel the weights into a (possibly huge) list.
Edit: If there are any nice algorithms that give me back a histogram/list of frequencies (in the same format as the argument weights) instead of a sequence of indices, that would also be very useful.
h[i].w = 0byh[i][2] -= 1and related things, this looks good.)