
I have a list:

a = [1,2,1,1,3,5,6,2] 

I want to select, say 3 elements at random from this list, but they must all be different.

I need to preserve the 'weight' of each element, so sampling from set(a) is not possible.

So far, my solution is:

    condition = False
    while condition == False:
        mysample = random.sample(a, 3)
        if len(set(mysample)) - len(mysample) != 0:
            condition = False
        else:
            condition = True

But this forces me to re-sample as many times as it takes for the elements to all be different. This works fine for small samples, but for large ones my code becomes very inefficient...

  • It can return duplicates if you give it a list with duplicates. The OP wants a list with no duplicates but that is still weighted such that more common elements are more likely to appear than less common ones. Commented Mar 3, 2015 at 21:51
  • @aruisdante random.sample(a, 3) yielded [1, 1, 6] on my machine. It returned two of the three 1s in the list. Commented Mar 3, 2015 at 21:52
  • 1
    sample 1, then remove that all occurences of that in the original list. Repeat until you have the required number of elements. Have a look here: stackoverflow.com/questions/1157106/… Commented Mar 3, 2015 at 21:53
  • @JohnKugelman Ah wait, they do say later that it's uniqueness based on index, not value. They don't highlight that well. My bad. Commented Mar 3, 2015 at 21:53
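The sample-then-remove idea from the comment above can be sketched as follows (the function name `sample_without_value_repeats` is made up for illustration; it assumes you never ask for more values than the list has distinct ones):

```python
import random

a = [1, 2, 1, 1, 3, 5, 6, 2]

def sample_without_value_repeats(a, k):
    # Draw one element (weighted by how many copies remain), then strip
    # every occurrence of that value from the pool before the next draw.
    # Requires k <= number of distinct values in a.
    pool = list(a)
    result = []
    for _ in range(k):
        s = random.choice(pool)             # weighted by remaining occurrences
        result.append(s)
        pool = [x for x in pool if x != s]  # drop all copies of s
    return result

mysample = sample_without_value_repeats(a, 3)
```

Each draw still respects the original weights of the values that are left, which matches what the question asks for.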

4 Answers


You can shuffle and take the first three non-repeated elements:

    import random

    random.shuffle(your_list)
    three_elements = set()
    for v in your_list:
        if len(three_elements) == 3:
            break
        three_elements.add(v)

1 Comment

Most efficient (and elegant!) for small lists.
    from random import choice

    l = []
    seen = set()
    while len(l) < 3:
        ch = choice(a)
        if ch not in seen:
            l.append(ch)
            seen.add(ch)
    print(l)

Depending on the ratio of distinct values to total elements, different approaches will have different advantages:

    In [6]: import random

    In [7]: a = [choice(range(10000)) for _ in range(100000)]

    In [8]: %%timeit
       ...: random.shuffle(a)
       ...: three_elements = set()
       ...: for v in a:
       ...:     if len(three_elements) == 5000:
       ...:         break
       ...:     if not v in three_elements:
       ...:         three_elements.add(v)
       ...:
    10 loops, best of 3: 36.5 ms per loop

    In [9]: %%timeit
       ...: l = []
       ...: seen = set()
       ...: while len(l) < 5000:
       ...:     ch = choice(a)
       ...:     if ch not in seen:
       ...:         l.append(ch)
       ...:         seen.add(ch)
       ...:
    100 loops, best of 3: 5.16 ms per loop

Running your code, I had to kill the process after 10 minutes as it was still going, so whichever approach you choose will be a major improvement.

Using shuffle is more efficient when you have a high ratio of repeats to distinct values in the list and you want a very large sample; otherwise the cost of shuffling makes it slower than simply using a set and choice.
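Putting that trade-off into code, one could pick the strategy based on the requested sample size. This is only a sketch; the name `sample_distinct` and the crossover heuristic (`k > distinct // 2`) are my own assumptions, not something benchmarked here:

```python
import random

def sample_distinct(a, k):
    # Hypothetical helper: choose between the two strategies above
    # depending on how large the sample is relative to the number of
    # distinct values in the list.
    distinct = len(set(a))
    if k > distinct:
        raise ValueError("not enough distinct values to sample from")
    if k > distinct // 2:
        # Large sample: shuffle a copy once and scan for distinct values.
        shuffled = a[:]                 # copy so the caller's list is untouched
        random.shuffle(shuffled)
        seen, result = set(), []
        for v in shuffled:
            if v not in seen:
                seen.add(v)
                result.append(v)
                if len(result) == k:
                    break
        return result
    # Small sample: rejection-sample single elements with a seen-set.
    seen, result = set(), []
    while len(result) < k:
        ch = random.choice(a)
        if ch not in seen:
            seen.add(ch)
            result.append(ch)
    return result

a = [1, 2, 1, 1, 3, 5, 6, 2]
picked = sample_distinct(a, 3)
```

In both branches each individual draw is still weighted by how often a value occurs in the list.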

Comments

    import random

    mysample = []
    count = 0
    sampleSize = 3  # the number of values you want
    while count < sampleSize:
        s = random.sample(a, 1)[0]
        a = [x for x in a if x != s]  # note: filter() alone does nothing unless you keep its result
        mysample.append(s)
        count += 1

Comments


This is probably more complicated than necessary, but here is an implementation that uses a modified version of reservoir sampling.

    import itertools
    import random

    def element_at(iterable, index, default=None):
        return next(itertools.islice(iterable, index, None), default)

    def sample_unique(iterable, size):
        S = set()
        for index, item in enumerate(iterable):
            if len(S) < size:
                S.add(item)
            else:
                r = random.randint(0, index)
                if r < size:
                    other = element_at(S, r)
                    if item not in S:
                        S.remove(other)
                        S.add(item)
        return S
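For reference, here is how `sample_unique` behaves on the question's list (definitions repeated so the snippet runs on its own):

```python
import itertools
import random

def element_at(iterable, index, default=None):
    # Return the item at position `index` in an iterable, or `default`
    # if the iterable is exhausted first.
    return next(itertools.islice(iterable, index, None), default)

def sample_unique(iterable, size):
    # Modified reservoir sampling over distinct values, as in the answer above.
    S = set()
    for index, item in enumerate(iterable):
        if len(S) < size:
            S.add(item)
        else:
            r = random.randint(0, index)
            if r < size:
                other = element_at(S, r)
                if item not in S:
                    S.remove(other)
                    S.add(item)
    return S

a = [1, 2, 1, 1, 3, 5, 6, 2]
picked = sample_unique(a, 3)
# `picked` is a set of distinct values drawn from `a`
```

One caveat: because `S` is a set, duplicates among the early items can leave the result smaller than `size` if the input has fewer than `size` distinct values.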

Comments
