
I have a list:

a = [1,2,1,1,3,5,6,2] 

I want to select, say 3 elements at random from this list, but they must all be different.

I need to preserve the 'weight' of each element, so sampling from set(a) is not possible.

So far, my solution is:

    condition = False
    while condition == False:
        mysample = random.sample(a, 3)
        if len(set(mysample)) - len(mysample) != 0:
            condition = False
        else:
            condition = True

But this forces me to re-sample as many times as it takes for the elements to all be different. This works fine for small samples, but for large ones my code becomes very inefficient...

  • It can return duplicates if you give it a list with duplicates. The OP wants a list with no duplicates but that is still weighted such that more common elements are more likely to appear than less common ones. Commented Mar 3, 2015 at 21:51
  • @aruisdante random.sample(a, 3) yielded [1, 1, 6] on my machine. It returned two of the three 1s in the list. Commented Mar 3, 2015 at 21:52
  • 1
    sample 1, then remove that all occurences of that in the original list. Repeat until you have the required number of elements. Have a look here: stackoverflow.com/questions/1157106/… Commented Mar 3, 2015 at 21:53
  • @JohnKugelman Ah wait, they do say later that it's uniqueness based on index, not value. They don't highlight that well. My bad. Commented Mar 3, 2015 at 21:53
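The sample-then-remove idea from the comment above can be sketched as follows (the function name `sample_without_value_repeats` is made up for illustration; it assumes you never ask for more values than the list has distinct ones):

```python
import random

a = [1, 2, 1, 1, 3, 5, 6, 2]

def sample_without_value_repeats(a, k):
    # Draw one element (weighted by how many copies remain), then strip
    # every occurrence of that value from the pool before the next draw.
    # Requires k <= number of distinct values in a.
    pool = list(a)
    result = []
    for _ in range(k):
        s = random.choice(pool)             # weighted by remaining occurrences
        result.append(s)
        pool = [x for x in pool if x != s]  # drop all copies of s
    return result

mysample = sample_without_value_repeats(a, 3)
```

Each draw still respects the original weights of the values that are left, which matches what the question asks for.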

4 Answers


You can shuffle and take the first three non-repeated elements:

    import random

    random.shuffle(your_list)
    three_elements = set()
    for v in your_list:
        if len(three_elements) == 3:
            break
        three_elements.add(v)

1 Comment

Most efficient (and elegant!) for small lists.
    from random import choice

    l = []
    seen = set()
    while len(l) < 3:
        ch = choice(a)
        if ch not in seen:
            l.append(ch)
            seen.add(ch)
    print(l)

Depending on the ratio of distinct values to total elements, different approaches will have different advantages:

    In [6]: import random

    In [7]: a = [choice(range(10000)) for _ in range(100000)]

    In [8]: %%timeit
       ...: random.shuffle(a)
       ...: three_elements = set()
       ...: for v in a:
       ...:     if len(three_elements) == 5000:
       ...:         break
       ...:     if not v in three_elements:
       ...:         three_elements.add(v)
       ...:
    10 loops, best of 3: 36.5 ms per loop

    In [9]: %%timeit
       ...: l = []
       ...: seen = set()
       ...: while len(l) < 5000:
       ...:     ch = choice(a)
       ...:     if ch not in seen:
       ...:         l.append(ch)
       ...:         seen.add(ch)
       ...:
    100 loops, best of 3: 5.16 ms per loop

Running your code, I had to kill the process after 10 minutes as it was still going, so whichever approach you choose will be a major improvement.

Using shuffle is more efficient when you have a high ratio of repeats to distinct values in the list and you want a very large sample; otherwise the cost of shuffling makes it slower than simply using a set and choice.
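Putting that trade-off into code, one could pick the strategy based on the requested sample size. This is only a sketch; the name `sample_distinct` and the crossover heuristic (`k > distinct // 2`) are my own assumptions, not something benchmarked here:

```python
import random

def sample_distinct(a, k):
    # Hypothetical helper: choose between the two strategies above
    # depending on how large the sample is relative to the number of
    # distinct values in the list.
    distinct = len(set(a))
    if k > distinct:
        raise ValueError("not enough distinct values to sample from")
    if k > distinct // 2:
        # Large sample: shuffle a copy once and scan for distinct values.
        shuffled = a[:]                 # copy so the caller's list is untouched
        random.shuffle(shuffled)
        seen, result = set(), []
        for v in shuffled:
            if v not in seen:
                seen.add(v)
                result.append(v)
                if len(result) == k:
                    break
        return result
    # Small sample: rejection-sample single elements with a seen-set.
    seen, result = set(), []
    while len(result) < k:
        ch = random.choice(a)
        if ch not in seen:
            seen.add(ch)
            result.append(ch)
    return result

a = [1, 2, 1, 1, 3, 5, 6, 2]
picked = sample_distinct(a, 3)
```

In both branches each individual draw is still weighted by how often a value occurs in the list.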

Comments

    import random

    mysample = []
    count = 0
    sampleSize = 3  # the number of values you want
    while count < sampleSize:
        s = random.sample(a, 1)[0]
        a = [x for x in a if x != s]  # note: filter() alone does nothing unless you keep its result
        mysample.append(s)
        count += 1

Comments


This is probably more complicated than necessary, but here is an implementation that uses a modified version of reservoir sampling.

    import itertools
    import random

    def element_at(iterable, index, default=None):
        return next(itertools.islice(iterable, index, None), default)

    def sample_unique(iterable, size):
        S = set()
        for index, item in enumerate(iterable):
            if len(S) < size:
                S.add(item)
            else:
                r = random.randint(0, index)
                if r < size:
                    other = element_at(S, r)
                    if item not in S:
                        S.remove(other)
                        S.add(item)
        return S
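For reference, here is how `sample_unique` behaves on the question's list (definitions repeated so the snippet runs on its own):

```python
import itertools
import random

def element_at(iterable, index, default=None):
    # Return the item at position `index` in an iterable, or `default`
    # if the iterable is exhausted first.
    return next(itertools.islice(iterable, index, None), default)

def sample_unique(iterable, size):
    # Modified reservoir sampling over distinct values, as in the answer above.
    S = set()
    for index, item in enumerate(iterable):
        if len(S) < size:
            S.add(item)
        else:
            r = random.randint(0, index)
            if r < size:
                other = element_at(S, r)
                if item not in S:
                    S.remove(other)
                    S.add(item)
    return S

a = [1, 2, 1, 1, 3, 5, 6, 2]
picked = sample_unique(a, 3)
# `picked` is a set of distinct values drawn from `a`
```

One caveat: because `S` is a set, duplicates among the early items can leave the result smaller than `size` if the input has fewer than `size` distinct values.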

Comments
