Exhaust list of elements randomly without sorting them randomly first

Question

If I have a list of 10K elements, and I want to randomly iterate through all of them, is there an algorithm that lets me access each element randomly, without just sorting them randomly first?

In other words, this would not be ideal:

const sorted = list .map(v => [math.random(), v]) .sort((a,b) => a[0]- b[0]);

It would be nice to avoid the sort call and the mapping call. My only idea would be to store everything in a hashmap and access the hash keys randomly somehow? Although that's just coming back to the same problem, afaict.

Sorting algorithms worst case is O(n^2), trying to avoid that if list is long, that's all — Alexander Mills
– Alexander Mills, Commented Nov 8, 2022 at 1:52
I think the term you're after is the "Fisher-Yates shuffle", e.g. stackoverflow.com/a/2450976/1358308. this is only O(n) so should be faster — Sam Mason
– Sam Mason, Commented Nov 8, 2022 at 11:55

Sam Mason · Accepted Answer · 2022-11-09 11:52:36Z

Just been having a play with this and realised that the Fisher-Yates shuffle works well "on-line". For example, if you've got a large list you don't need to spend the time to shuffle the whole thing before you start iterating over items, or, equivalently, you might only need a few items out of a large list.

I didn't see a language tag in the question, so I'll pick Python.

from random import randint def iterrand(a): """Iterate over items of a list in a random order. Additional items can be .append()ed arbitrarily at runtime.""" for i, ai in enumerate(a): j = randint(i, len(a)-1) a[i], a[j] = a[j], ai yield a[i]

This is O(n) in the length of the list and by allowing .append()s (O(1) in Python) the list can be built in the background.

An example use would be:

l = [0, 1, 2] for i, v in enumerate(iterrand(l)): print(f"{i:3}: {v:<5} {l}") if v < 4: l.append(randint(1, 9))

which might produce output like:

 0: 2 [2, 1, 0] 1: 3 [2, 3, 0, 1] 2: 1 [2, 3, 1, 1, 0] 3: 0 [2, 3, 1, 0, 1, 3] 4: 1 [2, 3, 1, 0, 1, 3, 7] 5: 7 [2, 3, 1, 0, 1, 7, 7, 3] 6: 7 [2, 3, 1, 0, 1, 7, 7, 3] 7: 3 [2, 3, 1, 0, 1, 7, 7, 3] 8: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2] 9: 3 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3] 10: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2] 11: 7 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2, 7]

Update: To test correctness, I'd do something like:

# trivial tests assert list(iterrand([])) == [] assert list(iterrand([1])) == [1] # bigger uniformity test from collections import Counter # tally 1M draws c = Counter() for _ in range(10**6): c[tuple(iterrand([1, 2, 3, 4, 5]))] += 1 # ensure it's uniform assert all(7945 < v < 8728 for v in c.values()) # above constants calculated in R via: # k<-120;p<-0.001/k;qbinom(c(p,1-p), 1e6, 1/k))

@AlexanderMills see the wikipedia link, i.e. the first one. it's also kind of obvious from the code no? single loop doing constant work. or am I missing something obvious?
@SamMason Why not just use python’s own shuffle() method, which should be faster than a native python implementation of FY? (Since native functions are implemented in C).
@AlexanderMills searching for "Yates-Fisher proof" returns lots of relevant things on the correctness of the algorithm, e.g. here. when testing the implementation, I'd do some trivial unit tests and a simple uniformity test
@AlexanderMills Unbiasedness of Fisher-Yates is a straightforward application of conditional probability. It’s the same argument as to why drawing straws is fair.

Alexander Mills · Accepted Answer · 2022-11-09 18:08:15Z

Fisher-Yates should do the trick as good as any, this article is really good: https://medium.com/@oldwestaction/randomness-is-hard-e085decbcbb2

the relevant JS code is very short and sweet:

const fisherYatesShuffle = (deck) => { for (let i = deck.length - 1; i >= 0; i--) { const swapIndex = Math.floor(Math.random() * (i + 1)); [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]]; } return deck }

to yield results as you go, so you don't have to iterate through the list twice, use generator function like so:

const fisherYatesShuffle = function* (deck) { for (let i = deck.length - 1; i >= 0; i--) { const swapIndex = Math.floor(Math.random() * (i + 1)); // * use ; [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]]; yield deck[i]; } };

(note don't forget some of those semi-colons, when the next line is bracket notation).

I found Mike Bostock's animation to be a nice visual explanation, via my originally linked answer

Collectives™ on Stack Overflow

Exhaust list of elements randomly without sorting them randomly first

2 Answers 2

10 Comments

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

10 Comments

1 Comment

Linked

Related