0

If I have a list of 10K elements, and I want to randomly iterate through all of them, is there an algorithm that lets me access each element randomly, without just sorting them randomly first?

In other words, this would not be ideal:

const sorted = list .map(v => [math.random(), v]) .sort((a,b) => a[0]- b[0]); 

It would be nice to avoid the sort call and the mapping call. My only idea would be to store everything in a hashmap and access the hash keys randomly somehow? Although that's just coming back to the same problem, afaict.

2
  • Sorting algorithms worst case is O(n^2), trying to avoid that if list is long, that's all Commented Nov 8, 2022 at 1:52
  • 1
    I think the term you're after is the "Fisher-Yates shuffle", e.g. stackoverflow.com/a/2450976/1358308. this is only O(n) so should be faster Commented Nov 8, 2022 at 11:55

2 Answers 2

1

Just been having a play with this and realised that the Fisher-Yates shuffle works well "on-line". For example, if you've got a large list you don't need to spend the time to shuffle the whole thing before you start iterating over items, or, equivalently, you might only need a few items out of a large list.

I didn't see a language tag in the question, so I'll pick Python.

from random import randint def iterrand(a): """Iterate over items of a list in a random order. Additional items can be .append()ed arbitrarily at runtime.""" for i, ai in enumerate(a): j = randint(i, len(a)-1) a[i], a[j] = a[j], ai yield a[i] 

This is O(n) in the length of the list and by allowing .append()s (O(1) in Python) the list can be built in the background.

An example use would be:

l = [0, 1, 2] for i, v in enumerate(iterrand(l)): print(f"{i:3}: {v:<5} {l}") if v < 4: l.append(randint(1, 9)) 

which might produce output like:

 0: 2 [2, 1, 0] 1: 3 [2, 3, 0, 1] 2: 1 [2, 3, 1, 1, 0] 3: 0 [2, 3, 1, 0, 1, 3] 4: 1 [2, 3, 1, 0, 1, 3, 7] 5: 7 [2, 3, 1, 0, 1, 7, 7, 3] 6: 7 [2, 3, 1, 0, 1, 7, 7, 3] 7: 3 [2, 3, 1, 0, 1, 7, 7, 3] 8: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2] 9: 3 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3] 10: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2] 11: 7 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2, 7] 

Update: To test correctness, I'd do something like:

# trivial tests assert list(iterrand([])) == [] assert list(iterrand([1])) == [1] # bigger uniformity test from collections import Counter # tally 1M draws c = Counter() for _ in range(10**6): c[tuple(iterrand([1, 2, 3, 4, 5]))] += 1 # ensure it's uniform assert all(7945 < v < 8728 for v in c.values()) # above constants calculated in R via: # k<-120;p<-0.001/k;qbinom(c(p,1-p), 1e6, 1/k)) 
Sign up to request clarification or add additional context in comments.

10 Comments

fisher yates is o(n) hmmm link?
@AlexanderMills see the wikipedia link, i.e. the first one. it's also kind of obvious from the code no? single loop doing constant work. or am I missing something obvious?
@SamMason Why not just use python’s own shuffle() method, which should be faster than a native python implementation of FY? (Since native functions are implemented in C).
@AlexanderMills searching for "Yates-Fisher proof" returns lots of relevant things on the correctness of the algorithm, e.g. here. when testing the implementation, I'd do some trivial unit tests and a simple uniformity test
@AlexanderMills Unbiasedness of Fisher-Yates is a straightforward application of conditional probability. It’s the same argument as to why drawing straws is fair.
|
1

Fisher-Yates should do the trick as good as any, this article is really good: https://medium.com/@oldwestaction/randomness-is-hard-e085decbcbb2

the relevant JS code is very short and sweet:

const fisherYatesShuffle = (deck) => { for (let i = deck.length - 1; i >= 0; i--) { const swapIndex = Math.floor(Math.random() * (i + 1)); [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]]; } return deck } 

to yield results as you go, so you don't have to iterate through the list twice, use generator function like so:

const fisherYatesShuffle = function* (deck) { for (let i = deck.length - 1; i >= 0; i--) { const swapIndex = Math.floor(Math.random() * (i + 1)); // * use ; [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]]; yield deck[i]; } }; 

(note don't forget some of those semi-colons, when the next line is bracket notation).

1 Comment

I found Mike Bostock's animation to be a nice visual explanation, via my originally linked answer

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.