Big-O complexity of calculation: Drawing a non-colliding subset of k elements from n total

Question

I'm trying to understand the computational complexity of this pseudocode:

values is a set of n unique elements subset is an empty set for 0 ... k X: randomly select a value from values if value is in subset goto X else insert value into subset

This is a (poor) algorithm for selecting a unique random subset of k elements from n, and I'm aware of the better choices, but I wanted to understand the computational complexity of this. I was told it was O(n!).

I can see easily that this is O(n) when duplicates are allowed because the conditional test is eliminated from the pseudocode and k choices are made each time.

When you have to account for duplicates, there is a probability that a re-test will be required which increases with each iteration. Depending on the values of n and k, this is a non-negligible fact, but I'm not certain how it affects the big-O complexity in a generalized way. Could someone explain this to me?

You realize this can run indefinitely if you're unlucky? What would the big-O notation mean then? — SSF
– SSF, Commented Jan 8, 2016 at 0:39

Marcus Andrews · Accepted Answer · 2016-01-08 02:17:12Z

Let $E_{n,k}(s)$ be the expected number of steps for the algorithm to finish when there are $s$ more elements left to insert into the subset, given $n$ elements to choose from, and a desired subset size $k$. Note that $E_{n,k}(0) = 0$, our stopping condition. We want to know $E_{n,k}(k)$.

At any given step, there are $s$ free spots in our subset, and therefore $k-s$ elements already in the subset. That means there is a $\frac{k-s}{n}$ chance of picking an element we already have, and needing to start the process again. Otherwise, we insert the element and recurse into the next state (i.e. with one fewer free spots in the subset).

Then:

$$E_{n,k}(s) = 1 + \frac{k-s}{n}E_{n,k}(s) +\frac{n-k+s}{n}E_{n,k}(s-1)$$

$$E_{n,k}(s) = \frac{n}{n - k + s} +E_{n,k}(s-1)$$

Therefore:

$$E_{n,k}(k) = n\sum_{s=1}^{k} \frac{1}{n - k + s} = n \cdot (H_n- H_{n-k})$$

where $H_n$ is the $n$th Harmonic number.

$H_n - H_{n-k} \approx \log(\frac{n}{n-k})$, implying that $E_{n,k}$ takes about $O(n \log(\frac{n}{n-k})))$ steps in the average case.

The best-case time-complexity is $O(k)$, and the worst-case is $O(\infty)$.

Stack Exchange Network

Big-O complexity of calculation: Drawing a non-colliding subset of k elements from n total

1 Answer 1

You must log in to answer this question.

Hot Network Questions

Big-O complexity of calculation: Drawing a non-colliding subset of k elements from n total

1 Answer 1

You must log in to answer this question.

Related

Hot Network Questions