1
$\begingroup$

I'm trying to understand the computational complexity of this pseudocode:

values is a set of n unique elements subset is an empty set for 0 ... k X: randomly select a value from values if value is in subset goto X else insert value into subset 

This is a (poor) algorithm for selecting a unique random subset of k elements from n, and I'm aware of the better choices, but I wanted to understand the computational complexity of this. I was told it was O(n!).

I can see easily that this is O(n) when duplicates are allowed because the conditional test is eliminated from the pseudocode and k choices are made each time.

When you have to account for duplicates, there is a probability that a re-test will be required which increases with each iteration. Depending on the values of n and k, this is a non-negligible fact, but I'm not certain how it affects the big-O complexity in a generalized way. Could someone explain this to me?

$\endgroup$
2
  • 1
    $\begingroup$ Maybe this would be better suited for Stack Overflow? $\endgroup$ Commented Jan 7, 2016 at 22:02
  • $\begingroup$ You realize this can run indefinitely if you're unlucky? What would the big-O notation mean then? $\endgroup$ Commented Jan 8, 2016 at 0:39

1 Answer 1

1
$\begingroup$

Let $E_{n,k}(s)$ be the expected number of steps for the algorithm to finish when there are $s$ more elements left to insert into the subset, given $n$ elements to choose from, and a desired subset size $k$. Note that $E_{n,k}(0) = 0$, our stopping condition. We want to know $E_{n,k}(k)$.

At any given step, there are $s$ free spots in our subset, and therefore $k-s$ elements already in the subset. That means there is a $\frac{k-s}{n}$ chance of picking an element we already have, and needing to start the process again. Otherwise, we insert the element and recurse into the next state (i.e. with one fewer free spots in the subset).

Then:

$$E_{n,k}(s) = 1 + \frac{k-s}{n}E_{n,k}(s) +\frac{n-k+s}{n}E_{n,k}(s-1)$$

$$E_{n,k}(s) = \frac{n}{n - k + s} +E_{n,k}(s-1)$$

Therefore:

$$E_{n,k}(k) = n\sum_{s=1}^{k} \frac{1}{n - k + s} = n \cdot (H_n- H_{n-k})$$

where $H_n$ is the $n$th Harmonic number.

$H_n - H_{n-k} \approx \log(\frac{n}{n-k})$, implying that $E_{n,k}$ takes about $O(n \log(\frac{n}{n-k})))$ steps in the average case.

The best-case time-complexity is $O(k)$, and the worst-case is $O(\infty)$.

$\endgroup$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.