4

I encountered a naive algorithm for random number generation that produce a series of numbers as follows:

for (int i = 0; i < MAX; i++) if (rand.nextInt(100) >= 100 - probability) // probability is between 0 and 100 randomNumbersList.add(i); 

I was wondering if there's a way to achieve statistically equivalent results without iterating through each number between 0 and MAX.

3
  • 1
    First, this just creates a (possible empty) list of 'random' numbers between probablity and 100. Second, you're not really iterating over every number between 0 and MAX, the code tries MAX times to generate a number. If you want to do something X times there has to be a loop somewhere. Commented Jun 25, 2015 at 13:18
  • 3
    @Buurman not true - it creates a list of up to MAX strictly-increasing integers between 0 and MAX, where the average length of the list is given by MAX*(probability/100.0) Commented Jun 25, 2015 at 13:20
  • Whoops, sorry, you are correct @tucuxi. I missed that it adds i to the list. Thanks for the correction. Commented Jun 25, 2015 at 13:22

4 Answers 4

3

Let's denote p=probability/100 and q=1-p.

Consider what will be the first number to be added. With probability q it is 0; with probability (1-q)*q it is 1, with probability (1-q)^2*q it is 2 and so on. This is the geometric distribution. You can easily generate a random number distributed according to geometric distribution using the following approach: generate a random number u uniformly distributed in [0,1] and calculate x=⌊ln(u)/ln(q)⌋ — this x will have geometric distribution (see this question).

So this is how you can calculate the first number to add.

Now consider the difference between the second and the first numbers. It will also be distributed geometrically (only starting at 1, not at 0), so you can calculate this difference the same way and thus obtain the second number, and so on.

A pseudocode will be something like

cur = -1 lnq = ln(q) while true u = random(0,1) // float! cur = cur + 1 + floor(ln(u)/lnq) if cur >= MAX break randomNumbersList.add(cur); 

Corresponding Java code by @traveh

List<Integer> randomNumbersList = new LinkedList<Integer>(); int cur = -1; double p = probability / 100; double q = 1 - p; double lnq = Math.log(q); Random random = new Random(); while (true) { double u = random.nextDouble(); cur = cur + 1 + (int)Math.floor(Math.log(u) / lnq); if (cur >= MAX) break; randomNumbersList.add(cur); } 
Sign up to request clarification or add additional context in comments.

9 Comments

You can also pre-calculate 1/ln(q)
As a small nitpick, your method occasionally adds cur == MAX, while the OP's can, at most, add MAX-1
@traveh, I think you should judge it for yourself, maybe after a couple of experiments. This solution requires calculation of logarithm at each time step, and thus may be slower than the other solution. At the same time, it requires exactly one call to random per element generated, and thus may be faster. So I suggest to try on real data, or maybe even switch between the two methods depending on the value of probability.
I assume that natural logs are calculated quite cheaply via lookup - this can hardly be slower than TreeSet lookup, and also requires no memory access. If it is correct (and it certainly seems so), it is much faster than my answer.
Tested and works like a charm :) @tucuxi due to lack of time I won't test the performance differences at the moment (will update in the future if I get to it). I appreciate the effort from all of you and especially the fair play - upvoted your answers. You're awesome guys!
|
3

Your algorithm creates a list of up to MAX elements. Each element is an integer from 0 to MAX-1, with no duplicates. Since rand.nextInt(n) returns numbers x uniformly distributed between 0 and n such that 0 <= x < n, x >= 100-p should be always false if p == 0 (x is never 100), and always true if p == 100 (x is always >= 0). The expected number of elements is therefore MAX*(p/100.0).

This can be improved significantly if MAX is high but p is low: in most cases, you would be throwing your weighted coin, but it would come up tails and you would not add anything. Wasted work. If, however, p is high (say, over .5), then you will typically be generating on-the-order-of MAX elements; and it is unlikely that you can make things much faster (you should expect to do O(MAX) work to create O(MAX) random elements). If MAX is small, there's little difference between approaches - so I would stick to the simpler one: the one you already have.

Assuming large MAX and small p

We can model the length of the list using the well-known binomial distribution (as you are throwing MAX non-fair coins, which land "heads" with probability p). Java code is available in the Colt library. Using their classes, this should work:

 Binomial b = new Binomial(MAX, p, new MersenneTwister()); int heads = b.nextInt(); 

And now we need to generate "heads" sorted integers between 0 and MAX-1. Let us assume that MAX is much larger than heads. We can use

 TreeSet<Integer> chosen = new TreeSet<>(); for (int i=0, r=0; i<heads; i++) { do { r = random.nextInt(MAX) } while (chosen.contains(r)); chosen.add(r); } 

Note that this has horrible performance when p is high, because the inner loop will be executed several times; but for such cases, your initial algorithm is already good enough.

When p is low, the proposed algorithm will require time proportional to MAX*(p/100) instead of MAX. This should more than compensate for the cost of maintaining the TreeSet ordered.

1 Comment

Thanks for the verbose answer! I indeed forget to mention that MAX is large and p is small (around 1 percent or less). I will test this and return to confirm.
3

For every number, you are determining if the selection is successful (number is chosen) or not. Therefore, in MAX trials, the numbers you have is basically the number of successes. If you can predetermine the number of successes then you can get that many unique random numbers from the valid range. And it will be statistically same. So what you are looking for is binomial distribution. Get a random number from this distribution using the success probability and the number of trials (MAX). This will give you the count of your random numbers. Then get this many random unique numbers and it is done.

3 Comments

I don't think that "many random unique numbers" will work, I expect (though not checked) that, for example, the distribution of differences among the successive numbers will be different.
I disagree, @Petr - the probability of any given random integer between 0 and MAX being included is independent of whether any others have been included or not. The distribution of differences will be the same, as long as you preserve that independence.
@tucuxi, well, I think I agree
1

Java code for Petr's answer:

List<Integer> randomNumbersList = new LinkedList<Integer>(); int cur = -1; double p = probability / 100; double q = 1 - p; double lnq = Math.log(q); Random random = new Random(); while (true) { double u = random.nextDouble(); cur = cur + 1 + (int)Math.floor(Math.log(u) / lnq); if (cur >= MAX) break; randomNumbersList.add(cur); } 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.