
What is the distribution (uniform, Poisson, normal, etc.) that is generated if I do the below? The output appears to indicate a uniform distribution. But then, why do we need std::uniform_int_distribution?

    #include <iostream>
    #include <map>
    #include <random>
    #include <string>

    int main() {
        std::mt19937_64 generator(134);
        std::map<int, int> freq;
        const int size = 100000;
        for (int i = 0; i < size; ++i) {
            int r = generator() % size;
            freq[r]++;
        }
        for (auto f : freq) {
            std::cout << std::string(f.second, '*') << std::endl;
        }
        return 0;
    }

Thanks!

4 Answers


Because while generator() is uniformly distributed over [generator.min(), generator.max()], generator() % n is not uniformly distributed over [0, n) (unless the number of possible outputs, generator.max() + 1 assuming generator.min() == 0, is an exact multiple of n).

Let's take an example: min() == 0, max() == 65'535 and n == 7.

gen() will give numbers in the range [0, 65'535] and in this range there are:

  • 9'363 numbers such that gen() % 7 == 0
  • 9'363 numbers such that gen() % 7 == 1
  • 9'362 numbers such that gen() % 7 == 2
  • 9'362 numbers such that gen() % 7 == 3
  • 9'362 numbers such that gen() % 7 == 4
  • 9'362 numbers such that gen() % 7 == 5
  • 9'362 numbers such that gen() % 7 == 6

If you are wondering where I got these numbers, think of it like this: 65'534 is an exact multiple of 7 (65'534 = 7 * 9'362). This means that in [0, 65'533] there are exactly 9'362 numbers that map to each of {0, 1, 2, 3, 4, 5, 6} under gen() % 7. This leaves 65'534, which maps to 0, and 65'535, which maps to 1.

So you see there is a bias towards {0, 1} compared to {2, ..., 6}, i.e.

  • 0 and 1 have a slightly higher chance (9'363 / 65'536 ≈ 14.28680419921875 %) of appearing than
  • 2, 3, 4, 5 and 6 (9'362 / 65'536 ≈ 14.2852783203125 %).
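
These counts are easy to verify with a brute-force loop over the example's 16-bit range (just an illustrative check matching the hypothetical engine above, not the asker's 64-bit engine):

    #include <array>
    #include <iostream>

    int main() {
        // Count how many values in [0, 65535] fall into each residue class mod 7.
        std::array<int, 7> count{};
        for (int x = 0; x <= 65535; ++x)
            ++count[x % 7];
        for (int r = 0; r < 7; ++r)
            std::cout << "x % 7 == " << r << ": " << count[r] << " values\n";
        // Prints 9363 for residues 0 and 1, and 9362 for residues 2 through 6.
        return 0;
    }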

std::uniform_int_distribution doesn't have this problem. It uses some mathematical machinery, possibly drawing additional numbers from the generator, to achieve a truly uniform distribution.
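
For instance, one standard technique (not necessarily the one your standard library uses) is rejection sampling: throw away raw values from the biased tail and draw again. A minimal sketch for the hypothetical 16-bit example above:

    #include <cstdint>
    #include <iostream>
    #include <random>

    // Remove the modulo bias for the [0, 65535] example by rejecting the two
    // leftover values 65534 and 65535. std::uniform_int_distribution may use a
    // different, implementation-defined technique.
    int unbiased_mod7(std::mt19937_64& gen) {
        const std::uint32_t limit = 65534;  // largest multiple of 7 not exceeding 65536
        for (;;) {
            std::uint32_t raw = static_cast<std::uint32_t>(gen() & 0xFFFF);  // take 16 bits
            if (raw < limit)
                return static_cast<int>(raw % 7);
            // raw is 65534 or 65535: accepting it would favour 0 and 1, so retry
        }
    }

    int main() {
        std::mt19937_64 gen(134);
        for (int i = 0; i < 10; ++i)
            std::cout << unbiased_mod7(gen) << ' ';
        std::cout << '\n';
        return 0;
    }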


2 Comments

Your answer is correct but generator::max() is typically one less than a power of 2, not an exact power of 2. Might be a bit confusing to someone trying to understand the workings of the internal source of std::random_distribution.
@doug good point. I will update the answer when I get the time

The random engine std::mt19937_64 outputs a 64-bit number that behaves like a uniformly distributed random number. Each of the C++ random engines (including those of the std::mersenne_twister_engine family) outputs a uniformly-distributed pseudorandom number of a specific size using a specific algorithm.

Specifically, std::mersenne_twister_engine meets the RandomNumberEngine requirement, which in turn meets the UniformRandomBitGenerator requirement; therefore, std::mersenne_twister_engine outputs bits that behave like uniformly-distributed random bits.

On the other hand, std::uniform_int_distribution is useful for transforming numbers from random engines into random integers of a user-defined range (say, from 0 through 10). But note that uniform_int_distribution and other distributions (unlike random number engines) can be implemented differently from one C++ standard library implementation to another.
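
For example, here is a minimal sketch pairing the engine with a distribution (the seed and the 0 through 10 range are just illustrative):

    #include <iostream>
    #include <random>

    int main() {
        std::mt19937_64 generator(134);                  // engine: uniform 64-bit outputs
        std::uniform_int_distribution<int> dist(0, 10);  // distribution: ints in [0, 10]
        for (int i = 0; i < 5; ++i)
            std::cout << dist(generator) << ' ';         // the distribution consumes engine bits
        std::cout << '\n';
        return 0;
    }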



std::mt19937_64 generates a pseudo-random, mutually independent sequence of 64-bit unsigned integers (unsigned long long on typical platforms). It is designed to be uniform; I don't know the exact details of the engine, but it is one of the best engines discovered so far.

By taking % n you get an approximation of a pseudo-random uniform distribution over the integers [0, ..., n-1], but it is inherently inaccurate: certain numbers have a slightly higher chance of occurring and others a slightly lower chance, depending on n. E.g., since 2^64 = 18446744073709551616, with n = 10000 the first 1616 values (0 through 1615) have a slightly higher chance of occurring than the remaining 10000 - 1616 values. std::uniform_int_distribution takes care of this inaccuracy by drawing a new random number in very rare cases: say, for n = 10000, rejecting any raw value at or above 18446744073709550000 and drawing again would work. The concrete details, though, are up to the implementation.
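
As a rough illustration of the numbers involved (an aside, not part of the answer's argument), unsigned 64-bit wraparound lets you compute that leftover directly:

    #include <cstdint>
    #include <iostream>

    int main() {
        const std::uint64_t n = 10000;
        // In unsigned 64-bit arithmetic, 0 - n wraps around to 2^64 - n, and
        // (2^64 - n) % n == 2^64 % n, so this computes 2^64 mod 10000.
        std::uint64_t excess = (0 - n) % n;              // 1616
        std::cout << "2^64 mod " << n << " = " << excess << '\n';
        // Rejecting the top `excess` raw values, i.e. anything at or above
        // 2^64 - excess = 18446744073709550000, leaves an exact multiple of n.
        std::cout << "reject raw values >= " << (0 - excess) << '\n';
        return 0;
    }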



One of the major accomplishments of <random> was the separation of distributions from engines.

I see it as similar to Alexander Stepanov's STL, which separated algorithms from containers through the use of iterators. For random numbers, I can write an implementation of the Blum Blum Shub single-bit generator (an engine) and it will still work with all the distributions in <random>. Or I can write a simple linear congruential generator, x_{n+1} = (a * x_n) mod m, which when correctly seeded can never generate 0. Again, it will work with all the distributions. Likewise, I can write a new distribution without worrying about the peculiarities of any engine, as long as I only use the interface specified by UniformRandomBitGenerator.
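
To make the separation concrete, here is a minimal sketch of a hand-rolled engine that satisfies the UniformRandomBitGenerator requirements being fed to a standard distribution (the constants are the classic MINSTD parameters, chosen only for illustration):

    #include <cstdint>
    #include <iostream>
    #include <random>

    // A toy engine: result_type, min(), max() and operator() are all that a
    // distribution needs. Seeded with a nonzero state, it never produces 0.
    struct minstd_like {
        using result_type = std::uint32_t;
        static constexpr result_type min() { return 1; }
        static constexpr result_type max() { return 2147483646; }  // m - 1
        result_type operator()() {
            state = state * 48271u % 2147483647u;   // x_{n+1} = (a * x_n) mod m
            return static_cast<result_type>(state);
        }
        std::uint64_t state = 1;
    };

    int main() {
        minstd_like gen;
        std::uniform_int_distribution<int> dist(1, 6);  // any <random> distribution works
        for (int i = 0; i < 5; ++i)
            std::cout << dist(gen) << ' ';
        std::cout << '\n';
        return 0;
    }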

In general, you should always use a distribution. Also, it is time to retire the use of % for generating random numbers.

