14

My code reliably finds a precise value (a similar value every time), but it lands in the region of 3.24 to 3.26. Competent mathematicians will note this is around 0.1 away from the ideal value of Pi.

My code revolves around picking random points with integer coordinates between -10,000 and 10,000. The hypotenuse with respect to (0,0) is calculated, and if this hypotenuse is less than 10,000, the point is inside a theoretical circle with radius 10,000.

The ratio of points inside the circle to all points, multiplied by 4, should give an estimate of Pi.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int RandomNum(int min, int max);

int main()
{
    int d = 0;
    int n = 0;
    srand(time(NULL));
    for (int i = 0; i < 10000; i++) {
        n++;
        int xI = RandomNum(-10000, 10000);
        //double xF = (double) xI/10000;
        printf("\nx: %d", xI);
        int yI = RandomNum(-10000, 10000);
        //double yF = (double) yI/10000;
        printf("\ny: %d", yI);
        double pythag = sqrt(xI*xI + yI*yI);
        printf("\nhyp: %f\n\n", pythag);
        if (pythag < 10000) {
            d++;
        }
    }
    printf("d = %d\n", d);
    printf("n = %d\n", n);
    double DNratio = (double) d/n;
    double PiEstimate = DNratio * 4;
    printf("Pi Estimate: %f", PiEstimate);
    return 0;
}

int RandomNum(int min, int max)
{
    int r = rand() % (max - min + 1) + min;
    return r;
}

//https://www.geeksforgeeks.org/c/generating-random-number-range-c/
//https://www.tutorialspoint.com/c_standard_library/c_function_srand.htm
  • works in the compiler explorer: godbolt.org/z/j6jvxhE6T (I commented out the spammy printfs) Commented Oct 16 at 7:37
  • Might be a bias in the PRNG used by rand(), so then it depends on which C library is being used. On the site mentioned by @mch, I get bad results with MSVC. Commented Oct 16 at 7:44
  • @PaulPalmpje, (double)d / n is a floating-point division, not integer. Commented Oct 16 at 7:56
  • @TobySpeight: Re “In any case rand() % N has a slight bias”: rand() % N has a huge bias. For OP’s case with RAND_MAX = 32,767 and N = 20,001, each value in [0, 12,766] is produced twice as often as each value in [12,767, 20,000]. Commented Oct 16 at 9:15
  • sqrt is a potential source of error, and is not needed. You want to check x^2 + y^2 < r^2 instead. Commented Oct 16 at 11:02

4 Answers

25

The behavior you're seeing is most likely the result of a modulo bias, due to the small RAND_MAX value that is being used by the specific C library/compiler's implementation of the rand() function.

I get similar results to yours when testing with MSVC (x86 msvc v19.latest), where the RAND_MAX value is just 32,767. Using a modulo 10,000 operation (% 10000) on the output of rand() will generate "random" numbers with a very significant bias towards the lower numbers.(1) This causes more points to fall within your circle than you would normally expect, and results in an inflated estimated value of π.(2)

To demonstrate this: if we pick the random values between 0 and 32,767 and compare the hypotenuse with that value, we eliminate the modulo bias altogether, and the results much more closely approach the value of π as can be seen here:(3)

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main()
{
    int d = 0;
    int n = 0;
    srand(time(NULL));
    for (int i = 0; i < 10000; i++) {
        n++;
        int xI = rand();
        int yI = rand();
        /* square in double so this also works where RAND_MAX is large */
        double pythag = sqrt((double)xI * xI + (double)yI * yI);
        if (pythag <= RAND_MAX) {
            d++;
        }
    }
    printf("d = %d\n", d);
    printf("n = %d\n", n);
    double DNratio = ((double) d)/n;
    double PiEstimate = DNratio * 4;
    printf("Pi Estimate: %f", PiEstimate); // yields ≈3.14
    return 0;
}

(1) Your implementation actually seems to be doing modulo 20,001, resulting in an even bigger bias. For a RAND_MAX value of 32,767, the rand() % 20001 operation produces numbers in the interval [0 .. 12,766] twice as often as numbers in the interval [12,767 .. 20,000].
(2) When using a compiler with a larger RAND_MAX value (like x86_64 gcc (trunk) where it's 2,147,483,647), the bias is much less noticeable, but still there.
(3) This of course doesn't negate the effects of the relatively small random sample size, and the inherent bias of the PRNG itself.
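To make footnote (1) concrete, here is a minimal counting sketch of my own (assuming RAND_MAX is 32,767, as with MSVC) that tallies how many of the 32,768 possible rand() outputs map onto each part of the % 20001 output range:

#include <stdio.h>

int main(void)
{
    const int rand_max = 32767; /* assumed RAND_MAX (MSVC) */
    const int n = 20001;        /* the modulus in rand() % 20001 */
    int low = 0, high = 0;
    for (int v = 0; v <= rand_max; v++) {
        if (v % n <= 12766)
            low++;  /* residues reachable from two inputs */
        else
            high++; /* residues reachable from one input */
    }
    printf("hits on [0, 12766]:     %d (two per value)\n", low);
    printf("hits on [12767, 20000]: %d (one per value)\n", high);
    return 0;
}

It prints 25,534 hits spread over the 12,767 low values against 7,234 hits over the 7,234 high ones: exactly the two-to-one ratio described above.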


7 Comments

Given that RAND_MAX is allowed to be as low as 32767, picking from range [-32767 .. 32768] using % could turn out equivalent to [-32767 .. 0]. Which works, but not the way you expect! A perverse implementation (with RAND_MAX = 49151, for example) could really confound your results. (And this all assumes that rand() is uniformly distributed, which the standard does not require).
BTW, I upvoted for the clear explanation in paragraphs 1 & 2.
Agreed. I was not suggesting this as a solution, just as a quick way to demonstrate the bias without changing too much of the original code.
Good stuff. Before I get even semi-serious with a rand() application/project, I test my compiler's rand() for even distribution with a frequency diagram of some sort. On that note, I think it's possible to improve your lot by creating a do-it-yourself 64-bit rand() by calling rand() 4X and shifting your 16-bit result into each quarter of your 64-bit int (a sketch of this follows these comments).
Yuck, libc rand() can be very slow; for example GNU libc uses a global state across all threads, so each call has to take/release a lock, making it at least 10x slower than it would be otherwise even in a single-threaded program, and destroying memory-level parallelism at least on x86 where atomic RMWs are a full barrier. Just use a known-good PRNG like xoshiro256** (en.wikipedia.org/wiki/Xorshift#xoshiro256**) that produces 64-bit output.
(I just tested again on my Arch Linux desktop. rand() does check if the whole program is single-threaded, in which case it avoids locking. So it's only 20% slower on Skylake than the deprecated rand_r with a pointer to an unsigned int seed; not much of a slowdown, but rand_r is bad quality anyway. I tested with a caller that just stores the rand() results to a volatile int, for 100M iterations compiled with gcc -O2. It takes 0.5s for 100M calls to rand() at 4.0GHz on Skylake. In a program that starts a do-nothing thread first, it takes 1.5 seconds. So 10x was an over-estimate.)
You can optimize performance by calculating x * x only once per x loop.
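As a sketch of that do-it-yourself idea from the comment above (mine, not the commenter's code): the C standard only guarantees RAND_MAX >= 32767, i.e. 15 random bits per call, so five draws rather than four are concatenated here, with the excess high bits shifted out of the 64-bit result:

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical rand64 built from repeated rand() calls. It inherits
 * whatever bias rand() has; as the comments note, a dedicated PRNG
 * such as xoshiro256** is the better option. */
static uint64_t rand64(void)
{
    uint64_t r = 0;
    for (int i = 0; i < 5; i++)             /* 5 * 15 = 75 bits drawn */
        r = (r << 15) | (rand() & 0x7FFF);  /* take 15 bits per call */
    return r;                               /* top 11 bits shifted out */
}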
12

We can remove the biases introduced by the random function by simply iterating from -R to R in both x and y directions.

I've also changed sqrt(x * x + y * y) < R to x * x + y * y < R * R, which allows us to avoid floating-point operations and a call to sqrt. To be even more accurate, we can use (R+1) * R to expand the circle by about half a unit and get better results on edge cases.

#include <stdio.h>

#define R 10000

int main()
{
    int d = 0;
    int n = (2*R+1) * (2*R+1);
    for (int x = -R; x <= R; x++)
        for (int y = -R; y <= R; y++)
            if (x * x + y * y < (R+1) * R)
                d++;
    printf("d = %d\n", d);
    printf("n = %d\n", n);
    double DNratio = (double) d/n;
    double PiEstimate = DNratio * 4;
    printf("Pi Estimate: %f", PiEstimate);
    return 0;
}

This example gives:

d = 314190797
n = 400040001
Pi Estimate: 3.141594

The intuition behind growing the circle by half a unit is:

Each point really represents the 1x1 area centred on that point, so by only comparing against the centres of these 1x1 squares, we are counting a shape about half a unit smaller than the actual area of the circle. To compensate, we grow the circle by half a unit to match the boundaries of our squares. (And R(R+1) roughly equals (R+0.5)^2 if that wasn't obvious!)
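A quick expansion backs this up: (R + 0.5)^2 = R^2 + R + 0.25, while R(R + 1) = R^2 + R, so the two thresholds differ only by the constant 0.25, which is negligible at R = 10000.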

8 Comments

The negative coordinates just duplicate the positive ones, and don't add any precision. Better to just measure the area of a single quadrant.
"expand the circle by about half a unit" is a good idea, yet deserves more explanation why. ( I see it as removing a bias of an integer random value versus a real one.)
If you are going to sample deterministically instead of randomly, you might as well just trace the circle with the Minsky circle-drawing algorithm and add up the strip areas: double t = 0; for (double x = 0, y = 1; 0 < y; x += y/R, y -= x/R) t += x*x; printf("%.9g\n", 4*t/R);.
A nice O(R*R) to O(R) improvement. (It may be useful to re-scale and avoid the /R in the loop.)
This (great) discussion is begging for proofs. We need 1.) a solidly biased rand(), 2.) a suite of competing solutions that allege to fix it, and 3.) an agreed method illustrating even distribution. Not sure where the approp. platform is.
It's not rand() itself that's biased, it's the method of using rand() to compute a uniformly distributed integer in the [min,max] interval. I.e. the OP's RandomNum function. At least it's being assumed that rand() is unbiased. See stackoverflow.com/questions/11758809/… If you believe rand() itself is biased, there are readily available solutions. The xoshiro family of algorithms seems popular: prng.di.unimi.it
After some hours, it turns out that correcting a biased standard lib rand() is not a trivial undertaking, and one of my more misguided notions, lol. If you're facing a biased rand() problem in your compiler/platform, it's much more advisable to simply find and use a competent rand() implementation on the web that has been coded by the geniuses that have gone before us. The Internet is a smorgasbord in this regard, luckily.
4

The Monte-Carlo method you implemented is neither precise nor accurate for multiple reasons:

  • the pseudo-random number generator from the C library (the rand() function) is not perfect, and in many implementations has biases that will result in drift in such computations;
  • seeding it with time(NULL) will make your program use the same initial state during the same second, which might explain why you get the same result for different runs, giving you the false impression of precision. For the same seed, the rand() function will produce the same sequence, so the computation will give the exact same result, precisely;
  • the rand() function returns integers in the range 0 to RAND_MAX, which may be as small as 32767. This does not allow for an accurate result, but it does not fully explain the drift you observe;
  • computing the coordinates with an integer modulo operation introduces another bias as values get truncated. Computing xI and yI as double values with (double)rand() / RAND_MAX * (max - min) + min should lessen this bias (a sketch follows this list);
  • computing the square root is not necessary and might introduce further inaccuracies. You should just compare the square of the hypotenuse: xI * xI + yI * yI <= 10000 * 10000;
  • 10,000 samples are not enough to avoid local biases of the rand() function. Try a larger number.
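Here is a minimal sketch of the double-based coordinate idea from the list above; RandomNumF is a hypothetical helper of mine, not part of the modified program below:

#include <stdlib.h>

/* Hypothetical floating-point replacement for the OP's RandomNum:
 * maps rand() onto [min, max] without the truncating modulo step. */
double RandomNumF(double min, double max)
{
    return min + (double)rand() / RAND_MAX * (max - min);
}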

Here is a modified version:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int num = 10000;
    int n2 = 10000;
    if (argc > 1) num = strtoul(argv[1], NULL, 0);
    if (argc > 2) n2 = strtoul(argv[2], NULL, 0);
    srand(time(NULL));
    int d2 = 0;
    unsigned long long hyp_1 = (unsigned long long)RAND_MAX * RAND_MAX;
    for (int j = 0; j < n2; j++) {
        int d = 0;
        for (int i = 0; i < num; i++) {
            unsigned long long x = rand();
            unsigned long long y = rand();
            unsigned long long hyp = x * x + y * y;
            if (hyp <= hyp_1) {
                d++;
            }
        }
        double DNratio = (double)d / num;
        double PiEstimate = DNratio * 4;
        printf("Pi Estimate: %f (4 * %d / %d)\n", PiEstimate, d, num);
        d2 += d;
    }
    if (n2 > 1) {
        double DNratio = (double)d2 / (n2 * num);
        double PiEstimate = DNratio * 4;
        printf("Pi Estimate: %f (4 * %d / %d)\n", PiEstimate, d2, num * n2);
    }
    return 0;
}

1 Comment

This does indeed still work with a small RAND_MAX like 32767. I hacked it up to redefine RAND_MAX and do RAND_MAX & rand() to work with 16-bit random numbers, and it still produces answers near 3.14. godbolt.org/z/Yz7E7cxaf
0

Another answer here by chqrlie is probably optimal for the specific problem at hand. I believe it's also educational to tackle the core issue generally.

Specifically, there's a bias in the RandomNum(int min, int max) function. chqrlie completely side-steps this bias by using the full 0 to RAND_MAX range instead of ±10,000, so the RandomNum function is not needed at all.

But alternatively, an answer to another question provides a simple unbiased algorithm for the offending function:

int RandomNum(int min, int max)
{
    int n = max - min + 1;
    int remainder = RAND_MAX % n;
    int x;

    do {
        x = rand();                        /* reroll while in the biased tail */
    } while (x >= RAND_MAX - remainder);

    return min + x % n;
}

The principle is simply to reroll if the generated random number is in the "superfluous" range. If you have a PRNG which outputs a full 32-bit unsigned range, there is also a faster algorithm here: https://github.com/lemire/fastrange
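For reference, the heart of that fastrange reduction is a single multiply and shift. The sketch below is mine, assuming a PRNG that produces a full 32-bit word; on its own this mapping is about as biased as v % n, and Lemire's accompanying write-ups describe the extra rejection step that makes it exactly uniform:

#include <stdint.h>

/* Maps a full 32-bit random word v into [0, n) with a multiply and a
 * shift instead of a division; the high 32 bits of the 64-bit product
 * are the reduced value. */
static inline uint32_t fastrange32(uint32_t v, uint32_t n)
{
    return (uint32_t)(((uint64_t)v * n) >> 32);
}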

Comments
