Introduction and Credits
The inbuilt rand() function in C only offers a very limited range of possible outcomes, being limited to only 31 bits. I wanted something that was fast (enough) and could provide 64 random bits at a time. I fond an impressive, openly available PRNG for this purpose: xoroshiro128+. It seems to pass most well-known tests for randomness. I decided to modify it slightly and make a library.
Credit: David Blackman and Sebastiano Vigna for coming up with this implementation in 2016.
Their website: Link.
Here are my modifications to the code the original authors wrote.
xoroshiro128plus.c
// Original and permanent link: http://xoroshiro.di.unimi.it/xoroshiro128plus.c /* Written in 2016 by David Blackman and Sebastiano Vigna ([email protected]) To the extent possible under law, the author has dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty. See <http://creativecommons.org/publicdomain/zero/1.0/>. */ #include <stdint.h> #include <sys/time.h> /* This is the successor to xorshift128+. It is the fastest full-period generator passing BigCrush without systematic failures, but due to the relatively short period it is acceptable only for applications with a mild amount of parallelism; otherwise, use a xorshift1024* generator. Beside passing BigCrush, this generator passes the PractRand test suite up to (and included) 16TB, with the exception of binary rank tests, as the lowest bit of this generator is an LFSR of degree 128. The next bit can be described by an LFSR of degree 8256, but in the long run it will fail linearity tests, too. The other bits needs a much higher degree to be represented as LFSRs. We suggest to use a sign test to extract a random Boolean value, and right shifts to extract subsets of bits. Note that the generator uses a simulated rotate operation, which most C compilers will turn into a single instruction. In Java, you can use Long.rotateLeft(). In languages that do not make low-level rotation instructions accessible xorshift128+ could be faster. The state must be seeded so that it is not everywhere zero. If you have a 64-bit seed, we suggest to seed a splitmix64 generator and use its output to fill s. */ uint64_t s[2]; //////////////////////////////////////////////////////////////////////// /* Modifications by Subhomoy Haldar (github.com/Subh0m0y) start here. */ // As per the recommendation by the original authors - David Blackman and // Sebastiano Vigna, we shall use two iterations of a seeded splitmix64 // generator (written by Sebastiano Vigna) and use its output to seed this // program's seed vector. // Original and permanent link: http://xoroshiro.di.unimi.it/splitmix64.c static uint64_t splitmix64next(const uint64_t x) { uint64_t z = (x + 0x9e3779b97f4a7c15); z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9; z = (z ^ (z >> 27)) * 0x94d049bb133111eb; return z ^ (z >> 31); } const uint64_t SEED_SCRAMBLER = 0x37bc7dd1f3339a5fULL; static uint64_t time_based_x(void) { // Obtain the (relative, partial) time information // in microseconds struct timeval currentTime; gettimeofday(¤tTime, NULL); uint64_t value = currentTime.tv_usec; // Combine and generate the seed. uint64_t x = (value << 32) | value; x ^= SEED_SCRAMBLER; return x; } void _seed_with(const uint64_t x) { s[0] = splitmix64next(x); s[1] = splitmix64next(s[0]); } void _seed_auto(void) { _seed_with(time_based_x()); } /* Modifications by Subhomoy Haldar end here. */ //////////////////////////////////////////////// static inline uint64_t rotl(const uint64_t x, int k) { return (x << k) | (x >> (64 - k)); } uint64_t next(void) { const uint64_t s0 = s[0]; uint64_t s1 = s[1]; const uint64_t result = s0 + s1; s1 ^= s0; s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b s[1] = rotl(s1, 36); // c return result; } /* This is the jump function for the generator. It is equivalent to 2^64 calls to next(); it can be used to generate 2^64 non-overlapping subsequences for parallel computations. */ void jump(void) { static const uint64_t JUMP[] = { 0xbeac0467eba5facb, 0xd86b048b86aa9922 }; uint64_t s0 = 0; uint64_t s1 = 0; for(int i = 0; i < sizeof JUMP / sizeof *JUMP; i++) for(int b = 0; b < 64; b++) { if (JUMP[i] & UINT64_C(1) << b) { s0 ^= s[0]; s1 ^= s[1]; } next(); } s[0] = s0; s[1] = s1; } xoroshiro128plus.h
#ifndef XORO_SHIRO_128_PLUS #define XORO_SHIRO_128_PLUS #include <inttypes.h> /* Only expose the method that is relevant */ /* * Automatically initializes the seed vector for the xoroshiro128+ * PRNG, using a part of the current time (in microseconds) and * a seed scrambler. */ void _seed_auto(void); /* * Initializes the seed vector with a starting value. This is useful * for debugging when reproducible scenarios are desirable. */ void _seed_with(const uint64_t x); /* * Returns 64 randomly generated bits. */ uint64_t next(void); #endif randomlib.c
#include <stdio.h> #include <stdlib.h> #include <stdbool.h> #include <stdint.h> #include <time.h> #include <math.h> #include "xoroshiro128plus.h" #define UINT32_MASK 0xFFFFFFFFULL // The seeding functions. /* * Initializes the PRNG. Uses the current time to seed * it. It's expected resolution is in microseconds. */ void init_rand(void) { _seed_auto(); } /* * Initializes the PRNG with a given seed. */ void seed(uint64_t x) { _seed_with(x); } /* * Returns a random integer in the range 0 (inclusive) * and limit (exclusive). The integers generated are uniformly * distributed. */ uint64_t next_uint64(const uint64_t limit) { return next() % limit; } /* * Returns a 32-bit unsigned integer. */ uint32_t next_uint32() { // Generate 2 ints at a time out of one call to next() // This makes use of both halves of the 64 bits generated static uint32_t next_uint32_temp; static bool has_next_uint32 = false; uint32_t val; if (has_next_uint32) { val = next_uint32_temp; } else { uint64_t full = next(); val = full >> 32; // The upper half next_uint32_temp = (uint32_t) (full & UINT32_MASK); // The lower half } // quick flip has_next_uint32 ^= true; return val; } /* * Returns a boolean value. The expected probability of * both values is 50%. */ bool next_bool() { // Sign test as per the recommendation // We check if the highest bit is on return (next() >> 63) & 1; } /* * Returns a uniformly distributed double between 0 * (inclusive) and 1 (exclusive). */ double next_double() { // return ((double) next()) / ((double) UINT64_MAX); // return (next() >> 11) * (1. / (UINT64_C(1) << 53)); return (next() >> 11) * 0x1.0p-53; } /* * Returns a normally distributed double between -1.0 * and +1.0 */ double next_gaussian() { static double next_gaussian; static bool has_next_gaussian = false; double val; if (has_next_gaussian) { val = next_gaussian; } else { double u, v, s; do { // Limit u and v to the range [-1, 1] u = next_double() * 2 - 1; v = next_double() * 2 - 1; s = u * u + v * v; } while(s > 1); double scale = sqrt(-2.0 * log(s) / s); next_gaussian = v * scale; val = u * scale; } // Quick flip has_next_gaussian ^= true; return val; } My questions
I tested the output of all the functions I implemented by calling them several times and dumping the output to a CSV file. I then used Excel to plot Histograms and the plots seem reasonable. i.e. the functions are uniform and the Gaussian function is normal, as expected. You may perform yo0ur own tests or suggest a better, more efficient or automated approach.
The implementations for generating doubles have a characteristic dip in the frequency of the very last bucket (to about 50% of the rest), for all the methods. I do not think it would matter much in practice though. This is also something you may wish to comment upon.
The backing xoroshiro algorithm is pretty fast. I hope my implementation does not do injustice to it and introduce unnecessary bottle-necks.
As usual, you are free to comment and critique on every part of (my) code. You may suggest improvements to the code written by the original authors since it is in the Public Domain.