Implementing a 64 bit PRNG library in C backed by xoroshiro128+

Question

Introduction and Credits

The inbuilt rand() function in C only offers a very limited range of possible outcomes, being limited to only 31 bits. I wanted something that was fast (enough) and could provide 64 random bits at a time. I fond an impressive, openly available PRNG for this purpose: xoroshiro128+. It seems to pass most well-known tests for randomness. I decided to modify it slightly and make a library.

Credit: David Blackman and Sebastiano Vigna for coming up with this implementation in 2016.

Their website: Link.

Here are my modifications to the code the original authors wrote.

`xoroshiro128plus.c`

// Original and permanent link: http://xoroshiro.di.unimi.it/xoroshiro128plus.c /* Written in 2016 by David Blackman and Sebastiano Vigna ([email protected]) To the extent possible under law, the author has dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty. See <http://creativecommons.org/publicdomain/zero/1.0/>. */ #include <stdint.h> #include <sys/time.h> /* This is the successor to xorshift128+. It is the fastest full-period generator passing BigCrush without systematic failures, but due to the relatively short period it is acceptable only for applications with a mild amount of parallelism; otherwise, use a xorshift1024* generator. Beside passing BigCrush, this generator passes the PractRand test suite up to (and included) 16TB, with the exception of binary rank tests, as the lowest bit of this generator is an LFSR of degree 128. The next bit can be described by an LFSR of degree 8256, but in the long run it will fail linearity tests, too. The other bits needs a much higher degree to be represented as LFSRs. We suggest to use a sign test to extract a random Boolean value, and right shifts to extract subsets of bits. Note that the generator uses a simulated rotate operation, which most C compilers will turn into a single instruction. In Java, you can use Long.rotateLeft(). In languages that do not make low-level rotation instructions accessible xorshift128+ could be faster. The state must be seeded so that it is not everywhere zero. If you have a 64-bit seed, we suggest to seed a splitmix64 generator and use its output to fill s. */ uint64_t s[2]; //////////////////////////////////////////////////////////////////////// /* Modifications by Subhomoy Haldar (github.com/Subh0m0y) start here. */ // As per the recommendation by the original authors - David Blackman and // Sebastiano Vigna, we shall use two iterations of a seeded splitmix64 // generator (written by Sebastiano Vigna) and use its output to seed this // program's seed vector. // Original and permanent link: http://xoroshiro.di.unimi.it/splitmix64.c static uint64_t splitmix64next(const uint64_t x) { uint64_t z = (x + 0x9e3779b97f4a7c15); z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9; z = (z ^ (z >> 27)) * 0x94d049bb133111eb; return z ^ (z >> 31); } const uint64_t SEED_SCRAMBLER = 0x37bc7dd1f3339a5fULL; static uint64_t time_based_x(void) { // Obtain the (relative, partial) time information // in microseconds struct timeval currentTime; gettimeofday(&currentTime, NULL); uint64_t value = currentTime.tv_usec; // Combine and generate the seed. uint64_t x = (value << 32) | value; x ^= SEED_SCRAMBLER; return x; } void _seed_with(const uint64_t x) { s[0] = splitmix64next(x); s[1] = splitmix64next(s[0]); } void _seed_auto(void) { _seed_with(time_based_x()); } /* Modifications by Subhomoy Haldar end here. */ //////////////////////////////////////////////// static inline uint64_t rotl(const uint64_t x, int k) { return (x << k) | (x >> (64 - k)); } uint64_t next(void) { const uint64_t s0 = s[0]; uint64_t s1 = s[1]; const uint64_t result = s0 + s1; s1 ^= s0; s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b s[1] = rotl(s1, 36); // c return result; } /* This is the jump function for the generator. It is equivalent to 2^64 calls to next(); it can be used to generate 2^64 non-overlapping subsequences for parallel computations. */ void jump(void) { static const uint64_t JUMP[] = { 0xbeac0467eba5facb, 0xd86b048b86aa9922 }; uint64_t s0 = 0; uint64_t s1 = 0; for(int i = 0; i < sizeof JUMP / sizeof *JUMP; i++) for(int b = 0; b < 64; b++) { if (JUMP[i] & UINT64_C(1) << b) { s0 ^= s[0]; s1 ^= s[1]; } next(); } s[0] = s0; s[1] = s1; }

`xoroshiro128plus.h`

#ifndef XORO_SHIRO_128_PLUS #define XORO_SHIRO_128_PLUS #include <inttypes.h> /* Only expose the method that is relevant */ /* * Automatically initializes the seed vector for the xoroshiro128+ * PRNG, using a part of the current time (in microseconds) and * a seed scrambler. */ void _seed_auto(void); /* * Initializes the seed vector with a starting value. This is useful * for debugging when reproducible scenarios are desirable. */ void _seed_with(const uint64_t x); /* * Returns 64 randomly generated bits. */ uint64_t next(void); #endif

`randomlib.c`

#include <stdio.h> #include <stdlib.h> #include <stdbool.h> #include <stdint.h> #include <time.h> #include <math.h> #include "xoroshiro128plus.h" #define UINT32_MASK 0xFFFFFFFFULL // The seeding functions. /* * Initializes the PRNG. Uses the current time to seed * it. It's expected resolution is in microseconds. */ void init_rand(void) { _seed_auto(); } /* * Initializes the PRNG with a given seed. */ void seed(uint64_t x) { _seed_with(x); } /* * Returns a random integer in the range 0 (inclusive) * and limit (exclusive). The integers generated are uniformly * distributed. */ uint64_t next_uint64(const uint64_t limit) { return next() % limit; } /* * Returns a 32-bit unsigned integer. */ uint32_t next_uint32() { // Generate 2 ints at a time out of one call to next() // This makes use of both halves of the 64 bits generated static uint32_t next_uint32_temp; static bool has_next_uint32 = false; uint32_t val; if (has_next_uint32) { val = next_uint32_temp; } else { uint64_t full = next(); val = full >> 32; // The upper half next_uint32_temp = (uint32_t) (full & UINT32_MASK); // The lower half } // quick flip has_next_uint32 ^= true; return val; } /* * Returns a boolean value. The expected probability of * both values is 50%. */ bool next_bool() { // Sign test as per the recommendation // We check if the highest bit is on return (next() >> 63) & 1; } /* * Returns a uniformly distributed double between 0 * (inclusive) and 1 (exclusive). */ double next_double() { // return ((double) next()) / ((double) UINT64_MAX); // return (next() >> 11) * (1. / (UINT64_C(1) << 53)); return (next() >> 11) * 0x1.0p-53; } /* * Returns a normally distributed double between -1.0 * and +1.0 */ double next_gaussian() { static double next_gaussian; static bool has_next_gaussian = false; double val; if (has_next_gaussian) { val = next_gaussian; } else { double u, v, s; do { // Limit u and v to the range [-1, 1] u = next_double() * 2 - 1; v = next_double() * 2 - 1; s = u * u + v * v; } while(s > 1); double scale = sqrt(-2.0 * log(s) / s); next_gaussian = v * scale; val = u * scale; } // Quick flip has_next_gaussian ^= true; return val; }

My questions

I tested the output of all the functions I implemented by calling them several times and dumping the output to a CSV file. I then used Excel to plot Histograms and the plots seem reasonable. i.e. the functions are uniform and the Gaussian function is normal, as expected. You may perform yo0ur own tests or suggest a better, more efficient or automated approach.

The implementations for generating doubles have a characteristic dip in the frequency of the very last bucket (to about 50% of the rest), for all the methods. I do not think it would matter much in practice though. This is also something you may wish to comment upon.

The backing xoroshiro algorithm is pretty fast. I hope my implementation does not do injustice to it and introduce unnecessary bottle-necks.

As usual, you are free to comment and critique on every part of (my) code. You may suggest improvements to the code written by the original authors since it is in the Public Domain.

Did you build a wrapper or did you modify the original code? The whole point of a wrapper is to leave the original intact, or at least be independent of it's implementation. — Mast
– Mast ♦, Commented Jan 6, 2018 at 9:10
@Mast That's a valid point. I slightly modified the code as per the recommendation of the authors. — Hungry Blue Dev
– Hungry Blue Dev, Commented Jan 6, 2018 at 9:13
The magic numbers come from the original source: xoroshiro.di.unimi.it/splitmix64.c — Hungry Blue Dev
– Hungry Blue Dev, Commented Jan 6, 2018 at 9:15
I figured as much. But why are those numbers exactly that value? Are they dependent on each other? — Mast
– Mast ♦, Commented Jan 6, 2018 at 9:34

chux · Accepted Answer · 2018-01-06 18:56:36Z

Overall: good design, presentation and nicely formatted.

xoroshiro128plus.c

uint64_t s[2]; looks very bad to have at global scope. Certainly should have limited file scope with static uint64_t s[2]; Same for uint64_t SEED_SCRAMBLER
The state space is 128 bit, yet users with _seed_with(const uint64_t x) are limited to 2⁶⁴ combinations. I'd expect a _seed_with() that allows 2¹²⁸ or an auxiliary _seed_full_with(). 64-bit may seem sufficient today. Similar thinking years ago thought srand(unsigned int seed) was good enough too.
Code uses gettimeofday() which is not part of the Standard C library.
| favors 1 bits. Use ^. There is no difference with uint64_t value as a count of microseconds, yet it would be a weakness should value become derived otherwise.
```
// uint64_t x = (value << 32) | value; uint64_t x = (value << 32) ^ value; 
```
The net effect of static uint64_t time_based_x(void) is 1 of 1,000,000 different results. (Seems weak and is reminiscent of an online poker hack that use the second of the day as a game seed: Only 24*60*60 different decks). Recommended to use more of "now" to achieve more than a paltry 1,000,000 different results. Example:
```
uint64_t x = currentTime.tv_sec; x *= 1000000; // us per s x += currentTime.tv_usec; x ^= (x << 32) ^ SEED_SCRAMBLER; return x; 
```

xoroshiro128plus.h

const in void _seed_with(const uint64_t x); serves no purpose for the users. Recommend simplification void _seed_with(uint64_t x); for their sake
Function name next() is far too common and uninformative. Collisions are certain.
Although ponderous, consider function names like xoroshiro128plus_seed_auto(), xoroshiro128plus_next().

randomlib.h

The file is missing. I'd expect this code to include the companion to randomlib.c.
In the .h file is a great place to put the function description comments. There are not so much needed in the .c file

randomlib.c

Like #2 above, a function to seed more than 64-bits would be useful.
Bug. Re: "The integers generated are uniformly * distributed." with next_uint64(const uint64_t limit). Function only generates uniformly when limit is a power of 2. Function with that limitation is not that useful.
Inconsistent style: void init_rand(void) vs uint32_t next_uint32(). Recommend (void) for both.

Mask or cast is superfluous. Simply use one.

// next_uint32_temp = (uint32_t) (full & UINT32_MASK); next_uint32_temp = (uint32_t) full;

has_next_uint32 ^= true; is more logical as has_next_uint32 = !has_next_uint32;. Either way, has_next_uint32 = true; and has_next_uint32 = false in the if() ... else code would convey more meaning.
Bug. seed() does not reset has_next_uint32 in next_uint32(). So seed() does not "seed" the complete PRNG state. Same for has_next_gaussian in next_gaussian(). Consider making has_next_uint32, etc. static to the file, so seed() may reset them.
next_double() relies on DBL_EPSILON/2 == 0x1.0p-53. Either make code portable or add an assert/_Static_assertion to confirm assumptions.
[] usually means to include end points. So comment // Limit u and v to the range [-1, 1] would be better as // Limit u and v to the range (-1, 1).
Bug. No protection against s==0 in sqrt(-2.0 * log(s) / s).
Unclear comment in next_bool() with "Sign test as per the recommendation" as there is no sign in the code. return next() & 1; should be sufficient. Any bit should be as random as the others. Go for simple code.
LL unnecessary in UINT32_MASK 0xFFFFFFFFULL. The LL could only make the constant wider than it needs to be. I find using digits in one case and U in another adds clarity to the constant UINT32_MASK 0xFFFFFFFFu. Given #34, it is not needed anyways.
The function names lack distinctiveness and meaning (e.g. How is next_uint32 something to do with "random"?). Recommend similar change here as in #13. Example randomlib_gaussian().

Candidate alternative next_double(). The commented out ((double) next()) / ((double) UINT64_MAX); is on the right track, yet it should divide by UINT64_MAX + 1. Code should also be careful how to convert _MAX to floating point. With the below code, small values retain more precision than OP's code.

double next_double(void) { #define DBL_UINT64_MAX_P1 ((UINT64_MAX/2 + 1)*2.0) int save_round = fegetround(); fesetround(FE_DOWNWARD); double d = next() / DBL_UINT64_MAX_P1; fesetround(save_round); return d; }

[Edit] On review, I see the above needs the complication of adjusting the rounding mode - sigh - not as clean an alternative as hoped.

See this good answer for #32 alternative: generate a uniform mapping It is that answer's step 3 is so important. Also this — chux
– chux, Commented Jan 6, 2018 at 19:02
A very thorough answer as usual. Well done. I'd like to ask for a suggestion for an alternative for gettimeofday() which you mentioned is not part of the Standard C Library in #3. — Hungry Blue Dev
– Hungry Blue Dev, Commented Jan 6, 2018 at 21:27
@Astrobleme I see no great alternative for #3. One idea is to use the result of time() and clock() like uint64_t c = clock(), t = time(); return (c << 32) ^ t ^ SEED_SCRAMBLER; This would need more study. In the end, this part of code should likely go of to platform dependent code. — chux
– chux, Commented Jan 6, 2018 at 21:41
I have a question about clock() and the corresponding CLOCKS_PER_SEC macro. How does this stay constant for most systems which implement turbo boost and can reach high(er) clock speeds (say 3.0 to 3.7 GHz)? Same when the system enters power saving mode and the clock speed is throttled to about half or even less. Should I avoid using CLOCKS_PER_SEC? Or is there a proper way? Or did I interpret it wrong? — Hungry Blue Dev
– Hungry Blue Dev, Commented Jan 7, 2018 at 7:43
@Astrobleme "How does this stay constant for most system" - it does not. Original gettimeofday() does not have consistency as the accuracy of its currentTime.tv_usec varies from system to system, (it does uniformly report us though - but that may move in 1000 us steps for example) , so using clock() would be a portable substitute. The goals of time_based_x() is to offer some "true" random seed for _seed_auto() and using "time" can only provide so many "random" bits. If code uses non-Standard C code, research /dev/random. — chux
– chux, Commented Jan 7, 2018 at 12:34

Albert Chan · Accepted Answer · 2018-05-20 17:13:03Z

your _seed_with() does not fill state using splitmix as PRNG
Instead, it fill state using nested hash functions

s[1] = splitmix64next(s[0] = splitmix64next(x));

The equidistribution and nice 2^64 period is now lost

Using nested filling method for bigger state, this might happen (period = 2):

s[] = {y, x, y, x, y, x, ...}; // if splitmix(splitmix(x)) = x

Does s[] look like initialized randomly ?

Using splitmix PRNG to fill state, all values are guaranteed different.

Also, SEED_SCRAMBLER is unnecessary, that should be the job of splitmix

Stack Exchange Network

Implementing a 64 bit PRNG library in C backed by xoroshiro128+

Introduction and Credits

`xoroshiro128plus.c`

`xoroshiro128plus.h`

`randomlib.c`

My questions

2 Answers 2

You must log in to answer this question.

Hot Network Questions

Implementing a 64 bit PRNG library in C backed by xoroshiro128+

Introduction and Credits

xoroshiro128plus.c

xoroshiro128plus.h

randomlib.c

My questions

2 Answers 2

You must log in to answer this question.

Related

Hot Network Questions

`xoroshiro128plus.c`

`xoroshiro128plus.h`

`randomlib.c`