
I am attempting to multithread some code with OpenMP. Currently my sequential version uses rand() to generate a set of random numbers with a consistent seed, so that it returns the same results each time it is run. I want to parallelise my code, but rand() is not thread safe. Can someone please show me how I would go about using a random number generator that works across threads, so I can produce the same data set on each test, similar to seeding rand()? The code I'm parallelising is as follows:

long linkCnt;
long j;
long t;

srand(randInit);

linkCnt = 0; //Keep track of # of outgoing edges

#pragma omp parallel for schedule(runtime) private(t)
for (j = 0; j < G->N; j++) {
    if (i == j) {
        t = NO_CONN;
    } else {
        t = (rand() % ((MAX_EDGE_WEIGTH - 1) * 2) + 1); //50% chance of having no connection

        if (t > MAX_EDGE_WEIGTH) {
            //t = INF; //Like no connection
            t = NO_CONN; //Like no connection
        } else {
            linkCnt++;
            G->visited[j] = VISITED; //Do this to find isolated nodes that have no incoming edge
        }
    }

    G->node[i][j] = t;
}
  • Couldn't you create a helper function to use instead, one that prevents concurrent access to rand using locks or something? Commented Oct 19, 2018 at 0:35
  • Be careful with this. Threads are likely to call the rand function in different orders, so even if rand is thread safe, if the order of the random numbers matters then the only thing you can do is make sure all your operations are ordered. You can still do things in parallel, but you'll have to add a lot of synchronization and you'll end up spending a lot more time blocking. If this is about trying to verify your implementation, you may want to consider hashing based on position, or generating an array of G->N random numbers ahead of time and then using arr[j] instead of rand(). Commented Oct 19, 2018 at 2:03

2 Answers


There seem to be a couple problems getting conflated here.

First, the non-thread-safe nature of the rand() function means that calling rand() at the same time from different threads might yield different values than if the calls had been made sequentially. It's probably easiest to explain with an example, so let's look at a 32-bit version of PCG, since it's nice and simple. It has a 32-bit state and generates 32-bit numbers like this:

static uint32_t state = ...;

static uint32_t generate(void)
{
    uint32_t s = state;
    uint32_t v = ((s >> ((s >> 28) + 4)) ^ s) * 277803737U;
    v ^= v >> 22;
    state = state * 747796405U + 1729U;
    return v;
}

Now think about what happens if two threads call generate() at roughly the same time. Maybe they both get the same value for state, and so generate the same random number twice. Maybe one updates state before the other reads it, so they get different values.

We can eliminate that problem by protecting the generate() function with a mutex or, in the case of 32-bit PCG (and this is why I use it for reproducible numbers), by using atomics. If we do that, then we'll always get the same numbers in the same order.
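For example, here is a minimal sketch of the atomic variant using C11 <stdatomic.h> (the seed constant is arbitrary, and the compare-and-swap loop is just one way to structure it):

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t state = 0x853c49e6U; /* arbitrary seed */

static uint32_t generate(void)
{
    /* Atomically claim one step of the sequence: read the current
       state and advance it as a single logical operation. */
    uint32_t s = atomic_load(&state);
    while (!atomic_compare_exchange_weak(&state, &s, s * 747796405U + 1729U))
        ; /* on failure the CAS reloads s, so just retry */

    uint32_t v = ((s >> ((s >> 28) + 4)) ^ s) * 277803737U;
    v ^= v >> 22;
    return v;
}

Each thread claims a distinct state snapshot, so the sequence as a whole is preserved even though which thread receives which number still depends on timing.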

Part two of the problem is what happens when the order of the callers gets mixed up in your code. Let's say you have two threads (called A and B), and they each have to run two iterations of your loop. Even if you're getting your random numbers from a thread-safe source, the order of the calls could be AABB, ABAB, ABBA, BBAA, BABA, or BAAB, each of which would cause your code to generate a different result.

There are several straightforward ways around this. First, you could use synchronization primitives to ensure that each iteration calls generate() in the order you want. The easiest way would probably be to use a queue, but you'll waste a lot of time synchronizing, you'll lose some opportunities for parallelism, and you'll have to restructure your code significantly.
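To illustrate how much that ordering costs, here is a sketch using OpenMP's ordered construct (one possible primitive; the queue idea above would behave similarly):

#pragma omp parallel for ordered schedule(static)
for (int j = 0; j < LEN; j++) {
    int t;
    #pragma omp ordered
    {
        t = generate(); /* calls happen strictly in iteration order */
    }
    do_stuff(t); /* only this part actually runs in parallel */
}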

If you have a relatively small number of iterations, you might consider generating an array ahead of time. Think:

int i;
int nums[LEN];

for (i = 0 ; i < LEN ; i++)
    nums[i] = generate();

#pragma omp parallel for ...
for (i = 0 ; i < LEN ; i++) {
    do_stuff(nums[i]);
}

A better solution, though, might be to just ditch the idea of generating random numbers altogether and instead use hashing. https://burtleburtle.net/bob/hash/integer.html has some options. An example of that would be something like

for (int i = 0 ; i < LEN ; i++) {
    do_stuff(hash(i));
}

Of course you can throw in some salt; you could even use rand() once up front to generate it.
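For concreteness, here is one of the 32-bit integer hashes from the linked page (Robert Jenkins' mix), with a hypothetical salt parameter folded into the input:

#include <stdint.h>

/* Integer hash from the page linked above; "salt" is a
   hypothetical per-run constant mixed into the input. */
static uint32_t hash(uint32_t a, uint32_t salt)
{
    a += salt;
    a = (a + 0x7ed55d16) + (a << 12);
    a = (a ^ 0xc761c23c) ^ (a >> 19);
    a = (a + 0x165667b1) + (a << 5);
    a = (a + 0xd3a2646c) ^ (a << 9);
    a = (a + 0xfd7046c5) + (a << 3);
    a = (a ^ 0xb55a4f09) ^ (a >> 16);
    return a;
}

Because hash(j, salt) depends only on its inputs, every thread computes the same value for a given iteration no matter how the loop is scheduled, so no synchronization is needed at all.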


Here is a block-based approach that divides the problem space into N/BLOCK_SIZE blocks and reseeds the RNG with your randInit + block number for each block. This gives reproducible output no matter how many threads you have. It also generates the same first N numbers when you later run a longer sequence of N + x, as long as you keep the same BLOCK_SIZE.

A good block size is probably something like your typical N / (max_num_procs * 2), but there is room for experimentation.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N_DEFAULT 48  //Default number of nodes
#define BLOCK_SIZE 8  //Number of nodes per block.
                      //Changing this changes the reseed frequency,
                      //.. and hence the generated sequence
#define randInit 42   //Had to be something.

int main(int argc, char* argv[])
{
    int N = N_DEFAULT;
    if (argc > 1)
        N = atoi(argv[1]);

    int rands[N]; // keep our random numbers for sequential debug output
    int n = BLOCK_SIZE;
    int num_blocks = (N + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceil(N/BLOCK_SIZE)
    int nt = omp_get_max_threads();

    printf(" N: %d\n", N);
    printf(" Blocks: %d, (size: %d)\n", num_blocks, n);
    printf(" threads: %d\n", nt);

    //Parallel random generation
    #pragma omp parallel for schedule(runtime)
    for (int J = 0; J < num_blocks; J++) {
        unsigned int block_seed = randInit + J; // unique block seed
        int start = J * n;
        int end = start + n > N ? N : start + n;

        int tid = omp_get_thread_num(); // Just for debug
        printf("thread %d: works on block %d (%d - %d )\n", tid, J, start, end);

        for (int j = start; j < end; j++) {
            int t = rand_r(&block_seed); //rand_r is thread safe (re-entrant rand)
            rands[j] = t;
        }
    }

    //Output for debug, single thread
    for (int j = 0; j < N; j++) {
        printf("%d : %d \n", j, rands[j]);
    }

    return 0;
}

Output with different N and different numbers of threads is shown below. Note that the first 24 values are identical in both runs.

 N: 24                                  N: 27
 Blocks: 3, (size: 8)                   Blocks: 4, (size: 8)
 threads: 4                             threads: 1
-------------------------------------|-------------------------------
thread 1: works on block 1 (8 - 16 )   thread 0: works on block 0 (0 - 8 )
thread 2: works on block 2 (16 - 24 )  thread 0: works on block 1 (8 - 16 )
thread 0: works on block 0 (0 - 8 )    thread 0: works on block 2 (16 - 24 )
                                       thread 0: works on block 3 (24 - 27 )
-------------------------------------|-------------------------------
0 : 681191333                          0 : 681191333
1 : 928546885                          1 : 928546885
2 : 1457394273                         2 : 1457394273
3 : 941445650                          3 : 941445650
4 : 2129613237                         4 : 2129613237
5 : 1661015563                         5 : 1661015563
6 : 2071432601                         6 : 2071432601
7 : 222443696                          7 : 222443696
8 : 1156886562                         8 : 1156886562
9 : 398918689                          9 : 398918689
10 : 170756699                         10 : 170756699
11 : 703115845                         11 : 703115845
12 : 1424182583                        12 : 1424182583
13 : 1516198481                        13 : 1516198481
14 : 1740837599                        14 : 1740837599
15 : 1148851528                        15 : 1148851528
16 : 1633630368                        16 : 1633630368
17 : 2015727614                        17 : 2015727614
18 : 1031602773                        18 : 1031602773
19 : 463737465                         19 : 463737465
20 : 720848057                         20 : 720848057
21 : 1369285272                        21 : 1369285272
22 : 1411290150                        22 : 1411290150
23 : 2074210785                        23 : 2074210785
-------------------------------------|-------------------------------
                                       24 : 2109326622
                                       25 : 1486099418
                                       26 : 1892448847
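Applied to the loop in the question, the same idea would look roughly like this (a sketch, untested; BLOCK_SIZE is the macro from above, the other identifiers come from the question, and the reduction on linkCnt is an addition to keep that counter race-free):

long linkCnt = 0;
long num_blocks = (G->N + BLOCK_SIZE - 1) / BLOCK_SIZE;

#pragma omp parallel for schedule(runtime) reduction(+:linkCnt)
for (long J = 0; J < num_blocks; J++) {
    unsigned int block_seed = randInit + J; //reseed per block for reproducibility
    long start = J * BLOCK_SIZE;
    long end = start + BLOCK_SIZE > G->N ? G->N : start + BLOCK_SIZE;

    for (long j = start; j < end; j++) {
        long t;
        if (i == j) {
            t = NO_CONN;
        } else {
            t = (rand_r(&block_seed) % ((MAX_EDGE_WEIGTH - 1) * 2) + 1);
            if (t > MAX_EDGE_WEIGTH) {
                t = NO_CONN;
            } else {
                linkCnt++;
                G->visited[j] = VISITED;
            }
        }
        G->node[i][j] = t;
    }
}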
