I am trying to parallelize a C/C++ program that was generated from Verilog (a hardware description language) by a translator program. The fact that it is a circuit-level C/C++ program should not make a difference. I am essentially trying to follow the approach in
https://stackoverflow.com/users/2979872/user2979872
but I get a segmentation fault when I try to use OpenMP to parallelize it. Here is the code. When I make top, which is a pointer to an object, private by writing #pragma omp parallel num_threads(2) private(top), I get a segmentation fault.
//////////////////////////////////////////////////////////////////////////////////////////
int main(int argc, char **argv, char **env) {
    Verilated::commandArgs(argc, argv);
    Vaes_cipher_top* top = new Vaes_cipher_top; // this is the aes object that will do the enc
    unsigned int i = 0;
    unsigned int set_done;
    unsigned int ld_set = 0;
    top->rst = 1; // assert reset
    #pragma omp parallel num_threads(2) private(top)
    while (i < 2) {
        if (main_time > 10) {
            top->rst = 0; // Deassert reset
        }
        if ((main_time % 10) == 1) {
            top->clk = 1; // Toggle clock (posedge)
        }
        if ((main_time % 10) == 6) {
            top->clk = 0;
            //setting DUT values
            if(ld_set!=1 && main_time > 10) {
                top -> ld = 1;
                top -> key = {0x00000000,0x00000000,0x00000000,0x00000000};
                top -> text_in = {0x00000000,0x00000000,0x00000000,0x00000000};
                ld_set++;
            } else if(ld_set == 1 && main_time > 10) {
                top -> ld = 0;
                set_done = 0;
            }
        } //(main_time % 10) == 6)
        top->eval(); // Evaluate model
        if(top->done && !set_done) {
            print(top->key);
            print(top->text_in);
            print(top->text_out);
            ld_set = 0; //reset
            i++;
            set_done = 1;
        } //if(top->done)
        main_time++; // Time passes...
    } //end of while
    printf("\n Test Done\n");
    top->final(); // Done simulating
    delete top;
    return 0;
} //end of main

Trying to move forward as suggested by Hristo. No more segmentation fault but incorrect result due to race conditions.
int main(int argc, char **argv, char **env) {
    Verilated::commandArgs(argc, argv);
    Vaes_cipher_top* top; // this is the aes object that will do the enc
    unsigned int i = 0;
    unsigned int set_done;
    unsigned int ld_set = 0;
    //top->rst = 1; // assert reset
    unsigned int iter_count = 1;
    #pragma omp parallel num_threads(2) firstprivate(iter_count,ld_set,set_done,i)
    while (i < 2) {
        if(iter_count) {
            top = new Vaes_cipher_top;
            iter_count = 0;
        }
        if(main_time == 0) top-> rst = 1; //assert reset
        if (main_time > 10) {
            top->rst = 0; // Deassert reset
        }
        if ((main_time % 10) == 1) {
            top->clk = 1; // Toggle clock (posedge)
        }
        if ((main_time % 10) == 6) {
            top->clk = 0;
            //setting DUT values
            if(ld_set!=1 && main_time > 10) {
                top -> ld = 1;
                top -> key = {0x00000000,0x00000000,0x00000000,0x00000000};
                top -> text_in = {0x00000000,0x00000000,0x00000000,0x00000000};
                ld_set++;
            } else if(ld_set == 1 && main_time > 10) {
                top -> ld = 0;
                set_done = 0;
            }
        } //(main_time % 10) == 6)
        top->eval(); // Evaluate model
        if(top->done && !set_done) {
            print(top->key);
            print(top->text_in);
            print(top->text_out);
            ld_set = 0; //reset
            i++;
            set_done = 1;
            iter_count = 1;
        } //if(top->done)
        main_time++; // Time passes...
    } //end of while
    printf("\n Test Done\n");
    top->final(); // Done simulating
    delete top;
    return 0;
} //end of main
////////////////////////////////////////////////////////////////////////////////////
Updated as suggested by Hristo to move the declaration Vaes_cipher_top *top inside the while loop:
int main(int argc, char **argv, char **env) {
    Verilated::commandArgs(argc, argv);
    unsigned int i = 0;
    unsigned int set_done;
    unsigned int ld_set = 0;
    //top->rst = 1; // assert reset
    unsigned int iter_count = 1;
    #pragma omp parallel num_threads(2) firstprivate(iter_count,ld_set,set_done,i)
    while (i < 2) {
        if(iter_count) {
            Vaes_cipher_top* top; // this is the aes object that will do the enc
            top = new Vaes_cipher_top;
            iter_count = 0;
        }
        if(main_time == 0) top-> rst = 1; //assert reset
        if (main_time > 10) {
            top->rst = 0; // Deassert reset
        }
        if ((main_time % 10) == 1) {
            top->clk = 1; // Toggle clock (posedge)
        }
        if ((main_time % 10) == 6) {
            top->clk = 0;
            //setting DUT values
            if(ld_set!=1 && main_time > 10) {
                top -> ld = 1;
                top -> key = {0x00000000,0x00000000,0x00000000,0x00000000};
                top -> text_in = {0x00000000,0x00000000,0x00000000,0x00000000};
                ld_set++;
            } else if(ld_set == 1 && main_time > 10) {
                top -> ld = 0;
                set_done = 0;
            }
        } //(main_time % 10) == 6)
        top->eval(); // Evaluate model
        if(top->done && !set_done) {
            print(top->key);
            print(top->text_in);
            print(top->text_out);
            ld_set = 0; //reset
            i++;
            set_done = 1;
            iter_count = 1;
        } //if(top->done)
        main_time++; // Time passes...
    } //end of while
    printf("\n Test Done\n");
    top->final(); // Done simulating
    delete top;
    return 0;
} //end of main
////////////////////////////////////////////////////////////////////////////////////////////

Here is the output. All the errors are the same. I am putting a few of them
./sim_main.cpp:76: error: ‘top’ was not declared in this scope (on line where top->rst=1)
../sim_main.cpp:80: error: ‘top’ was not declared in this scope (on line where top->rst=0)
../sim_main.cpp:84: error: ‘top’ was not declared in this scope (on line where top->clk =1)
../sim_main.cpp:89: error: ‘top’ was not declared in this scope (on line where top->clk=0)
If you remove the if surrounding the Vaes_cipher_top declaration, it becomes an infinite loop!
///////////////////////////////////////////////////////////////////////////////////////////
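The "not declared in this scope" errors follow from C++ block scope: a variable declared inside the if block stops existing at that block's closing brace. A minimal sketch of one way around this, assuming a hypothetical stand-in struct Model in place of Vaes_cipher_top (the names Model and run_models are illustrative, not from the original program), declares the pointer once at the top of the parallel block so that it is private to each thread and still visible to the whole loop:

```cpp
// Hypothetical stand-in for the Verilator-generated Vaes_cipher_top class.
struct Model {
    int rst = 0;
    int cycles = 0;
    // Counts one cycle per evaluation while reset is deasserted.
    void eval() { if (!rst) ++cycles; }
};

// Each thread runs `steps` evaluations on its own Model.
// Returns the cycle count summed over all threads.
int run_models(int steps) {
    int total = 0;
    #pragma omp parallel num_threads(2) reduction(+:total)
    {
        // Declared inside the parallel block: every thread gets its own
        // pointer and its own object, and both stay in scope for the loop.
        Model* top = new Model;
        top->rst = 1;                  // assert reset
        for (int t = 0; t < steps; ++t) {
            if (t > 0) top->rst = 0;   // deassert reset after the first step
            top->eval();
        }
        total += top->cycles;          // combined safely by the reduction
        delete top;
    }
    return total;
}
```

Because the object is created once per thread rather than once per loop iteration, the model keeps its state between evaluations, which may also be why recreating it on every iteration never reached done.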
The simulation sometimes hangs, and the output appears at different times on every run. I am using 2 threads, i.e., num_threads(2).
(1) This is the run where simulation terminates
key=67fd3c2821b9201521d6a87f205e3039 text_in=67fd3c2821b9201521d6a87f205e3039 Time=251,text_out=71a354729996bac975784dcdb50260d9, done= 1 on 0 of 2 i= 1
key=1a857b7f39a0290d20bbf2466b5b14e8 text_in=1a857b7f39a0290d20bbf2466b5b14e8 Time=321,text_out=da36095f53fd86a57f9d147e8e05603, done= 1 on 1 of 2 i= 1
key=67fd3c2821b9201521d6a87f205e3039 text_in=67fd3c2821b9201521d6a87f205e3039 Time=401,text_out=71a354729996bac975784dcdb50260d9, done= 1 on 0 of 2 i= 2
key=1a857b7f39a0290d20bbf2466b5b14e8 text_in=1a857b7f39a0290d20bbf2466b5b14e8 Time=601,text_out=da36095f53fd86a57f9d147e8e05603, done= 1 on 1 of 2 i= 2
key=67fd3c2821b9201521d6a87f205e3039 text_in=67fd3c2821b9201521d6a87f205e3039 Time=641,text_out=71a354729996bac975784dcdb50260d9, done= 1 on 0 of 2 i= 3
key=1a857b7f39a0290d20bbf2466b5b14e8 text_in=1a857b7f39a0290d20bbf2466b5b14e8 Time=841,text_out=da36095f53fd86a57f9d147e8e05603, done= 1 on 1 of 2 i= 3
key=67fd3c2821b9201521d6a87f205e3039 text_in=67fd3c2821b9201521d6a87f205e3039 Time=911,text_out=71a354729996bac975784dcdb50260d9, done= 1 on 0 of 2 i= 4
key=1a857b7f39a0290d20bbf2466b5b14e8 text_in=1a857b7f39a0290d20bbf2466b5b14e8 Time=991,text_out=da36095f53fd86a57f9d147e8e05603, done= 1 on 1 of 2 i= 4
Test Done

(2) This is the RUN where simulation DOES NOT terminate and i had to press ctrl+c to abort the simulation
key=75f1bcf47451ab0f33b58a5e1adfdd6 text_in=75f1bcf47451ab0f33b58a5e1adfdd6 Time=411,text_out=9049c33819d61de5c09aa388479ef10, done= 1 on 0 of 2 i= 1
key=75f1bcf47451ab0f33b58a5e1adfdd6 text_in=75f1bcf47451ab0f33b58a5e1adfdd6 Time=696,text_out=9049c33819d61de5c09aa388479ef10, done= 1 on 0 of 2 i= 2
key=75f1bcf47451ab0f33b58a5e1adfdd6 text_in=75f1bcf47451ab0f33b58a5e1adfdd6 Time=931,text_out=9049c33819d61de5c09aa388479ef10, done= 1 on 0 of 2 i= 3
key=75f1bcf47451ab0f33b58a5e1adfdd6 text_in=75f1bcf47451ab0f33b58a5e1adfdd6 Time=1151,text_out=9049c33819d61de5c09aa388479ef10, done= 1 on 0 of 2 i= 4
^C

(Had to press Ctrl+c to abort the simulation.) Only 1 core is being used instead of 2. Why is this happening and how to prevent that from happening? Why is output of two threads NOT appearing at the same time? Can this be done?
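One plausible reading of the hang (an assumption; the output alone does not prove it) is that main_time is a global that both threads increment, so one thread can miss the exact time points where its clock is supposed to toggle and never reach done. The sketch below gives each thread its own copy of the counter with firstprivate. ClockedModel and simulate are hypothetical names used only for illustration, standing in for the generated model and the simulation loop:

```cpp
#include <cstdint>

// Hypothetical stand-in for a clocked model: counts rising clock edges.
struct ClockedModel {
    int clk = 0;
    int prev_clk = 0;
    int posedges = 0;
    void eval() {
        if (clk == 1 && prev_clk == 0) ++posedges;
        prev_clk = clk;
    }
};

// Every thread advances a private copy of main_time, so no clock edge is
// skipped because another thread bumped a shared counter.
// Returns true if each thread saw the expected number of rising edges.
bool simulate(std::uint64_t end_time) {
    std::uint64_t main_time = 0;
    bool all_ok = true;
    #pragma omp parallel num_threads(2) firstprivate(main_time) reduction(&&:all_ok)
    {
        ClockedModel top;                          // one model per thread
        while (main_time < end_time) {
            if (main_time % 10 == 1) top.clk = 1;  // posedge
            if (main_time % 10 == 6) top.clk = 0;  // negedge
            top.eval();
            ++main_time;                           // private time passes
        }
        // Rising edges occur at times 1, 11, 21, ... below end_time.
        const std::uint64_t expected = (end_time + 8) / 10;
        all_ok = all_ok && (static_cast<std::uint64_t>(top.posedges) == expected);
    }
    return all_ok;
}
```

With private counters the two threads no longer influence each other's clock. Their printed lines can still interleave in any order, since OpenMP gives no ordering guarantee between threads unless one is requested.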
Thanks
/////////////////////////////////////////////////////// Here is the final working code that I want to share with everybody ///////////////////////////////////////////////////////////////////
#include <omp.h>
#include "Vaes_cipher_top.h"
#include "verilated.h"
#include "verilated_vcd_c.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

//#pragma omp threadprivate(top)
vluint64_t main_time = 0; // Current simulation time
// This is a 64-bit integer to reduce wrap over issues and
// allow modulus. You can also use a double, if you wish.

double sc_time_stamp () { // Called by $time in Verilog
    return main_time; // converts to double, to match
                      // what SystemC does
}

int main(int argc, char **argv, char **env) {
    Verilated::commandArgs(argc, argv);
    srand(time(NULL));
    unsigned int set_done = 0;
    unsigned int i = 0;
    unsigned int ld_set = 0;
#ifdef OMP
    #pragma omp parallel default(none) firstprivate(i,set_done,ld_set,main_time)
#endif
    { // parallel region when OMP is defined, plain block otherwise
        // unsigned int set_done = 0;
        // unsigned int i = 0;
        // unsigned int ld_set = 0;
        Vaes_cipher_top* top = new Vaes_cipher_top; // this is the aes object that will do the enc
        top->rst = 1; // assert reset
        while (i < 65000)
        // #pragma omp parallel for ordered schedule(static)
        // for(i=0; (i<65000);i++)
        {
            if (main_time > 10) {
                top->rst = 0; // Deassert reset
            }
            if ((main_time % 10) == 1) {
                top->clk = 1; // Toggle clock (posedge)
            }
            if ((main_time % 10) == 6) {
                top->clk = 0;
                //setting DUT values
                if(ld_set!=1 && main_time > 10) {
                    top -> ld = 1;
                    //unsigned int rand_state = time(NULL) + 1337*omp_get_thread_num();
                    //unsigned int rnd[4];
                    //rnd[0] = rand_r(&rand_state);
                    //rnd[1] = rand_r(&rand_state);
                    //rnd[2] = rand_r(&rand_state);
                    //rnd[3] = rand_r(&rand_state);
                    top -> key = {rand(),rand(),rand(),rand()}; // {0x00000000,0x000000000,0x00000000,0x00000000};
                    top -> text_in = {rand(),rand(),rand(),rand()}; //{0x00000000,0x00000000,0x00000000,0x00000000};
                    //top -> key = {0x00000000,0x00000000,0x00000000,0x00000000};
                    //top -> text_in = {0x00000000,0x00000000,0x00000000,0x00000000};
                    ld_set++;
                } else if(ld_set == 1 && main_time > 10) {
                    top -> ld = 0;
                    set_done = 0;
                }
            } //(main_time % 10) == 6)
            top->eval(); // Evaluate model
            if(top->done == 1 && set_done == 0) {
#ifdef OMP
                // One format specifier per argument; main_time is 64-bit, so it is cast and printed with %llu
                printf("Time=%2llu, key=%2x%2x%2x%2x,text_in=%2x%2x%2x%2x,text_out=%2x%2x%2x%2x, done=%2d on %2d of %2d\n",
                       (unsigned long long)main_time,top->key[3],top->key[2],top->key[1],top->key[0],
                       top->text_in[3],top->text_in[2],top->text_in[1],top->text_in[0],
                       top->text_out[3],top->text_out[2],top->text_out[1],top->text_out[0],top->done,
                       omp_get_thread_num(),omp_get_num_threads() );
#else
                printf("Time=%2llu, key=%2x%2x%2x%2x,text_in=%2x%2x%2x%2x,text_out=%2x%2x%2x%2x, done=%2d\n",
                       (unsigned long long)main_time,top->key[3],top->key[2],top->key[1],top->key[0],
                       top->text_in[3],top->text_in[2],top->text_in[1],top->text_in[0],
                       top->text_out[3],top->text_out[2],top->text_out[1],top->text_out[0],top->done );
#endif
                ld_set = 0; //reset the value
                i++;
                // printf("i=%2d\n",i);
                set_done = 1;
            } //if(top->done)
            //#pragma omp barrier
            main_time++;
        } //end of while
        top->final(); // Done simulating
        delete top;
    } //pragma omp
    printf("\n Test Done\n");
    return 0;
} //end of main
private(top) doesn't make the object pointed to by top private, but rather the pointer itself. Each thread ends up with an uninitialised private pointer, and any attempt to dereference such a pointer with -> ends up in a segmentation fault.

How about you move the Vaes_cipher_top* top = new Vaes_cipher_top; statement inside the parallel loop? Some variables look like they too need the private treatment. Also, each thread executes the same while loop and the work is not distributed among the threads. Are you testing if Vaes_cipher_top is thread-safe?

You moved the new statement inside the parallel region but left the definition of top outside, so it is still shared.
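To illustrate the remark that every thread runs the same while loop instead of sharing the work, here is a minimal sketch in which each thread owns its object and an omp for loop divides the inputs among the threads. Cipher and encrypt_all are hypothetical names, and the XOR "encryption" is only a placeholder for the real model's eval(); none of this comes from the Verilator-generated code:

```cpp
#include <vector>

// Hypothetical stand-in for Vaes_cipher_top: "encrypts" by XOR with the key.
struct Cipher {
    unsigned key = 0, text_in = 0, text_out = 0;
    void eval() { text_out = text_in ^ key; }
};

// Processes every input exactly once in total, not once per thread.
std::vector<unsigned> encrypt_all(const std::vector<unsigned>& inputs, unsigned key) {
    std::vector<unsigned> outputs(inputs.size());
    const long n = static_cast<long>(inputs.size());
    #pragma omp parallel num_threads(2)
    {
        Cipher* top = new Cipher;   // allocated inside the region: one object per thread
        top->key = key;
        // The iterations of this loop are divided among the threads.
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            top->text_in = inputs[i];
            top->eval();
            outputs[i] = top->text_out;  // each index is written by exactly one thread
        }
        delete top;
    }
    return outputs;
}
```

Whether the real Vaes_cipher_top can be used this way depends on it having no hidden shared state, which is the thread-safety question raised in the comment above.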