Without addressing performance concerns, some trivial observations:

  • #include <omp.h> is unnecessary. (You use OpenMP, but don't call any OpenMP functions.)

  • The return type of main() should be int, not void.

  • The code also compiles with clang (LLVM), if you omit the -masm=intel option.

  • zeromat() could simply be memset(C, 0, n * sizeof(double)).

  • When compiling with -Wall, the code in dgemm_2x8_sse() to zero some registers causes spurious warnings:

    matmul.c:56:21: warning: variable 'r8' is uninitialized when used here [-Wuninitialized]
        r8 = _mm_xor_pd(r8,r8); // ab
                        ^~
    matmul.c:52:5: note: variable 'r8' is declared here
        register __m128d xmm1, xmm4, //
        ^

    I recommend disabling the warnings with a pair of pragmas:

     #pragma GCC diagnostic ignored "-Wuninitialized"
     r8 = _mm_xor_pd(r8,r8); // ab
     r9 = _mm_xor_pd(r9,r9);
     r10 = _mm_xor_pd(r10,r10);
     r11 = _mm_xor_pd(r11,r11);
     r12 = _mm_xor_pd(r12,r12); // ab + 8
     r13 = _mm_xor_pd(r13,r13);
     r14 = _mm_xor_pd(r14,r14);
     r15 = _mm_xor_pd(r15,r15);
     #pragma GCC diagnostic warning "-Wuninitialized"

    You should also discard the confusing and useless comment that precedes that code:

     // 10 registers declared here 
  • "Gflops/s" is redundant and incorrect terminology (unless you are talking about acceleration, not speed!)
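
To illustrate the memset() suggestion, here is a minimal sketch. The signature of zeromat() is an assumption (the original function is not reproduced in this answer); n is taken to be the total number of doubles in C, matching the n * sizeof(double) expression above. It also relies on all-bits-zero representing 0.0, which holds for IEEE 754 doubles but is not strictly guaranteed by the C standard.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature: C points to n contiguous doubles. */
void zeromat(double *C, size_t n)
{
    /* Clear every byte of the buffer; on IEEE 754 platforms
       an all-zero bit pattern is the value 0.0. */
    memset(C, 0, n * sizeof(double));
}
```

A caller holding a 4×4 matrix in a flat array would write zeromat(C, 16), and every element then compares equal to 0.0.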

200_success