Some trivial observations:

- `#include <omp.h>` is unnecessary. (You use OpenMP, but don't call any OpenMP functions.)
- The [return type of `main()`](http://stackoverflow.com/q/204476/1157100) should be `int`, not `void`.
- The code also compiles with `clang` (LLVM) if you omit the `-masm=intel` option.
- `zeromat()` could simply be `memset(C, 0, n * sizeof(double))`, where `n` is the total number of elements to clear. (This needs `#include <string.h>`.)
- When compiling with `-Wall`, the code in `dgemm_2x8_sse()` to zero some registers causes spurious warnings:

    > matmul.c:56:21: warning: variable 'r8' is uninitialized when used here [-Wuninitialized]
    > r8 = _mm_xor_pd(r8,r8); // ab
    > ^~
    > matmul.c:52:5: note: variable 'r8' is declared here
    > register __m128d xmm1, xmm4, //
    > ^

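    I recommend suppressing the warning locally by wrapping the code in diagnostic pragmas, which both GCC and clang honor. Using `push` and `pop` restores whatever warning settings were in effect beforehand, rather than forcing the diagnostic back to a warning:

        #pragma GCC diagnostic push
        #pragma GCC diagnostic ignored "-Wuninitialized"
        r8 = _mm_xor_pd(r8,r8); // ab
        r9 = _mm_xor_pd(r9,r9);
        r10 = _mm_xor_pd(r10,r10);
        r11 = _mm_xor_pd(r11,r11);

        r12 = _mm_xor_pd(r12,r12); // ab + 8
        r13 = _mm_xor_pd(r13,r13);
        r14 = _mm_xor_pd(r14,r14);
        r15 = _mm_xor_pd(r15,r15);
        #pragma GCC diagnostic pop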

    You should also discard the confusing and useless comment that precedes that code:

        // 10 registers declared here
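
To illustrate the `memset` suggestion, here is a minimal sketch. Your `zeromat()` isn't reproduced in this answer, so the signature below is my assumption: it takes the matrix and the total number of `double` elements it holds (rows times columns), not the dimension.

```c
#include <string.h>

/* Hypothetical signature: `count` is assumed to be the total number of
   elements in C (rows * columns), not the matrix dimension. */
static void zeromat(double *C, size_t count)
{
    /* All-bits-zero represents 0.0 for IEEE 754 doubles, which is the
       format used on any platform that has SSE. */
    memset(C, 0, count * sizeof *C);
}
```

For an `N`×`N` matrix stored contiguously, the call would be `zeromat(C, (size_t)N * N)`. Writing `sizeof *C` instead of `sizeof(double)` keeps the size correct if the element type ever changes.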