
I am trying to compile a C program that uses SIMD intrinsics with CMake. When I try to compile it, I get two errors:

/usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:326:1: error: inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch _mm_mullo_epi32 (__m128i __X, __m128i __Y)

/usr/lib/gcc/x86_64-linux-gnu/5/include/tmmintrin.h:136:1: error: inlining failed in call to always_inline ‘_mm_shuffle_epi8’: target specific option mismatch _mm_shuffle_epi8 (__m128i __X, __m128i __Y)

This issue has already been solved here on Stack Overflow by setting:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1") 

I tried the very same and many other options, but my project still fails to compile:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -sse4_1")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=nehalem")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1 -msse4.2")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ssse3")

2 Answers


A general method for finding the GCC option that enables a given intrinsic

File intrin.sh:

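#!/bin/bash

# Print the target option (e.g. "ssse3") that GCC's headers require for intrinsic $1.
get_instruction () {
    [ -z "$1" ] && exit
    # Match the name only when it is followed by a non-identifier character.
    func_name="$1[^0-9a-zA-Z_]"

    # First *intrin.h header under /usr/lib/gcc that mentions the intrinsic.
    header_file=$(grep --include=\*intrin.h -Rl "$func_name" /usr/lib/gcc | head -n1)
    [ -z "$header_file" ] && exit
    >&2 echo "found in: $header_file"

    # Line just before the first match, among pragma lines and matching lines.
    target_directive=$(grep "#pragma GCC target(\|$func_name" "$header_file" | grep -B 1 "$func_name" | head -n1)
    # Keep the first quoted option and strip the quotes and commas.
    echo "$target_directive" | grep -o '"[^,]*[,"]' | sed 's/"//g' | sed 's/,//g'
}

instruction=$(get_instruction "$1")
if [ -z "$instruction" ]; then
    echo "Error: function not found: $1"
else
    echo "add this option to gcc: -m$instruction"
fi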

Usage:

./intrin.sh _mm_shuffle_epi8       # output: -mssse3
./intrin.sh _mm_cvtepu8_epi32      # output: -msse4.1
./intrin.sh _mm_loadu_ps           # output: -msse
./intrin.sh _mm_clmulepi64_si128   # output: -mpclmul
./intrin.sh _mm256_loadu_si256     # output: -mavx
./intrin.sh _mm512_and_ps          # output: -mavx512dq
./intrin.sh _mm_shl_epi8           # output: -mxop
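
To double-check what a given switch enables, one option is to ask the compiler which feature macros it predefines under that switch. The sketch below is only an illustration: `filter_isa_macros` and `show_isa_macros` are hypothetical helper names that are not part of the script above, the pattern covers only a handful of common extensions, and the `CC` variable defaulting to `gcc` is an assumption.

```shell
#!/bin/bash

# Keep only ISA feature macros (e.g. __SSE4_1__) from a macro dump on stdin.
# Hypothetical helper; the pattern covers just a few common extensions.
filter_isa_macros () {
    grep -oE '__(SSE[0-9_]*|SSSE3|AVX[0-9A-Z]*|AES|PCLMUL)__' | sort -u
}

# Dump the macros the compiler predefines for the given flags, then filter.
# Assumes a GCC-compatible compiler reachable as $CC (default: gcc).
show_isa_macros () {
    "${CC:-gcc}" "$@" -dM -E -x c /dev/null | filter_isa_macros
}
```

For example, `show_isa_macros -msse4.1` on an x86-64 GCC should list `__SSE4_1__` and `__SSSE3__` among others, since -msse4.1 also turns on the earlier SSE levels.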

4 Comments

Note that it's usually a good idea to use something like -march=haswell, not just -mavx2 -mfma. Or at least add -mtune=znver2 (Zen 2) or something onto your -m ISA options. The "generic" tuning can be pretty poor for possibly-unaligned 256-bit vectors, especially when your data is usually aligned at runtime but the compiler just doesn't know that. See Why doesn't gcc resolve _mm256_loadu_pd as single vmovupd?. Or if you want to make a binary for your own machine, -march=native.
Excellent answer!
This won't work for some functions, like _mm_shl_epi8, because in the include file the function name is directly followed by an opening parenthesis, not a space. Possible fix: in get_instruction (), replace func_name="$1 " with func_name="$1[^[:alnum:]]".
@Bruno Thank you for the reminder, it has been corrected.

Since you are compiling C code, not C++, you need:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -msse4.1") 

You can get rid of all the other -march=XXX and -msseXXX settings.

If you're using a mix of C and C++ then you could also add:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1") 
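
If the error persists after changing the flags, it can help to check whether the option actually reaches the C compile commands. The following is a rough sketch, assuming the project is configured with CMAKE_EXPORT_COMPILE_COMMANDS=ON (supported by the Makefile and Ninja generators) so that CMake writes a compile_commands.json into the build tree. `commands_missing_flag` is a hypothetical helper, and the `build` directory name is an assumption.

```shell
#!/bin/bash

# Print every compile command in a compile_commands.json that lacks a flag.
# Hypothetical helper: relies on each "command" entry sitting on one line,
# which is how CMake normally writes the file.
commands_missing_flag () {
    local flag="$1" db="$2"
    grep '"command"' "$db" | grep -vF -e "$flag"
}

# Typical use (assumed layout: sources in ., build tree in ./build):
#   cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
#   commands_missing_flag -msse4.1 build/compile_commands.json
```

Any .c file that shows up in the output is being compiled without -msse4.1, which is what happens when the flag is only added to CMAKE_CXX_FLAGS.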

2 Comments

I also had to add -maes, or it did not work for me: set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse4.1 -maes")
Or better, use -march=native if compiling for your own machine. That will enable everything your CPU has, and set tuning options.
