I have the following problem (g++ (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4):
When I use _mm256_slli_si256() directly, such as:
    __m256i x = _mm256_set1_epi8(0xff);
    x = _mm256_slli_si256(x, 3);
the code compiles without problem (g++ -Wall -march=native -O3 -o shifttest shifttest.C).
However, if I wrap it into a function
    __m256i doit(__m256i x, const int imm) { return _mm256_slli_si256(x, imm); }
the compiler complains that
    /usr/lib/gcc/x86_64-linux-gnu/4.8/include/avx2intrin.h: In function '__m256i doit(__m256i, int)':
    /usr/lib/gcc/x86_64-linux-gnu/4.8/include/avx2intrin.h:651:58: error: the last argument must be an 8-bit immediate
       return (__m256i)__builtin_ia32_pslldqi256 (__A, __N * 8);
regardless of whether the function is used or not.
This can't be a problem with the immediate operand as such, since the function doit() compiles if I use e.g. _mm256_slli_epi32(x, imm) instead, and _mm256_slli_epi32() also requires an immediate operand.
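One possible explanation for the difference: per-element shifts such as pslld exist in two forms, one with an immediate count and one with the count in an xmm register, so the compiler can fall back to the register form when the count is not a compile-time constant. Below is a minimal sketch using the SSE2 equivalents so that it builds without AVX2. The helper names are hypothetical, and accepting a runtime count in _mm_slli_epi32() is what GCC and Clang do in practice rather than something the intrinsic's signature promises.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: 32-bit element shift with a runtime count.
// The count is not a constant here, so the compiler is expected to
// emit the register-count form of the instruction.
inline __m128i shift_elems_runtime(__m128i x, int n) {
    return _mm_slli_epi32(x, n);
}

// Hypothetical helper: the same shift, with the register-count form
// requested explicitly by moving the count into an xmm register.
inline __m128i shift_elems_xmm(__m128i x, int n) {
    return _mm_sll_epi32(x, _mm_cvtsi32_si128(n));
}

// Hypothetical helper: extract the lowest 32-bit element for inspection.
inline std::uint32_t lane0(__m128i v) {
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(v));
}
```

Both helpers should give the same result for counts in the range 0..31. The whole-register byte shift has no register-count form, which would explain why it cannot take a runtime count in the same way.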
There is a related bug report on
https://gcc.gnu.org/bugzilla/show_bug.cgi?format=multiple&id=54825
but it is quite old (2012) and relates to gcc 4.8.0, so I thought the patch would have been incorporated into g++ 4.8.4 already.
Is there a workaround for this problem?
[...] _mm256_alignr_epi8(), by the way. So no workaround using that one...

_mm256_setr_m128i(), which would help with a workaround using 128-bit shifts, is missing completely. Oh, and the same problem as described above occurs with _mm_slli_si128(), so that doesn't work either. Something about this __N * 8 seems to confuse the compiler.

[...] xmm register. The two versions share an asm mnemonic, but are different. (AVX2 also introduced variable-shift instructions that take the shift count for each element separately, from the corresponding element in the shift-count register. Those instructions have a different asm mnemonic, as well as a different intrinsic function name.) Oops, there's no variable-count shift-whole-reg-by-bytes, nvm.
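Since there is no variable-count whole-register byte shift, one workaround that may help is to make the count a compile-time constant by passing it as a template parameter. The sketch below uses _mm_slli_si128(), which reportedly fails the same way, so that it builds with plain SSE2. The same pattern should carry over to _mm256_slli_si256(), though that is an assumption I have not checked against g++ 4.8.4. The names shift_bytes_left and shift_bytes_left_rt are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical wrapper: the byte count is a template parameter, so the
// intrinsic always sees a constant expression, even inside a function.
template <int N>
inline __m128i shift_bytes_left(__m128i x) {
    static_assert(N >= 0 && N <= 255, "count must fit an 8-bit immediate");
    return _mm_slli_si128(x, N);
}

// Hypothetical runtime dispatcher for callers that only know the count
// at run time: each case instantiates the wrapper with a constant.
// Assumption: counts outside 0..15 (including negative ones) are
// treated as "shift everything out" and yield an all-zero register.
inline __m128i shift_bytes_left_rt(__m128i x, int n) {
    switch (n) {
        case 0:  return x;
        case 1:  return shift_bytes_left<1>(x);
        case 2:  return shift_bytes_left<2>(x);
        case 3:  return shift_bytes_left<3>(x);
        case 4:  return shift_bytes_left<4>(x);
        case 5:  return shift_bytes_left<5>(x);
        case 6:  return shift_bytes_left<6>(x);
        case 7:  return shift_bytes_left<7>(x);
        case 8:  return shift_bytes_left<8>(x);
        case 9:  return shift_bytes_left<9>(x);
        case 10: return shift_bytes_left<10>(x);
        case 11: return shift_bytes_left<11>(x);
        case 12: return shift_bytes_left<12>(x);
        case 13: return shift_bytes_left<13>(x);
        case 14: return shift_bytes_left<14>(x);
        case 15: return shift_bytes_left<15>(x);
        default: return _mm_setzero_si128();
    }
}
```

A call would look like shift_bytes_left<3>(x) when the count is known at compile time, or shift_bytes_left_rt(x, n) otherwise. The switch costs a branch per call, so the template form is preferable wherever the count is a constant.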