I have two functions counting the occurrences of a target char in the given input buffer. The functions vary only in how they communicate the result back to the caller; one returns the result and the other writes to a variable passed by reference.

    #include <cstdlib>

    #define BUF_LEN 0x1000

    size_t check_count1(const char* buf, char target) {
        size_t count = 0;
        for (size_t i = 0; i < BUF_LEN; i++) {
            if (buf[i] == target) {
                count++;
            }
        }
        return count;
    }

    void check_count2(const char* buf, char target, size_t& count) {
        for (size_t i = 0; i < BUF_LEN; i++) {
            if (buf[i] == target) {
                count++;
            }
        }
    }

I am puzzled by how Clang and GCC generate code for these two functions. The loop in check_count1 is vectorized, but the loop in check_count2 is not. Initially I thought this was due to pointer aliasing in the second case, but specifying __restrict has no effect. Here's the link to Compiler Explorer.
An older ICC compiler did just fine with both loops. What changed?
`buf` can alias `count` :-(
`__restrict` on both pointers: godbolt.org/z/oc94bxq71
`count`.
If the `if` is dropped to have branchless code then it is also vectorized.
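
To make the branchless suggestion from the comments concrete, here is a minimal sketch of two possible rewrites of check_count2. The function names `check_count2_local` and `check_count2_branchless` are hypothetical and not from the question. A likely reason the original loop resists vectorization is that it stores to `count` only when a match is found, and a compiler is generally not allowed to invent a store that the source never performs. Both variants below make the store unconditional. Whether a given compiler actually vectorizes them depends on its version and flags, so check the generated assembly rather than taking this as a guarantee.

```cpp
#include <cstddef>

constexpr std::size_t BUF_LEN = 0x1000;  // same length as in the question

// Variant A (hypothetical name): accumulate in a local variable and store
// to the reference once, after the loop. The loop body then has the same
// shape as check_count1. Note the semantic difference from the original:
// count is written even when there is no match.
void check_count2_local(const char* buf, char target, std::size_t& count) {
    std::size_t local = 0;
    for (std::size_t i = 0; i < BUF_LEN; i++) {
        if (buf[i] == target) {
            local++;
        }
    }
    count += local;
}

// Variant B (hypothetical name): drop the `if` and add the comparison
// result (0 or 1) on every iteration, so the store to count is
// unconditional. buf may still alias count here unless __restrict is added.
void check_count2_branchless(const char* buf, char target, std::size_t& count) {
    for (std::size_t i = 0; i < BUF_LEN; i++) {
        count += (buf[i] == target);
    }
}
```

Both variants add to whatever value `count` already holds, as the original check_count2 does, so the caller still has to initialize it.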