I was playing with the Compiler Explorer and I stumbled upon an interesting behavior with the ternary operator when using something like this:
    std::string get_string(bool b)
    {
        return b ? "Hello" : "Stack-Overflow";
    }

The code the compiler generates for this (clang trunk, with -O3) is:
    get_string[abi:cxx11](bool):            # @get_string[abi:cxx11](bool)
            push    r15
            push    r14
            push    rbx
            mov     rbx, rdi
            mov     ecx, offset .L.str
            mov     eax, offset .L.str.1
            test    esi, esi
            cmovne  rax, rcx
            add     rdi, 16                 #< Why is the compiler storing the length of the string
            mov     qword ptr [rbx], rdi
            xor     sil, 1
            movzx   ecx, sil
            lea     r15, [rcx + 8*rcx]
            lea     r14, [rcx + 8*rcx]
            add     r14, 5                  #< I also think this is the length of "Hello" (but not sure)
            mov     rsi, rax
            mov     rdx, r14
            call    memcpy                  #< Why is there a call to memcpy
            mov     qword ptr [rbx + 8], r14
            mov     byte ptr [rbx + r15 + 21], 0
            mov     rax, rbx
            pop     rbx
            pop     r14
            pop     r15
            ret
    .L.str:
            .asciz  "Hello"

    .L.str.1:
            .asciz  "Stack-Overflow"

However, the code the compiler generates for the following snippet is considerably smaller, contains no call to memcpy, and does not need to know the lengths of both strings at the same time. Instead, it branches to one of two separate code paths:
    std::string better_string(bool b)
    {
        if (b)
        {
            return "Hello";
        }
        else
        {
            return "Stack-Overflow";
        }
    }

The code the compiler generates for the above snippet (clang trunk, with -O3) is:
    better_string[abi:cxx11](bool):         # @better_string[abi:cxx11](bool)
            mov     rax, rdi
            lea     rcx, [rdi + 16]
            mov     qword ptr [rdi], rcx
            test    sil, sil
            je      .LBB0_2
            mov     dword ptr [rcx], 1819043144
            mov     word ptr [rcx + 4], 111
            mov     ecx, 5
            mov     qword ptr [rax + 8], rcx
            ret
    .LBB0_2:
            movabs  rdx, 8606216600190023247
            mov     qword ptr [rcx + 6], rdx
            movabs  rdx, 8525082558887720019
            mov     qword ptr [rcx], rdx
            mov     byte ptr [rax + 30], 0
            mov     ecx, 14
            mov     qword ptr [rax + 8], rcx
            ret

I get the same result when I use the ternary operator with:
    std::string get_string(bool b)
    {
        return b ? std::string("Hello") : std::string("Stack-Overflow");
    }

I would like to know why the ternary operator in the first example generates that code. I believe the culprit lies in the const char[] operands.
P.S.: GCC calls strlen in the first example, but Clang doesn't.
Link to the Compiler Explorer example: https://godbolt.org/z/Exqs6G
Thank you for your time!
Sorry for the wall of code.
... const char* while the strings individually are const char[N]s, presumably the compiler could optimize the latter much more

... const char* pointing to one of two possible known-constant string literals. That's why clang is able to avoid the strlen in the branchless version. (GCC misses that optimization.) Even clang's branchless version is not well optimized; significantly better would have been possible, e.g. 2x cmov to select between constants, and maybe a cmov to select an offset to store at. (So both versions can do 2 partially-overlapping 8-byte stores, writing either 8 or 14 bytes of data, including trailing zeros.) That's better than calling memcpy.

... movdqa loads and turn the boolean into a vector mask to select between them. (This optimization relies on the compiler knowing it's safe to always store 16 bytes into the retval object, even though the C++ source probably leaves some trailing bytes unwritten. Inventing writes is generally a big no-no for compilers because of thread safety.)
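To make the point about types concrete, here is a minimal sketch, assuming a C++11 or later compiler. The helper name get_string_explicit is hypothetical and not from the question; it only spells out what the first get_string seems to ask the compiler for. The static_assert checks that a ternary over two string literals of different lengths has type const char*, so the array sizes are no longer part of the type by the time std::string is constructed.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The operands are const char[6] and const char[15]. They are different array
// types, so both decay to pointers and the ternary's type is const char*.
static_assert(
    std::is_same<decltype(true ? "Hello" : "Stack-Overflow"), const char*>::value,
    "ternary over two differently sized string literals yields const char*");

// Hypothetical helper (an assumption for illustration): the ternary version
// from the question, with the intermediate pointer written out.
std::string get_string_explicit(bool b) {
    const char* p = b ? "Hello" : "Stack-Overflow";  // array length is lost here
    return std::string(p);  // length has to be recovered from the pointer
}

// The if/else version from the question: each return constructs the string
// directly from one literal, so each path sees a single known constant.
std::string better_string(bool b) {
    if (b) {
        return "Hello";
    } else {
        return "Stack-Overflow";
    }
}
```

Both functions return the same strings for the same argument; only the shape of the code the optimizer has to work with differs. How well each form gets optimized is up to the compiler, as the clang and GCC output above suggests.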