I've made a function to calculate the length of a C string (I'm trying to beat clang's optimizer using -O3). I'm running macOS.
    _string_length1:
        push rbp
        mov rbp, rsp
        xor rax, rax
    .body:
        cmp byte [rdi], 0
        je .exit
        inc rdi
        inc rax
        jmp .body
    .exit:
        pop rbp
        ret

This is the C function I'm trying to beat:
    size_t string_length2(const char *str) {
        size_t ret = 0;
        while (str[ret]) {
            ret++;
        }
        return ret;
    }

And it disassembles to this:
    string_length2:
        push rbp
        mov rbp, rsp
        mov rax, -1
    LBB0_1:
        cmp byte ptr [rdi + rax + 1], 0
        lea rax, [rax + 1]
        jne LBB0_1
        pop rbp
        ret

Every C function sets up a stack frame using push rbp and mov rbp, rsp, and breaks it using pop rbp. But I'm not using the stack in any way here, I'm only using processor registers. It worked without using a stack frame (when I tested on x86-64), but is it necessary?
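For checking that a version without the stack frame still returns the right lengths, a small C harness along these lines might help. This is only a sketch: `check_against_strlen`, `length_fn`, and the sample strings are hypothetical names introduced here, not part of the code above. It assumes the assembly routine would be declared in C as `size_t string_length1(const char *);` (macOS prefixes C symbols with an underscore, hence `_string_length1` in the assembly).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The C version from the question. */
size_t string_length2(const char *str) {
    size_t ret = 0;
    while (str[ret]) {
        ret++;
    }
    return ret;
}

/* Hypothetical helper type: any function with the same shape as strlen. */
typedef size_t (*length_fn)(const char *);

/* Hypothetical helper: asserts that fn agrees with strlen on a few samples. */
void check_against_strlen(length_fn fn) {
    static const char *samples[] = {
        "", "a", "hello", "a longer string with spaces"
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(fn(samples[i]) == strlen(samples[i]));
    }
}
```

Calling `check_against_strlen(string_length2)` exercises the C version; the assembly routine could be passed the same way once it is declared and linked in. This only checks results, not whether omitting the frame is safe in general.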
strlen (or even optimize to a constant) while this can not be inlined. I would expect the two inc instructions in the loop to be also quite bad. Furthermore, depending on expected string length, there are other optimizations to be had :)

call instruction when calling functions.
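One way to read the remark about the two inc instructions: the loop can advance a single pointer and compute the length once at the end, instead of bumping both a pointer and a counter on every iteration. A minimal sketch in C, where `string_length3` is a hypothetical name chosen to follow the question's numbering:

```c
#include <stddef.h>

/* Hypothetical variant: one increment per iteration, length by pointer difference. */
size_t string_length3(const char *str) {
    const char *p = str;
    while (*p) {
        p++;            /* only the pointer moves inside the loop */
    }
    return (size_t)(p - str);  /* distance from the start to the terminating NUL */
}
```

Whether this is actually faster than the other versions would need measuring; the clang output in the question already updates only one register per iteration.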