With the usual caveats about profiling if this really matters (it won't unless you're doing this A LOT):
As @phuclv says, on modern processors quotient and remainder will be calculated in one instruction. All these assume unsigned numbers without worrying about overflow. With x86-64 GCC -O3
unsigned int f(unsigned int x, unsigned int y) { return x / y + (x % y != 0); } produces
mov eax, edi xor edx, edx # zero edx div esi # divides edx:eax (y) by esi (x) # eax = quotient, edx = remainder cmp edx, 1 # set CF = (edx - 1 < 0), i.e. edx == 0 sbb eax, -1 # eax -= CF - 1, i.e. eax += 1 - CF, no branch ret https://godbolt.org/z/4Gsn3Kj5s
unsigned int f(unsigned int x, unsigned int y) { return (x + y - 1) / y; } is clever and uses lea to do the addition and subtraction
lea eax, [rsi-1+rdi] xor edx, edx div esi ret https://godbolt.org/z/9sfsc1Wa5
For 64-bit inputs, the results are similar but with 64-bit registers instead.
I would guess the shorter codeLEA is faster than CMP/SBB as LEA is a fast instruction, but I didn't benchmark anything.
In a deleted answer, @Matt suggests the remainder increment version is faster, but his g++ compile command didn't include optimization flag which is supsect.