qwr

With the usual caveats about profiling if this really matters (it won't unless you're doing this A LOT):

As @phuclv says, on modern processors the quotient and remainder are calculated in one instruction. All of these assume unsigned numbers and don't worry about overflow. With x86-64 GCC -O3,

unsigned int f(unsigned int x, unsigned int y) { return x / y + (x % y != 0); } 

produces

mov eax, edi
xor edx, edx    # zero edx
div esi         # divides edx:eax (x) by esi (y)
                # eax = quotient, edx = remainder
cmp edx, 1      # set CF = (edx - 1 < 0), i.e. edx == 0
sbb eax, -1     # eax -= CF - 1, i.e. eax += 1 - CF, no branch
ret

https://godbolt.org/z/4Gsn3Kj5s

unsigned int f(unsigned int x, unsigned int y) { return (x + y - 1) / y; } 

is clever and uses lea to do the addition and subtraction

lea eax, [rsi-1+rdi]
xor edx, edx
div esi
ret

https://godbolt.org/z/9sfsc1Wa5

For 64-bit inputs, the results are similar but with 64-bit registers instead.

I would guess LEA is faster than CMP/SBB, as LEA is a fast instruction, but I didn't benchmark anything.

In a deleted answer, @Matt suggests the remainder-increment version is faster, but his g++ compile command didn't include an optimization flag, which is suspect.
