
In the following code:

i = x % b; 

Is the modulo operation faster when b is a power of 2, even if b is not known at compile time (i.e., even though the compiler will not optimize this into a bitwise AND because it does not know that b will be a power of 2)? If not, how can I force such an optimization if I know that b will always be a power of 2?

EDIT: I guess what I was really asking in the first part of the question is whether the divl instruction (or similar) itself will run faster for powers of 2.

4 Comments

  • Use logical (bit) operations instead. Commented Jan 26, 2015 at 20:01
  • Probably not, but it depends on the compiler (and, to a slight degree, the specific hardware). The logic to check whether it's a power of 2 and compute how far to shift would, in most cases, take more cycles than just doing the division. Commented Jan 26, 2015 at 20:01
  • "Is operation X faster than operation Y" is nonsensical to answer at best, and impossible at worst, without knowing the exact details of the hardware and software platform you're running on, the compiler used, and all compiler flags. Commented Jan 26, 2015 at 20:02
  • If you know it will be a power of two, you can always generate the mask yourself. But unless you do a good job, it'll probably be slower than just doing the divide. Commented Jan 26, 2015 at 20:03

3 Answers


Whether it is faster is necessarily system-dependent.

If you know x to be non-negative, and b to be a power of two, you can use x & (b - 1).
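
For example, a minimal sketch of that approach (the helper name is just illustrative; it assumes x is non-negative and b really is a power of two):

unsigned mod_pow2(unsigned x, unsigned b)
{
    /* valid only when b is a power of two: then x % b == x & (b - 1) */
    return x & (b - 1);
}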


2 Comments

@Matt It doesn't, it's still system-dependent. I don't know about physical processors, but anything involving some sort of JIT compilation can easily involve divisions that are unknown at C-to-intermediate-language time, but are known at intermediate-language-to-machine-code time.
@Matt Processor instructions usually have a fixed number of cycles for a specific addressing mode; changing an operand value will not change the number of cycles.

div and idiv, in their current implementations in x86 processors, do have an execution time that depends on their operands, but only indirectly: it really depends on the magnitude of the result. Dividing such that the result is 0 or 1 is faster (though certainly not fast) than dividing and getting a large result. The exact details depend on the microarchitecture; the performance difference can be bigger or smaller.

But it doesn't care about powers of two on any implementation I know of, and on all implementations I know of it is, even in the best case, much slower than using a bitwise AND: by a factor of at least 9 (45 nm Core 2), but usually more like 20, and even worse than that for 64-bit operands.

If the compiler knows that b is a power of two it may do something about it, but that's usually reserved for obvious cases, for example compile-time constants (not necessarily literals) or a value created as 1 << n. There may be more cases in which your compiler can figure it out, or fewer. Anyway, the point here is that it is not sufficient that you know; the compiler has to know, and the rules for that are obviously compiler-specific.
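
As an illustration of that distinction (what a given compiler actually emits depends on the compiler and optimization level; the function names are just for the example):

unsigned mod_const(unsigned x)
{
    /* divisor is a compile-time constant power of two:
       compilers typically strength-reduce this to x & 15 */
    return x % 16u;
}

unsigned mod_runtime(unsigned x, unsigned b)
{
    /* divisor only known at run time:
       the compiler has to emit a real division */
    return x % b;
}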



Whether the generated code will perform the optimization at run time (which is what you're asking) is compiler-dependent, but I doubt many compilers would do that. If you know b is 2^n, just write:

i = x & ((1 << n) - 1); 

and you'll always get the optimization.

2 Comments

you meant x & ((1 << n) - 1).
(1u << n) has some advantages over (1 << n).
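
Picking up the second comment, a hedged variant of the same idea using an unsigned literal (the helper name is just illustrative):

unsigned mod_pow2n(unsigned x, unsigned n)
{
    /* assumes b == 1u << n; the unsigned literal keeps the shift
       well-defined even for n == 31 with a 32-bit int */
    return x & ((1u << n) - 1);
}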
