100

Often in my inner loops I need to index an array in a "wrap-around" way, so that (for example) if the array size is 100 and my code asks for element -2, it should be given element 98. In many high level languages such as Python, one can do this simply with my_array[index % array_size], but for some reason C's integer arithmetic (usually) rounds toward zero instead of consistently rounding down, and consequently its modulo operator returns a negative result when given a negative first argument.

Often I know that index will not be less than -array_size, and in these cases I just do my_array[(index + array_size) % array_size]. However, sometimes this can't be guaranteed, and for those cases I would like to know the fastest way to implement an always-positive modulo function. There are several "clever" ways to do it without branching, such as
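To make the behaviour concrete, here is a minimal sketch of the add-then-mod workaround. The name wrap_small_negative is made up for illustration, and the function assumes the precondition stated above (index >= -array_size) as well as index + array_size not overflowing:

```c
#include <assert.h>

/* Hypothetical helper: wraps index into [0, array_size), assuming
   array_size > 0 and index >= -array_size (the precondition from the text). */
int wrap_small_negative(int index, int array_size)
{
    assert(array_size > 0 && index >= -array_size);
    /* index + array_size is non-negative here, so % cannot go negative. */
    return (index + array_size) % array_size;
}
```

With C99 or later, -2 % 100 evaluates to -2, whereas wrap_small_negative(-2, 100) gives 98.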

inline int positive_modulo(int i, int n) { return (n + (i % n)) % n; } 

or

inline int positive_modulo(int i, int n) { return (i % n) + (n * (i < 0)); } 

Of course I can profile these to find out which is the fastest on my system, but I can't help worrying that I might have missed a better one, or that what's fast on my machine might be slow on a different one.

So is there a standard way to do this, or some clever trick that I've missed that's likely to be the fastest possible way?

Also, I know it's probably wishful thinking, but if there's a way of doing this that can be auto-vectorised, that would be amazing.

21
  • Are you consistently modding over the same number? Commented Feb 21, 2013 at 7:59
  • 2
    Then, you'll want to either hard-code the modulus, or put it in as a compile-time constant. You'll get much better performance that way than whatever tricks you can play with the sign. Commented Feb 21, 2013 at 8:00
  • 2
    Well, modding over a power of two is trivial; you just do & (n-1) regardless of sign. Commented Feb 21, 2013 at 8:01
  • 3
    I'm surprised no one pointed this out, but in C % isn't modulus, it returns the remainder. Even fmod returns the remainder if you look at the documentation: cplusplus.com/reference/cmath/fmod So I think it's weird to call this positive modulus, since the behavior you're looking for is what modulus is supposed to be: en.wikipedia.org/wiki/Modular_arithmetic Commented Mar 21, 2014 at 0:08
  • 1
    With (i % n) + (n * (i < 0)) I'm seeing result n instead of 0 on negative exact multiples, e.g (-3, 3) -> 3. Commented May 16, 2019 at 21:47

12 Answers

101

The standard way I learned is

inline int positive_modulo(int i, int n) { return (i % n + n) % n; } 

This function is essentially your first variant without the abs (which, in fact, makes it return the wrong result). I wouldn't be surprised if an optimizing compiler could recognize this pattern and compile it to machine code that computes an "unsigned modulo".
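As a usage sketch, the double-remainder form can back a wrap-around lookup. The name ring_get is hypothetical, and the code assumes n > 0 and that i % n + n stays within int:

```c
#include <assert.h>

/* The double-remainder form; assumes n > 0 and that i % n + n
   does not overflow int. */
int positive_modulo(int i, int n)
{
    assert(n > 0);
    return (i % n + n) % n;
}

/* Hypothetical wrap-around lookup built on top of it. */
int ring_get(const int *ring, int size, int index)
{
    return ring[positive_modulo(index, size)];
}
```

The first % brings the value into (-n, n), adding n makes it positive, and the second % folds results that were already non-negative back into [0, n).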

Edit:

Moving on to your second variant: First of all, it contains a bug, too -- the n < 0 should be i < 0.

This variant may not look as if it branches, but on a lot of architectures, the i < 0 will compile into a conditional jump. In any case, it will be at least as fast to replace (n * (i < 0)) with i < 0? n: 0, which avoids the multiplication; in addition, it's "cleaner" because it avoids reinterpreting the bool as an int.

As to which of these two variants is faster, that probably depends on the compiler and processor architecture -- time the two variants and see. I don't think there's a faster way than either of these two variants, though.


7 Comments

Nitpick: It actually won't vectorize because there's generally no SIMD support for modulus.
Would it be more efficient to factor the n out into a template? In the case that the function cannot be inlined, the compiler may be able to play some tricks to improve performance.
Oops, you're right about the abs(), I've edited it out of my question.
Notice that for (-3 mod 3) using (i % n) + (n * (i < 0)) or (i % n) + (i < 0 ? n : 0), the result is 3: (-3 % 3) == 0 and (3 * (-3 < 0)) == 3, probably not the desired result.
One problem is that if n is negative, the result becomes negative... This instead seems to work: return (i % n + std::abs(n)) % n;
37

Most of the time, compilers are very good at optimizing your code, so it is usually best to keep your code readable (for both compilers and other developers to know what you are doing).

Since your array size is always positive, I suggest defining the divisor as unsigned. The compiler will optimize small if/else blocks into conditional instructions, which involve no branches:

    unsigned modulo( int value, unsigned m) {
        int mod = value % (int)m;
        if (mod < 0) {
            mod += m;
        }
        return mod;
    }

This creates a very small function without branches:

    modulo(int, unsigned int):
            mov     eax, edi
            cdq
            idiv    esi
            add     esi, edx
            mov     eax, edx
            test    edx, edx
            cmovs   eax, esi
            ret

For example modulo(-5, 7) returns 2.
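A few more inputs can be checked with a plain C rendering of the function. This is only a sketch: the assertion on m and the explicit casts are additions for illustration, assuming 0 < m <= INT_MAX:

```c
#include <assert.h>
#include <limits.h>

/* C rendering of the function above; assumes 0 < m <= INT_MAX. */
unsigned modulo(int value, unsigned m)
{
    assert(m > 0 && m <= (unsigned)INT_MAX);
    int mod = value % (int)m;   /* remainder, in (-m, m) */
    if (mod < 0)
        mod += (int)m;          /* move negative remainders into [0, m) */
    return (unsigned)mod;
}
```

Exact negative multiples come out as 0 here (modulo(-7, 7) is 0), because the adjustment depends on the sign of the remainder rather than on the sign of value.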

Unfortunately, since the divisor is not known at compile time, the compiler must emit an integer division, which is a bit slow compared to other integer operations. If you know the sizes of your arrays are powers of two, I recommend keeping these function definitions in a header, so that the compiler can optimize them into a more efficient function. Here is the function unsigned modulo256(int v) { return modulo(v,256); }:

    modulo256(int):                          # @modulo256(int)
            mov     edx, edi
            sar     edx, 31
            shr     edx, 24
            lea     eax, [rdi+rdx]
            movzx   eax, al
            sub     eax, edx
            lea     edx, [rax+256]
            test    eax, eax
            cmovs   eax, edx
            ret

See assembly: https://gcc.godbolt.org/z/DG7jMw

See comparison with most voted answer: http://quick-bench.com/oJbVwLr9G5HJb0oRaYpQOCec4E4


Edit: turns out Clang is able to generate a function without any conditional move instructions (which cost more than regular arithmetic operations). This difference is completely negligible in the general case due to the fact that the integral division takes around 70% of the total time.

Basically, Clang shifts mod right to extend its sign bit across the whole width (giving 0xffffffff when mod is negative and 0 otherwise), and uses the result to mask the second operand in mod + m.

    #include <limits>

    unsigned modulo (int value, unsigned m) {
        int mod = value % (int)m;
        m &= mod >> std::numeric_limits<int>::digits;
        return mod + m;
    }

11 Comments

Thank you, this is very interesting. Also interesting that specifying 29 gives some saving over the generic function, even if a power of 2 is even faster. I ran the benchmark on g++ also, with similar results. I'm accepting this answer because I think it does actually supersede the information in the other, higher-voted answers.
If you want to know the exact methodology for this, there are books/websites that will give you more information about this : for example the PowerPC Compiler Writer's Guide has a section on this at pages 52 to 61, and Matt Godbolt talked about this in his "What has my compiler done for me lately ?" talk, at the 35th minute
Thanks. I've updated the answer to include why not using conditional moves is faster, even though I only see improvements (with GCC) for the constant division and not for the generic case.
This code is incorrect. It won't work for modulo(-x, x) and returns x in such a case.
You have to right-shift mod instead, not value.
31

Modulo a power of two, the following works (assuming twos complement representation):

return i & (n-1); 
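A sketch of how this might be wrapped, with the power-of-two requirement checked explicitly. The name mod_pow2 is made up; converting i to unsigned first is an addition here, chosen because that conversion is defined as reduction modulo 2^N and so does not depend on how negative ints are represented:

```c
#include <assert.h>

/* Assumes n is a power of two (and non-zero). Converting i to unsigned
   reduces it modulo 2^N, and n divides 2^N, so the mask yields the
   non-negative residue. */
unsigned mod_pow2(int i, unsigned n)
{
    assert(n != 0 && (n & (n - 1u)) == 0);  /* power-of-two check */
    return (unsigned)i & (n - 1u);
}
```

For example, mod_pow2(-2, 128) gives 126 and mod_pow2(300, 256) gives 44.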

9 Comments

Many thanks! I will leave the question open in case someone has a good answer for the general case, but I will probably end up using this.
what is n here? n mod i or i mod n?
As simple as the answer is, I would be very careful. Remember different architectures generally store negative numbers in different ways. Hence bitwise operators on negative numbers can differ with different compilers and/or architectures.
i mod n == i & (n-1) when n is a power of two and mod is the aforementioned positive mod. (FYI: modulus is the common mathematical term for the "divisor" when a modulo operation is considered).
@GrijeshChauhan: The limitations are clearly stated: n must be a power of two and numbers must use twos-complement (pretty much every computer produced in the last 20 years). When else will it fail?
13

Fastest way to get a positive modulo in C/C++

Is the following fast? Maybe not as fast as the others, yet it is simple and functionally correct for all [1] a, b, unlike the others.

    int modulo_Euclidean(int a, int b) {
        int m = a % b;
        if (m < 0) {
            // m += (b < 0) ? -b : b;  // Avoid this form: -b is UB when b == INT_MIN
            m = (b < 0) ? m - b : m + b;
        }
        return m;
    }

[Edit 2022]

From here, added tests to handle INT_MIN mod -1 and detect mod 0.

    int modulo_Euclidean2(int a, int b) {
        if (b == 0) TBD_Code();  // Perhaps return -1 to indicate failure?
        if (b == -1) return 0;   // This test needed to prevent UB of `INT_MIN % -1`.
        int m = a % b;
        if (m < 0) {
            // m += (b < 0) ? -b : b;  // Avoid this form: -b is UB when b == INT_MIN
            m = (b < 0) ? m - b : m + b;
        }
        return m;
    }

[Edit 2025]

Alternative:

    // m += (b < 0) ? -b : b;     // Avoid this form: -b is UB when b == INT_MIN
    m = (b < 0) ? m - b : m + b;  // Form used above.
    m -= (b < 0) ? b : -b;        // New alternative (use instead of the line above).

Various other answers have mod(a,b) weaknesses especially when b < 0.

See Euclidean division for ideas about b < 0


inline int positive_modulo(int i, int n) { return (i % n + n) % n; } 

Fails when i % n + n overflows (think large i, n) - Undefined behavior.
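For instance, with i = INT_MAX - 1 and n = INT_MAX, the intermediate sum i % n + n exceeds INT_MAX. A minimal sketch of a variant that sidesteps this for n > 0 (the name positive_modulo_safe is made up for illustration) only adds n when the remainder is negative:

```c
#include <assert.h>
#include <limits.h>

/* Assumes n > 0. The remainder r lies in (-n, n); r + n is evaluated only
   when r < 0, and the sum then lies in (0, n), so it cannot overflow. */
int positive_modulo_safe(int i, int n)
{
    assert(n > 0);
    int r = i % n;
    return r < 0 ? r + n : r;
}
```

Negative n is deliberately left out of this sketch; the Euclidean functions above are the ones that address that case.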


return i & (n-1); 

Relies on n as a power of two. (Fair that the answer does mention this.)


int positive_mod(int i, int n) { /* constexpr */ int shift = CHAR_BIT*sizeof i - 1; int m = i%n; return m+ (m>>shift & n); } 

Often fails when n < 0, e.g. positive_mod(-2,-3) --> -5


int32_t positive_modulo(int32_t number, int32_t modulo) { return (number + ((int64_t)modulo << 32)) % modulo; } 

Obliges using 2 integer widths. (Fair that the answer does mention this.)
Fails with modulo < 0. positive_modulo(2, -3) --> -1.


inline int positive_modulo(int i, int n) { int tmp = i % n; return tmp ? i >= 0 ? tmp : tmp + n : 0; } 

Often fails when n < 0, e.g. positive_modulo(-2,-3) --> -5


[1] Exceptions: In C, a%b is not defined when a/b overflows as in a/0 or INT_MIN/-1.

4 Comments

Explaining the failure of the other answers is helpful.
Can you elaborate a bit on why += results in UB ?
@cassepipe += is fine, but -b when b == INT_MIN is UB. Note added to answer.
@chux-ReinstateMonica Thanks ! Indeed INT_MIN has no positive equivalent in the int range as it would be above INT_MAX by one. And this is because of en.wikipedia.org/wiki/Two%27s_complement (Putting that there for beginners such as myself not so long ago)
11

An old-school way to get the optional addend using twos-complement sign-bit propagation:

int positive_mod(int i, int m) { /* constexpr */ int shift = CHAR_BIT * sizeof i - 1; int r = i % m; return r + (r >> shift & m); } 
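Since right-shifting a negative int is implementation-defined, here is a hedged sketch of the same idea with the mask built from a comparison instead of a shift. This is a swapped-in variant rather than the code above; the name positive_mod_mask is made up, and it assumes m > 0 and two's complement, so that -1 has all bits set:

```c
#include <assert.h>

/* Assumes m > 0 and two's complement. -(r < 0) is -1 (all bits set) when
   r is negative and 0 otherwise, which is the same mask an arithmetic
   right shift of r by (width - 1) bits would produce. */
int positive_mod_mask(int i, int m)
{
    assert(m > 0);
    int r = i % m;
    int mask = -(r < 0);
    return r + (mask & m);  /* adds m only when r is negative */
}
```

Because the mask is derived from the remainder rather than from i, exact negative multiples such as -2 mod 2 still yield 0.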

I need to index an array in a "wrap-around" way

as another answer points out, if you're worried about arrays with negative sizes you should use more mathematically-pure, general methods.

8 Comments

Old-school hard to read hack. I like it. Though I wonder if (i>>shift & n) might be faster as the bitshift operation will otherwise have to wait for the modulo operation to finish.
It would be faster but it would give incorrect results for e.g. -2 mod 2.
Shoot, you are right. And now that you mention it, that is true for (i % n) + (n * (i < 0)) as well.
Assuming CHAR_BIT is a global constant (of the system?) sizeof of what? I do not understand if it is CHAR_BIT*(sizeof(i)) -1or
@J.Schultke Okay, I changed some names anyway to fix possible confusion, now m is the modulus and r is the result, no what-is-this-number-really n's left.
4

If you can afford to promote to a larger type (and do your modulo on the larger type), this code does a single modulo and no if:

int32_t positive_modulo(int32_t number, int32_t modulo) { return (number + ((int64_t)modulo << 32)) % modulo; } 
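Why this works for a positive modulus: modulo * 2^32 is a multiple of modulo and is larger than any |number| that fits in 32 bits, so the sum is non-negative and has the same residue. A sketch restricted to modulo > 0 follows; the name positive_modulo_wide is made up, and a multiplication replaces the shift purely as a stylistic choice:

```c
#include <assert.h>
#include <stdint.h>

/* Assumes modulo > 0. The offset is a multiple of modulo and at least 2^32,
   so number + offset is non-negative and congruent to number. */
int32_t positive_modulo_wide(int32_t number, int32_t modulo)
{
    assert(modulo > 0);
    int64_t offset = (int64_t)modulo * 4294967296LL;  /* modulo * 2^32 */
    return (int32_t)((number + offset) % modulo);
}
```

The largest possible sum is below 2^63, so the 64-bit addition cannot overflow under this assumption.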

1 Comment

why would this guarantee positive modulo at all ? echo ' ( n = 2^31 - 7 ); ( m = -3 ); ( d = ( m * 2^32 ) + n ); n % m; d % m' | bc ::::> n := 2147483641 :::: m := -3 :::::::: :::::::: :: d := -10737418247 ::::::::::: :::::::: :::::::::::::::: even though n % m := 1 , your approach made it worse —> ( n + ( m << 32 )) % m := -2
4

If you want to avoid all conditional paths, including the conditional move generated above (for example, if you need this code to vectorize or to run in constant time), you can use the sign bit as a mask:

    unsigned modulo(int value, unsigned m) {
        int shift_width = sizeof(int) * 8 - 1;
        int tweak = (value >> shift_width);
        int mod = ((value - tweak) % (int) m) + tweak;
        mod += (tweak & m);
        return mod;
    }

Here are the quickbench results. You can see that on gcc it's better in the generic case. For clang it's the same speed in the generic case, because clang generates the branch-free code in the generic case. The technique is useful regardless, because the compiler can't always be relied on to produce this particular optimization, and you may have to roll it by hand for vector code.
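To gain some confidence in the arithmetic, the masked version can be compared against a plain branchy reference over a range of inputs. This is only a sketch: modulo_masked and count_mismatches are made-up names, CHAR_BIT replaces the literal 8, and the code assumes 0 < m <= INT_MAX and that >> on a negative int is an arithmetic shift (implementation-defined, though usual):

```c
#include <assert.h>
#include <limits.h>

/* Same steps as the function above. Assumes 0 < m <= INT_MAX and an
   arithmetic right shift for negative ints. */
unsigned modulo_masked(int value, unsigned m)
{
    int shift_width = (int)(sizeof(int) * CHAR_BIT) - 1;
    int tweak = value >> shift_width;              /* -1 if value < 0, else 0 */
    int mod = ((value - tweak) % (int)m) + tweak;  /* in [-m, -1] or [0, m-1] */
    mod += tweak & (int)m;                         /* add m only when negative */
    return (unsigned)mod;
}

/* Hypothetical check: counts inputs in [lo, hi] where the masked version
   disagrees with a straightforward reference. */
int count_mismatches(int lo, int hi, unsigned m)
{
    int bad = 0;
    for (int v = lo; v <= hi; ++v) {
        int r = v % (int)m;
        unsigned expected = (unsigned)(r < 0 ? r + (int)m : r);
        if (modulo_masked(v, m) != expected)
            ++bad;
    }
    return bad;
}
```

The value - tweak step shifts negative inputs up by one before the remainder is taken, which is what keeps exact negative multiples from landing on m instead of 0.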

7 Comments

I know the OP doesn't need constant time, as it's for an array lookup, but this has been linked to as the fast way to compute modulo, which someone may need to do in constant time, so I figured it was worth mentioning.
Your godbolt link has a mistake because you are performing unsigned division instead of signed (you are missing the cast).
Intel doesn't currently support integer division as a vector unit, and neither does Arm, but they aren't the only CPUs with vector units, and they may get integer division in the future.
I've given a small look and the quick bench results show the same performance when m is not a constant (just ran your link with clearing the cached results). GCC reports the same assembly if you code it like m &= value < 0? UINT_MAX : 0u; mod += m; which is much more readable than using the shift right (the right shift is just adding an all 1s bitmask when the sign bit is set). The fact that Clang does the thing right proves even further that letting the compiler do the dirty work is usually a good idea.
Shifting an expression of signed type and negative value is sadly implementation-defined (see here). Moreover, it is more portable to use CHAR_BIT (although almost all modern platforms set it to 8). This being said, this code should work on many common platforms.
2

You can also do array[(i + array_size*N) % array_size], where N is an integer large enough to guarantee a non-negative argument, but small enough that the sum does not overflow.
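A minimal sketch of that idea, with made-up constants (ARRAY_SIZE = 100 and N = 1000 are purely illustrative) and the resulting precondition spelled out:

```c
#include <assert.h>

/* Hypothetical constants, chosen only for illustration. */
enum { ARRAY_SIZE = 100, N = 1000 };

/* Valid for i >= -ARRAY_SIZE * N; i + ARRAY_SIZE * N must also stay
   within the range of int. */
int wrap_with_bias(int i)
{
    assert(i >= -ARRAY_SIZE * N);
    return (i + ARRAY_SIZE * N) % ARRAY_SIZE;
}
```

Since ARRAY_SIZE * N is a compile-time constant here, the cost is one addition on top of the single remainder operation.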

When array_size is constant, there are techniques to calculate the modulus without division. Besides the power-of-two approach, one can calculate a weighted sum of bit groups, each multiplied by 2^i % n, where i is the position of the least significant bit in the group:

e.g. 32-bit integer 0xaabbccdd % 100 = dd + cc*[2]56 + bb*[655]36 + aa*[167772]16, having the maximum range of (1+56+36+16)*255 = 27795. With repeated applications and different subdivision one can reduce the operation to few conditional subtractions.
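The first folding step for n = 100 might look like the sketch below (fold_mod100 is a made-up name). It returns a value congruent to x modulo 100 and no larger than 27795; the further reduction by repeated folding and conditional subtraction is not shown:

```c
#include <assert.h>
#include <stdint.h>

/* Weighted byte sum for modulus 100: 256 % 100 = 56, 65536 % 100 = 36,
   16777216 % 100 = 16. The result is congruent to x mod 100 and is at
   most (1 + 56 + 36 + 16) * 255 = 27795. */
uint32_t fold_mod100(uint32_t x)
{
    uint32_t dd = x & 0xffu;          /* bits 0..7,   weight 1  */
    uint32_t cc = (x >> 8) & 0xffu;   /* bits 8..15,  weight 56 */
    uint32_t bb = (x >> 16) & 0xffu;  /* bits 16..23, weight 36 */
    uint32_t aa = x >> 24;            /* bits 24..31, weight 16 */
    return dd + cc * 56u + bb * 36u + aa * 16u;
}
```

Inputs below 256 pass through unchanged, since only the dd term is non-zero.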

Common practice also includes approximating the division by multiplying with a precomputed reciprocal (for example 2^32 / n), which can usually handle a reasonably large range of arguments.

 i - ((i * 655)>>16)*100; // (gives 100*n % 100 == 100 requiring adjusting...) 
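Here is a sketch of that expression with the adjustment included, restricted to 0 <= i <= 65535 (a range chosen for this illustration, over which the quotient estimate is never off by more than one). The names mod100_recip and count_recip_mismatches are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Assumes 0 <= i <= 65535. 655 approximates 65536 / 100 from below, so the
   quotient estimate is either exact or one too small; a single conditional
   subtraction repairs the remainder. */
uint32_t mod100_recip(uint32_t i)
{
    assert(i <= 65535u);
    uint32_t q = (i * 655u) >> 16;  /* estimate of i / 100 */
    uint32_t r = i - q * 100u;      /* in [0, 199] */
    if (r >= 100u)
        r -= 100u;
    return r;
}

/* Exhaustive comparison against % over the supported range. */
int count_recip_mismatches(void)
{
    int bad = 0;
    for (uint32_t i = 0; i <= 65535u; ++i)
        if (mod100_recip(i) != i % 100u)
            ++bad;
    return bad;
}
```

Without the conditional subtraction, i = 100 would give 100 rather than 0, which is the adjustment the comment in the expression refers to.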

Comments

0

Your second example is better than the first. A multiplication is a more complex operation than an if/else operation, so use this:

inline int positive_modulo(int i, int n) { int tmp = i % n; return tmp ? i >= 0 ? tmp : tmp + n : 0; } 

3 Comments

1) you're right, I edited the code. 2) if i is negative, i%n returns a negative number, for example -102%100 returns -2, so you just add n to the result
1) Perhaps simply return tmp < 0 ? tmp + n : tmp;. 2) This answer has an advantage over highly rated one in that it never overflows.
Re-state as "it" was unclear: This answer never overflows. (advantage) (if n > 0). The other answer may overflow. (weakness).
0

Rather than try to find a generic way to compute a positive modulus, the simplest solution to your problem is to plainly convert negative indices into a positive value. This will be branchless on a processor with conditional moves. If you don't need extra bounds checks this will produce a minimal amount of code with no divide or multiply ops.

    char pos_index(char *my_array, size_t array_size, int index) {
        if (index < 0)
            index += (int)array_size;  // e.g. -2 becomes array_size - 2
        return my_array[index];
    }

With modular bounds you have the cost of one extra divide:

    char pos_index(char *my_array, size_t array_size, int index) {
        index = index % (int)array_size;  // Remainder; could be negative. The cast keeps the arithmetic signed.
        if (index < 0)
            index += (int)array_size;
        return my_array[index];
    }

Being less clever is often a better approach with modern hardware and compilers. Conditional moves are a superpower you can leverage to tame otherwise stall inducing code.

Comments

0

If you want to optimize the following function while preserving the behavior for negative operands

inline int positive_modulo(int i, int n) { return (n + (i % n)) % n; } 

You can use the following implementation

inline int positive_modulo(int i, int n) { int m = i % n; if ((m != 0) & ((i ^ n) < 0)) m += n; return m; } 

This implementation will give the same result as the first implementation for negative operands such as positive_modulo(-7, 3), positive_modulo(7, -3) or positive_modulo(-7, -3).

This implementation also fixes the undefined behavior when the second operand is the maximum integer value 2147483647 or the minimum integer value -2147483648 (assuming 32-bit integer).


Note: (i ^ n) < 0 is true when i and n have opposite signs, false otherwise.
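As a sanity check, the two implementations can be compared over a small grid of operands, including negative n. This is only a sketch; the names floored_mod_ref, floored_mod and count_floored_mismatches are made up for illustration:

```c
#include <assert.h>

/* Reference: the double-remainder form (may overflow for extreme n). */
int floored_mod_ref(int i, int n)
{
    return (n + (i % n)) % n;
}

/* The single-remainder form: adjust only when the remainder is non-zero
   and the operands have opposite signs. */
int floored_mod(int i, int n)
{
    int m = i % n;
    if ((m != 0) & ((i ^ n) < 0))
        m += n;
    return m;
}

/* Counts operand pairs in [-limit, limit] (n != 0) where the two differ. */
int count_floored_mismatches(int limit)
{
    int bad = 0;
    for (int i = -limit; i <= limit; ++i)
        for (int n = -limit; n <= limit; ++n)
            if (n != 0 && floored_mod(i, n) != floored_mod_ref(i, n))
                ++bad;
    return bad;
}
```

Note that the result takes the sign of n, so for negative n it is not "positive" in the sense of the question: floored_mod(7, -3) is -2.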

Comments

-1

So one strange approach not yet mentioned is that

if precision is of NO CONCERN

(usually because either [ 1 ] you know ahead of time the product between the modulus and the largest absolute input value never overflows the data type, or [ 2 ] there's seamless big-int support),

you can actually front-load the conversion to unsigned space by multiplying the dividend with ….

1 - abs(modulus)

….. thus avoiding the need to either perform post-modulo-[%] adjustments or executing 2 modulo-[%] ops. This is simply the same 2's-complement idea generalized to any integer.

    function always_nonneg_mod(__, ___, _) {
        # __  | dividend
        # ___ | modulus
        ___ = int(___)
        __ = int(__)
        if ((_ = (___ >= !___ || ___ = -___)) + _ >= ___) {
            # modulus := [*] 0     => unsigned NaN
            #            [*] +/- 1 => zero (0)
            #            [*] +/- 2 => (__ % 2)^2
            return _ < ___ ? (__ % ___)^___ \
                 : _-- ? _ \
                 : (_ ^= _++) - _
        } else
            return (_ - (__ < _) * ___) * __ % ___
    }

    -38 19  0    -19 19  0     0 19  0    19 19  0
    -37 19  1    -18 19  1     1 19  1    20 19  1
    -36 19  2    -17 19  2     2 19  2    21 19  2
    -35 19  3    -16 19  3     3 19  3    22 19  3
    -34 19  4    -15 19  4     4 19  4    23 19  4
    -33 19  5    -14 19  5     5 19  5    24 19  5
    -32 19  6    -13 19  6     6 19  6    25 19  6
    -31 19  7    -12 19  7     7 19  7    26 19  7
    -30 19  8    -11 19  8     8 19  8    27 19  8
    -29 19  9    -10 19  9     9 19  9    28 19  9
    -28 19 10     -9 19 10    10 19 10    29 19 10
    -27 19 11     -8 19 11    11 19 11    30 19 11
    -26 19 12     -7 19 12    12 19 12    31 19 12
    -25 19 13     -6 19 13    13 19 13    32 19 13
    -24 19 14     -5 19 14    14 19 14    33 19 14
    -23 19 15     -4 19 15    15 19 15    34 19 15
    -22 19 16     -3 19 16    16 19 16    35 19 16
    -21 19 17     -2 19 17    17 19 17    36 19 17
    -20 19 18     -1 19 18    18 19 18    37 19 18

The special cases involve { -2, -1, 0, +1, +2 } :

[*] For zero(0) modulus it simply returns +NaN without throwing any particular error, fatal or otherwise. 
[*] For modulus of +1 or -1, it avoids any headaches regarding INT_MIN by simply returning the mathematically correct result of zero(0). 
[*] For modulus of +2 or -2, no point to gauge sign of dividend when one can directly square any modulo result and make it non-negative. 

(I didn't write it as (__ & 1) != 0 because in a polymorphic paradigm, it would take far longer for the system to check whether input data type is a floating point one, and if so, perform integer re-casting before obtaining the LSB)

Comments
