
So I encountered some strange behavior, which I stripped down to the following minimal example:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec;
    for(int i = 0; i < 1000; i++) {
        vec.push_back(2150000 * i);
        if(i % 100 == 0) std::cout << i << std::endl;
    }
}

When compiling with gcc 7.3.0 using the command

c++ -Wall -O2 program.cpp -o program 

I get no warnings. Running the program produces the following output:

0
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
[ snip several thousand lines of output ]
1073741600
1073741700
1073741800
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Aborted (core dumped)

which I guess means the loop ran far past i = 1000 (the printed values climb past a billion) and I finally ran out of memory for the vector.

Clearly something is wrong here. I guess this has something to do with the fact that 2150000 * 1000 is slightly larger than 2^31, but it's not quite as simple as that -- if I decrease this number to 2149000 then the program behaves as expected:

0
100
200
300
400
500
600
700
800
900
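
As a sanity check (a separate little program, not part of the example above, which does the arithmetic in a 64-bit type precisely so that the check itself cannot overflow), we can see exactly where the product first passes INT_MAX on a typical platform with 32-bit int:

#include <climits>
#include <iostream>

int main() {
    // Compute the products in long long so this check cannot overflow.
    for(long long factor : {2150000LL, 2149000LL}) {
        for(int i = 0; i < 1000; i++) {
            if(factor * i > INT_MAX) {
                std::cout << factor << " * " << i << " = " << factor * i
                          << " exceeds INT_MAX (" << INT_MAX << ")" << std::endl;
                break;
            }
        }
    }
}

This prints only 2150000 * 999 = 2147850000 exceeds INT_MAX (2147483647): with 2150000 the very last iteration (i = 999) overflows a 32-bit int, while with 2149000 none of the iterations do.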

The cout isn't necessary to reproduce this behavior, so I suppose a minimal example is actually

#include <vector>

int main() {
    std::vector<int> vec;
    for(int i = 0; i < 1000; i++) {
        vec.push_back(2150000 * i);
    }
}

Running this causes the program to wait for a long time and then crash.

Question

I'm fairly new to C++ at any serious level. Am I doing something stupid here that allows for undefined behavior, and if so, what? Or is this a bug in gcc?

I did try to Google this, but I don't really know what to Google.

Addendum

I see that (signed) integer overflow is undefined behavior in C++. To my understanding, that would only mean that the behavior of the expression

2150000 * i 

is undefined -- i.e. that it could evaluate to an arbitrary number. That said, we can see that this expression is at least not changing the value of i.

  • I suggest you look at differences in the assembly output and use a debugger; if you haven't solved your problem after that, you can ask. I can't reproduce this, for one. Commented Oct 16, 2018 at 16:31
  • "To my understanding, that would only mean that the behavior of the expression 2150000 * i is undefined -- i.e. that it could evaluate to an arbitrary number" That's not true. Once a program hits undefined behavior, all its past, present and future behavior becomes undefined as well. Commented Oct 16, 2018 at 16:33
  • There are multiple passes that find the UB and transform the loop to an infinite one. In my testing, g++ -O2 x.cc -fno-tree-vrp -fdisable-tree-cunrolli -fdisable-tree-cunroll is required to get the "expected" result, so value range propagation and unrolling are the "offenders". Commented Oct 16, 2018 at 16:59
  • Fun fact: clang 6.0 with -O2 works OK but with -O3 does not output anything :) Clang 7.0 works with both -O2 and -O3 as expected. Commented Oct 16, 2018 at 19:00
  • The last multiplication is 999, not 1000, and 2149000 * 999 is within the integer range limit. Commented Oct 16, 2018 at 20:46

1 Answer


To answer my own question, after examining the assembler output it looks like g++ optimizes this loop by changing

for(int i = 0; i < 1000; i++) {
    vec.push_back(2150000 * i);
}

to something like

for(int j = 0; j < 1000 * 2150000; j += 2150000) {
    vec.push_back(j);
}

I guess an addition is cheaper than doing a multiplication on every iteration, and the rule that signed overflow is undefined behavior means the compiler can make this change without having to worry about what happens if that calculation overflows.

Of course the condition in the optimized loop can never become false: the bound 1000 * 2150000 exceeds INT_MAX, and the compiler is allowed to assume j never overflows, so ultimately I end up with something more like

for(int j = 0; true; j += 2150000) {
    vec.push_back(j);
}
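
For what it's worth, one way to keep the original intent but avoid the undefined behavior (a sketch, assuming the goal is simply to store the full products) is to do the multiplication in a 64-bit type:

#include <vector>

int main() {
    std::vector<long long> vec;
    for(int i = 0; i < 1000; i++) {
        // The LL suffix makes the multiplication happen in long long,
        // so 2150000 * 999 = 2147850000 fits without overflowing.
        vec.push_back(2150000LL * i);
    }
}

Compiling the original program with -fsanitize=undefined should also flag the signed overflow at runtime, and -fwrapv tells gcc to treat signed overflow as well-defined wrapping (at the cost of some optimizations).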

4 Comments

This works as a mental model, but in reality the loop condition is gone after the early value range propagation pass, way before (in my case, there are 63 passes in between!) induction variable optimization has a chance to reduce the multiplication to addition.
Can you add an answer explaining what that means?
Sorry, I only have a general idea of how compiler passes work, I don't know enough to turn it into a good answer.
According to the published Rationale document for the C Standard, the authors expected (and presumably thought other people would be entitled to expect) that the majority of C implementations would process integer overflow in a way that produced no side effects. gcc, however, processes programs in such a way that integer overflow may have unpredictable side effects even in cases where the upper bits of the result are ignored (e.g. when assigning the product of two unsigned short values to an unsigned, a case where the Rationale describes what the authors expected to be typical behavior).
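
A small sketch (not from the original discussion) of the unsigned short case that last comment alludes to: on a platform where int is 32 bits, the operands are promoted to signed int before multiplying, so the product can overflow int even though the result is only ever stored in an unsigned.

#include <iostream>

int main() {
    unsigned short a = 65535, b = 65535;
    // a and b are promoted to int; 65535 * 65535 = 4294836225 exceeds INT_MAX,
    // so with 32-bit int this multiplication is formally undefined behavior,
    // even though only the low 32 bits of the result are ever used.
    unsigned product = a * b;
    std::cout << product << std::endl;
}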
