std::move vs RVO under -O3

Question

I experimented with the snippet below, compiled with GCC, to compare std::move against RVO (return value optimization).

    #include <iostream>
    #include <chrono>
    #include <array>
    #include <utility>   // std::move
    #include <stdlib.h>

    class LargeObject {
    public:
        LargeObject() { data_.fill(42); }

        // Move constructor
        LargeObject(LargeObject&& other) noexcept : data_(std::move(other.data_)) {}

        // Destructor
        ~LargeObject() {}

        std::array<int, 500000> data_;
    };

    // Function that allows RVO
    LargeObject createWithRVO() {
        LargeObject obj;
        return obj;
    }

    // Function that forces std::move
    LargeObject createWithMove() {
        LargeObject obj;
        return std::move(obj);  // Forces a move, bypassing RVO
    }

    int main() {
        constexpr size_t iterations = 10000;

        // Timing RVO
        auto start_rvo = std::chrono::high_resolution_clock::now();
        for (size_t i = 0; i < iterations; ++i) {
            volatile LargeObject obj = createWithRVO();  // RVO in action
        }
        auto end_rvo = std::chrono::high_resolution_clock::now();
        auto duration_rvo = std::chrono::duration_cast<std::chrono::milliseconds>(end_rvo - start_rvo).count();

        // Timing std::move
        auto start_move = std::chrono::high_resolution_clock::now();
        for (size_t i = 0; i < iterations; ++i) {
            volatile LargeObject obj = createWithMove();  // Forces move
        }
        auto end_move = std::chrono::high_resolution_clock::now();
        auto duration_move = std::chrono::duration_cast<std::chrono::milliseconds>(end_move - start_move).count();

        // Print results
        std::cout << "RVO duration: " << duration_rvo << " ms" << std::endl;
        std::cout << "std::move duration: " << duration_move << " ms" << std::endl;

        return 0;
    }

It turns out that, under -O1, RVO indeed outperformed std::move by a factor of 20.

However, under -O3, they tie.

How does the compiler achieve this? Does it simply "roll back" the std::move to RVO, or is some other optimization involved?

I tried to compare their LLIR, but I am no expert in it, and the different -O flags produce so many differences that I cannot tell which ones matter and which do not :(

Update:

Thanks to Igor Tandetnik's comment suspecting that the loop itself was optimized out, I added the volatile keyword inside the two for loops. The loops do seem to have been optimized out before, since the running time has now increased significantly. However, RVO still ties with std::move and is even slightly outperformed by it.

It simply "rolls back". See the generated assembly; it does "roll back".
– jorge is not ai, Commented Mar 7 at 2:59
@user14063792468, many thanks for the pointer. However, I failed to compare either their asm or their LLIR... They look too different to me, perhaps because the different flags also affected other parts. Mind walking me through it?
– PkDrew, Commented Mar 7 at 3:04
I wouldn't be surprised if, at a sufficiently aggressive optimization level, the compiler just optimizes away all those loops to nothing. They don't have any side effects.
– Igor Tandetnik, Commented Mar 7 at 3:18
@IgorTandetnik, oh, how true. I'll make some modifications and check again.
– PkDrew, Commented Mar 7 at 3:20
I wonder whether simply adding volatile to the LargeObject obj created in the for loop will be sufficient?
– PkDrew, Commented Mar 7 at 3:23
