I played with the snippet below, compiled with GCC, to compare std::move against RVO (return value optimization):
    #include <array>
    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <utility>

    class LargeObject {
    public:
        LargeObject() { data_.fill(42); }

        // Move constructor
        LargeObject(LargeObject&& other) noexcept : data_(std::move(other.data_)) {}

        // Destructor
        ~LargeObject() {}

        std::array<int, 500000> data_;
    };

    // Function that allows RVO
    LargeObject createWithRVO() {
        LargeObject obj;
        return obj;
    }

    // Function that forces std::move
    LargeObject createWithMove() {
        LargeObject obj;
        return std::move(obj);  // Forces a move, bypassing RVO
    }

    int main() {
        constexpr std::size_t iterations = 10000;

        // Timing RVO
        auto start_rvo = std::chrono::high_resolution_clock::now();
        for (std::size_t i = 0; i < iterations; ++i) {
            volatile LargeObject obj = createWithRVO();  // RVO in action
        }
        auto end_rvo = std::chrono::high_resolution_clock::now();
        auto duration_rvo =
            std::chrono::duration_cast<std::chrono::milliseconds>(end_rvo - start_rvo).count();

        // Timing std::move
        auto start_move = std::chrono::high_resolution_clock::now();
        for (std::size_t i = 0; i < iterations; ++i) {
            volatile LargeObject obj = createWithMove();  // Forces move
        }
        auto end_move = std::chrono::high_resolution_clock::now();
        auto duration_move =
            std::chrono::duration_cast<std::chrono::milliseconds>(end_move - start_move).count();

        // Print results
        std::cout << "RVO duration: " << duration_rvo << " ms" << std::endl;
        std::cout << "std::move duration: " << duration_move << " ms" << std::endl;
        return 0;
    }

It turns out that, under -O1, RVO indeed outperformed std::move by a factor of 20.
However, under -O3, they tie.
I wonder how the compiler achieves this: did it simply "roll back" the std::move to RVO, or is some other optimization involved?
I tried to compare their LLIR, but I am no expert in it, and the different -O flags produce so many differences that I cannot tell which ones matter and which do not :(
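A lighter check than reading the IR might be to count how often the move constructor actually runs in each factory. The following is only a sketch: Tracked, g_moves, makeWithNRVO, makeWithMove and movesPerCall are hypothetical names introduced for illustration, and the array is shrunk to keep it cheap. It assumes a C++17 compiler, where initializing a variable from the returned prvalue is guaranteed to be elided.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical global counter, incremented by every move construction.
int g_moves = 0;

// Hypothetical instrumented stand-in for LargeObject (smaller array).
struct Tracked {
    std::array<int, 1000> data;

    Tracked() { data.fill(42); }

    Tracked(Tracked&& other) noexcept : data(std::move(other.data)) { ++g_moves; }
};

Tracked makeWithNRVO() {
    Tracked obj;
    return obj;             // a plain local name: eligible for copy elision (NRVO)
}

Tracked makeWithMove() {
    Tracked obj;
    return std::move(obj);  // not a plain local name any more, so no elision here
}

// Returns how many move constructions a single call to `make` triggers.
template <typename Factory>
int movesPerCall(Factory make) {
    g_moves = 0;
    Tracked result = make();
    assert(result.data[0] == 42);  // sanity check that the object was built
    return g_moves;
}
```

I would expect movesPerCall(makeWithMove) to report 1, and movesPerCall(makeWithNRVO) to report 0 whenever the compiler applies NRVO, which is permitted but not required. Note that the counter is itself a side effect the optimizer has to preserve, so this shows what happens at the language level; it does not show what -O3 does to a move constructor that has no side effects, as in the original LargeObject.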
Latest:
Thanks to IgorTandetnik's comment, which suspects that the loop itself is optimized out. To look into it, I added the volatile keyword in the two for loops. The loops do seem to have been optimized out before, since the running time has now increased significantly. However, RVO still ties with std::move and is even slightly outperformed by it.
Is marking the generated LargeObject obj in the for loop as volatile sufficient?
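For comparison, another approach sometimes seen in micro-benchmarks is to leave the object non-volatile and pass its address to an empty inline-asm statement, so the compiler has to assume the object is read. The sketch below relies on the GCC/Clang extended asm extension and is not portable; Payload, escape, makePayload, runLoop and checkRunLoop are hypothetical names, not part of the code above, and I am not claiming it is the right fix for this benchmark.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical small stand-in for LargeObject.
struct Payload {
    std::array<int, 1000> data;

    Payload() { data.fill(42); }
};

// GCC/Clang extension: an empty asm statement that takes `p` as an input and
// declares a "memory" clobber, so the compiler must assume the pointed-to
// object may be read and cannot discard the stores that initialized it.
inline void escape(const void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

Payload makePayload() {
    Payload obj;
    return obj;
}

// Builds `iterations` objects and returns the sum of their last elements.
long runLoop(std::size_t iterations) {
    long checksum = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        Payload obj = makePayload();
        escape(&obj);                  // keep the construction observable
        checksum += obj.data.back();   // every object contributes 42
    }
    return checksum;
}

// Sanity check for the sketch: ten iterations contribute 42 each.
inline void checkRunLoop() {
    assert(runLoop(10) == 420);
}
```

The intent is that the barrier sits on the object itself rather than on the loop counter, so the work of building each object should not be removable as dead code, while the object type stays non-volatile.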