5

I was doing some quick benchmark tests involving a std::vector. I start with a relatively small vector of 100 ints and call various functions that fill it with 1,000,000 ints. Most of them either clear the vector and add the elements again, or create a new vector and move or swap it into the original. One function simply resizes the vector and overwrites the elements.

You can see the functions in the code below. What's interesting is that resizing the vector and overwriting the elements is by far the quickest. I had expected that reserving the memory before pushing the elements would give comparable performance.

I know that std::vector::resize() will resize the vector to contain the new count. According to cppreference:

If the current size is less than count, additional elements are appended and initialized with copies of value.

So resize() should be constructing 100 fewer ints than the other functions, and I'm surprised by the difference in speed. I thought resize() would allocate and value-initialize the new elements, while reserve() would just allocate the memory.

    #include <algorithm>
    #include <chrono>
    #include <iostream>
    #include <numeric>   // std::iota (missing in the original listing)
    #include <vector>    // std::vector (missing in the original listing)

    constexpr int InitialSize = 100;
    constexpr int NewSize = 1000000;

    void overwrite(std::vector<int>& v) {
        v.resize(NewSize);
        for (int i = 0; i < NewSize; ++i) {
            v[i] = i;
        }
    }

    void clear(std::vector<int>& v) {
        v.clear();
        v.reserve(NewSize);
        for (int i = 0; i < NewSize; ++i) {
            v.push_back(i);
        }
    }

    void swap(std::vector<int>& v) {
        std::vector<int> vnew;
        vnew.reserve(NewSize);
        for (int i = 0; i < NewSize; ++i) {
            vnew.push_back(i);
        }
        v.swap(vnew);
    }

    void move(std::vector<int>& v) {
        std::vector<int> vnew;
        vnew.reserve(NewSize);
        for (int i = 0; i < NewSize; ++i) {
            vnew.push_back(i);
        }
        v = std::move(vnew);
    }

    int main() {
        {
            std::vector<int> v(InitialSize);
            std::iota(v.begin(), v.end(), 1);
            auto start = std::chrono::high_resolution_clock::now();
            move(v);
            auto finish = std::chrono::high_resolution_clock::now();
            std::chrono::duration<double, std::milli> elapsed = finish - start;
            std::cout << "Move - elapsed time: " << elapsed.count() << " ms" << std::endl;
        }
        {
            std::vector<int> v(InitialSize);
            std::iota(v.begin(), v.end(), 1);
            auto start = std::chrono::high_resolution_clock::now();
            clear(v);
            auto finish = std::chrono::high_resolution_clock::now();
            std::chrono::duration<double, std::milli> elapsed = finish - start;
            std::cout << "Clear - elapsed time: " << elapsed.count() << " ms" << std::endl;
        }
        {
            std::vector<int> v(InitialSize);
            std::iota(v.begin(), v.end(), 1);
            auto start = std::chrono::high_resolution_clock::now();
            swap(v);
            auto finish = std::chrono::high_resolution_clock::now();
            std::chrono::duration<double, std::milli> elapsed = finish - start;
            std::cout << "Swap - elapsed time: " << elapsed.count() << " ms" << std::endl;
        }
        {
            std::vector<int> v(InitialSize);
            std::iota(v.begin(), v.end(), 1);
            auto start = std::chrono::high_resolution_clock::now();
            overwrite(v);
            auto finish = std::chrono::high_resolution_clock::now();
            std::chrono::duration<double, std::milli> elapsed = finish - start;
            std::cout << "Overwrite - elapsed time: " << elapsed.count() << " ms" << std::endl;
        }
        return 0;
    }

Output:

    Move - elapsed time: 14.6284 ms
    Clear - elapsed time: 17.5072 ms
    Swap - elapsed time: 12.9111 ms
    Overwrite - elapsed time: 5.19079 ms

LIVE

Quick Bench results.

Can someone explain what's going on here?

10
  • It doesn't have to update its size member variable if you've resized first. That could have a minor effect. Commented Mar 7, 2020 at 23:16
  • 1
    resize can use an AVX-optimized memset to initialize the members more efficiently and in batches, while push_back will initialize them one by one. Commented Mar 7, 2020 at 23:22
  • I don't see any compilation flags. Is it optimized by default on godbolt? Commented Mar 7, 2020 at 23:25
  • 1
    You did compile with optimizations on, right? (ala -O3) ? Commented Mar 7, 2020 at 23:26
  • 1
    You are only testing int. A non-trivial type will trigger different optimisation paths and yield wildly different results, I suspect. Also, always repeat your timing experiments many times to gain confidence in the results. As a rule of thumb, you shouldn't run the experiment fewer than 20 times. And you should warm up the test as suggested in other comments here. Commented Mar 8, 2020 at 0:13

2 Answers

7

push_back is a costlier operation than index-based access, even if allocation has been taken care of beforehand by reserve.

  1. push_back must update the end pointer so that the vector's size can be computed correctly.
  2. push_back must check whether reallocation is needed; this is essentially a branch on every call.
  3. push_back copies (or moves) the value being appended. For an int, this shouldn't make a measurable difference.

If you look at the generated assembly (taken from the godbolt link given in the question), the index operation is a branchless sequence of a few mov and shift instructions, while push_back is much more involved. In a long-running loop (1,000,000 iterations in the given example), this difference matters. The compiler optimization level can definitely affect the size of the difference.

For index operator []

    push rbp
    mov rbp, rsp
    mov QWORD PTR [rbp-8], rdi
    mov QWORD PTR [rbp-16], rsi
    mov rax, QWORD PTR [rbp-8]
    mov rax, QWORD PTR [rax]
    mov rdx, QWORD PTR [rbp-16]
    sal rdx, 2
    add rax, rdx
    pop rbp
    ret

For push_back

    push rbp
    mov rbp, rsp
    sub rsp, 16
    mov QWORD PTR [rbp-8], rdi
    mov QWORD PTR [rbp-16], rsi
    mov rax, QWORD PTR [rbp-8]
    mov rdx, QWORD PTR [rax+8]
    mov rax, QWORD PTR [rbp-8]
    mov rax, QWORD PTR [rax+16]
    cmp rdx, rax
    je .L73
    mov rax, QWORD PTR [rbp-8]   // When allocation is not needed
    mov rcx, QWORD PTR [rax+8]
    mov rax, QWORD PTR [rbp-8]
    mov rdx, QWORD PTR [rbp-16]
    mov rsi, rcx
    mov rdi, rax
    call void std::allocator_traits<std::allocator<int> >::construct<int, int const&>(std::allocator<int>&, int*, int const&)
    mov rax, QWORD PTR [rbp-8]
    mov rax, QWORD PTR [rax+8]
    lea rdx, [rax+4]
    mov rax, QWORD PTR [rbp-8]
    mov QWORD PTR [rax+8], rdx
    jmp .L75
    .L73:                        // When allocation is needed
    mov rax, QWORD PTR [rbp-8]
    mov rdi, rax
    call std::vector<int, std::allocator<int> >::end()
    mov rcx, rax
    mov rdx, QWORD PTR [rbp-16]
    mov rax, QWORD PTR [rbp-8]
    mov rsi, rcx
    mov rdi, rax
    .L75:
    nop
    leave
    ret


2

Can someone explain what's going on here?

overwrite is fundamentally different from the others because it never calls push_back, which has to check for reallocation and therefore makes the loop far more complex.

The other three are basically equivalent (up to constant-time differences) and will perform differently depending on optimizations, how well the compiler does its job, and the standard library implementation.

If you are very lucky, the optimizer may be able to see that the reallocation will never happen and generate code like overwrite's.

3 Comments

Yes. push_back() does have to check size < capacity, and otherwise do a reallocation. This is the main difference between overwrite() and the other methods. I was trying to avoid the reallocation by setting the capacity beforehand, yet I would have thought the size check would be a minor difference.
@jignatius The point is that even if the reallocation does not happen at runtime, the optimizer still sees all the code that handles it and may not be able to simplify the loop into the equivalent of overwrite's trivial loop. And if the optimizer doesn't arrive at the simplified loop, then it cannot apply the rest of the optimizations either.
Even if branch prediction is perfect or the branch is optimized away, push_back() of an int essentially involves two assignments, i.e. it should be roughly twice as expensive: it needs to update the end pointer (or size member)! (Of course a compiler could optimize that away as well; in fact, it could do almost nothing at all except print the times, since none of the data being shuffled around has an observable effect. That's the trouble with benchmarks.)
