
Problem statement: https://cses.fi/problemset/task/1640/

I would like to understand why one algorithm is much faster than the other (about 25 times) when I think they both have O(n log n) complexity. The faster one should, in principle, loop over my list one extra time, so I would really have expected it to be slower.

Slow:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, x;
    cin >> n >> x;
    vector<int> a(n);
    map<int, int> vals;
    for (int i = 0; i < n; i++) {
        cin >> a[i];
    }
    for (int i = 0; i < n; i++) {
        if (vals.count(x - a[i])) {
            cout << i + 1 << " " << vals[x - a[i]] << "\n";
            return 0;
        }
        vals[a[i]] = i + 1;
    }
    cout << "IMPOSSIBLE" << '\n';
}

Fast:

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n, x;
    cin >> n >> x;
    pair<int, int> arr[n];
    for (int i = 0; i < n; i++) {
        cin >> arr[i].first;
        arr[i].second = i + 1;
    }
    sort(arr, arr + n);
    int i = 0, j = n - 1;
    while (i < j) {
        int sum = arr[i].first + arr[j].first;
        if (sum == x) {
            cout << arr[i].second << ' ' << arr[j].second;
            return 0;
        } else if (sum < x) {
            i++;
        } else {
            j--;
        }
    }
    cout << "IMPOSSIBLE";
    return 0;
}
  • Why does only the second one have ios::sync_with_stdio(false); cin.tie(0);? Have you considered that the difference could be due to this? Commented Nov 2, 2020 at 9:53
  • Please provide a minimal reproducible example with inputs, outputs, and what the code is supposed to do. Commented Nov 2, 2020 at 10:05
  • I did the test and the difference is much smaller, but the second one is still twice as fast. I tried with an unordered_map and I hit the time limit on one test. I really don't understand. Commented Nov 2, 2020 at 10:07
  • I included a link where everything is described and the code can be tested for performance — isn't that good practice? Commented Nov 2, 2020 at 10:08
  • In the first code you are effectively performing the sort incrementally, which is slightly sub-optimal compared to the second code, where the sort is performed in one shot. The second part (the two-pointer scan) is only O(n). Commented Nov 2, 2020 at 10:09

1 Answer


Note that big-O complexity purposefully ignores constant factors as well as architecture-specific details like branch prediction and caching. The work for the first is N + N * (log_2(N) + log_2(N)) (reading the input, then one map lookup and one map insert per element), while the second is N + N * (log_2(N) + 1) (reading the input, sorting it, then walking it once).
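As a side note, the comments mention trying an unordered_map; here is a sketch of the first program rewritten that way as a standalone function (the function name and signature are mine, not from the question). Lookups become average O(1) instead of O(log n), though each hash probe can still miss the cache.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Hash-map variant of the "slow" program (illustrative sketch).
// Returns the 1-based index pair (i, j) with a[i-1] + a[j-1] == x,
// or {-1, -1} if no such pair exists.
std::pair<int, int> two_sum_hash(const std::vector<int>& a, long long x) {
    std::unordered_map<long long, int> seen;  // value -> 1-based index
    for (int i = 0; i < (int)a.size(); i++) {
        auto it = seen.find(x - a[i]);        // average O(1) lookup
        if (it != seen.end()) return {i + 1, it->second};
        seen[a[i]] = i + 1;                   // average O(1) insert
    }
    return {-1, -1};
}
```

This removes the log factor from the first approach, but as the answer below explains, memory-access patterns still favor the sorted two-pointer scan.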

If you then look at the low-level access patterns, you will see that (std::map is typically a balanced binary search tree) every map access needs at worst log_2(2*10^5) ≈ 18 memory accesses, some of which will be cache misses, so they will have to wait on (comparatively slow) main memory.

Compare this to the second, where you fetch data linearly from left to right (the best case for a cache) and right to left (about the same). Assuming 4-byte ints and 64-byte cache lines, one line holds 16 ints, so the two pointers together need only two main-memory accesses for every 16 iterations of the loop, and the remaining accesses can be satisfied from the L1 cache.
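This difference can be observed directly with a micro-benchmark sketch (the function name and setup below are mine, for illustration): sum N ints stored contiguously in a vector, then the same N ints stored in a std::map. The linear scan lets the hardware prefetcher keep the loop fed from cache, while the map traversal chases pointers between scattered tree nodes.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Micro-benchmark sketch: linear vector scan vs. std::map walk.
// Returns both sums (which must match) so the loops are not optimized away;
// timings are reported on stderr.
std::pair<std::int64_t, std::int64_t> scan_vs_map(int n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);        // v = 0, 1, ..., n-1
    std::map<int, int> m;
    for (int i = 0; i < n; i++) m[i] = i;    // same values, tree-structured

    auto t0 = std::chrono::steady_clock::now();
    std::int64_t vec_sum = 0;
    for (int x : v) vec_sum += x;            // sequential: cache-friendly
    auto t1 = std::chrono::steady_clock::now();
    std::int64_t map_sum = 0;
    for (const auto& [key, val] : m) map_sum += val;  // pointer chasing
    auto t2 = std::chrono::steady_clock::now();

    std::cerr << "vector scan: "
              << std::chrono::duration<double, std::milli>(t1 - t0).count()
              << " ms, map walk: "
              << std::chrono::duration<double, std::milli>(t2 - t1).count()
              << " ms\n";
    return {vec_sum, map_sum};
}
```

On typical hardware the vector scan should be markedly faster, though the exact ratio depends on the machine, allocator, and compiler flags.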
