
While reading text files line by line, I noticed a significant performance drop with C++'s std::getline compared to the C function getline (GCC 4.8.5).

I limited my tests to a very simple program counting the number of lines.

In C++:

```cpp
#include <iostream>
#include <string>

using namespace std;

int main()
{
    size_t n = 0;
    string line;
    while (getline(cin, line))
        ++n;
    cout << n << endl;
    return 0;
}
```

In C:

```c
#include <stdio.h>
#include <stdlib.h>

int main()
{
    size_t n = 0;
    char* line = NULL;
    size_t len = 0;
    while (getline(&line, &len, stdin) != -1)
        ++n;
    printf("%zu\n", n);
    free(line);
    return 0;
}
```

In both cases I use the C++ compiler to make sure that the difference comes exclusively from getline and not from C vs. C++ compiler optimizations.

I start to notice performance drops after "only" a few thousand lines.

Examples with ~600, 2.5k, 20k, 157k, 10M, 40M, 50M, 60M and 80M lines:

```text
 Nb lines | Time (C)   | Time (C++) | Slower
----------+------------+------------+--------
      613 | 0m0.003s   | 0m0.005s   | 1.7x
     2452 | 0m0.002s   | 0m0.011s   | 5.5x
    19616 | 0m0.004s   | 0m0.062s   | 15x
   156928 | 0m0.014s   | 0m0.511s   | 37x
 10043392 | 0m0.776s   | 0m31.560s  | 41x
 40173568 | 0m3.335s   | 2m7.752s   | 38x
 50216960 | 0m5.543s   | 2m42.116s  | 29x
 60260352 | 0m22.571s  | 3m13.148s  | 9x
 80347136 | 0m27.713s  | 4m18.272s  | 9x
```

These numbers should be taken with a pinch of salt, but I think they reflect how the gap grows with file size: it peaks at around 40x slower and then narrows somewhat, probably because of other limitations (hardware, maybe?) rather than software issues.
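The "Slower" column is simply Time (C++) divided by Time (C). A small sketch that recomputes it from the timings (converted to seconds by hand; the Timing struct and the function names are made up for illustration) makes individual rows easy to check. For example, the 50M-line row works out to about 162.116 / 5.543 ≈ 29x, and the 613-line row to about 1.7x, although the timings are only given to the millisecond, so the smallest rows are not very meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// One row of the table: line count and wall-clock time (in seconds) for each version.
struct Timing {
    long lines;
    double c_seconds;
    double cpp_seconds;
};

// Timings from the table, with "XmY.ZZZs" converted to seconds (e.g. 2m7.752s -> 127.752).
const Timing kTimings[] = {
    {613L, 0.003, 0.005},
    {2452L, 0.002, 0.011},
    {19616L, 0.004, 0.062},
    {156928L, 0.014, 0.511},
    {10043392L, 0.776, 31.560},
    {40173568L, 3.335, 127.752},
    {50216960L, 5.543, 162.116},
    {60260352L, 22.571, 193.148},
    {80347136L, 27.713, 258.272},
};
const std::size_t kNumTimings = sizeof(kTimings) / sizeof(kTimings[0]);

// Slowdown of the C++ version relative to the C version for one row.
double slowdown(const Timing& t) {
    assert(t.c_seconds > 0.0);  // guard against dividing by a zero timing
    return t.cpp_seconds / t.c_seconds;
}

// Prints one "lines  ratio" pair per row.
void print_slowdowns() {
    for (std::size_t i = 0; i < kNumTimings; ++i) {
        std::printf("%9ld  %5.1fx\n", kTimings[i].lines, slowdown(kTimings[i]));
    }
}
```

Calling print_slowdowns() prints the recomputed ratio for every file size in the table.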

Since I almost exclusively read files with 100k+ lines, I should expect a slowdown of at least 9x if I stick with the C++ code.

Is there a reason for such a big difference? I understand that the STL overhead can be significant, but I thought that file access would outweigh such differences.

Also, is there a way to optimize calls to std::getline (apart from using its C counterpart, obviously)?
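If only the line count is needed (as in this test program), one workaround that sidesteps std::getline entirely is to read the input in large blocks and count '\n' bytes. This sketch is not taken from the question: the 64 KiB block size and the function names are arbitrary choices, and it avoids C++11 features so it should also build in GCC 4.8's default C++98 mode. It is meant to return the same count as the getline loop, including a last line with no trailing newline; its speed relative to the versions above is not measured here.

```cpp
#include <algorithm>  // std::count
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>    // std::istringstream, used in the check below
#include <vector>

// Counts lines the way `while (std::getline(in, line)) ++n;` would:
// each '\n' ends a line, and trailing text without a final '\n' is one more line.
// Reads fixed-size blocks (64 KiB, an arbitrary choice) instead of building a
// std::string per line. Uses &buf[0] rather than data() to stay C++98-compatible.
std::size_t count_lines_blockwise(std::istream& in) {
    std::vector<char> buf(1 << 16);
    std::size_t n = 0;
    char last = '\n';  // empty input "ends" with a newline, so it counts as 0 lines
    while (in) {
        in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;
        n += static_cast<std::size_t>(std::count(&buf[0], &buf[0] + got, '\n'));
        last = buf[static_cast<std::size_t>(got - 1)];
    }
    if (last != '\n') ++n;  // the final line had no terminating newline
    return n;
}

// Sanity checks on in-memory input.
void check_count_lines_blockwise() {
    std::istringstream a("one\ntwo\n");
    assert(count_lines_blockwise(a) == 2);
    std::istringstream b("one\ntwo");  // no trailing newline
    assert(count_lines_blockwise(b) == 2);
    std::istringstream c("");
    assert(count_lines_blockwise(c) == 0);
}
```

With std::cin as the argument (ideally after the stream settings suggested in the comments), this replaces the getline loop in the counting program; it only makes sense when the line contents themselves are not needed.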

Additional notes

  • I tried with GCC 9.3.0 and got similar results
  • I tried some compiler optimizations but observed no significant improvement. The build/test command is: gcc -c count.cpp && gcc -o count count.o && (time ./count < infile)
  • Following @Eljay’s advice, I added ios_base::sync_with_stdio(false); and cin.tie(NULL); at the beginning of the C++ code. The results are much better, although not quite on par with the C code; it could still be an acceptable trade-off between performance and readability (note: this isolated code is readable in both C and C++, but the full code is much more readable in C++).
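For reference, the C++ counter with the two settings from the note above might be organized as in this sketch. The helper names fast_io, count_lines and check_count_lines are hypothetical, and NULL is kept (as in the question) instead of nullptr so the code also compiles in GCC 4.8's default C++98 mode. The standard leaves the effect of sync_with_stdio implementation-defined if it is called after I/O on the standard streams has already happened, so it belongs at the very start of main.

```cpp
#include <cassert>
#include <cstddef>    // NULL, std::size_t
#include <iostream>
#include <sstream>    // std::istringstream, used in the check below
#include <string>

// Call once at the start of main, before any I/O on the standard streams.
// sync_with_stdio(false): the C++ standard streams no longer have to stay
// synchronized with C stdio, so std::cin can buffer its input itself.
// cin.tie(NULL): std::cin stops flushing std::cout before every input operation.
void fast_io() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);
}

// Same loop as the original program. One std::string is reused for every line,
// so its capacity typically grows to the longest line and reallocations become rare.
std::size_t count_lines(std::istream& in) {
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) ++n;
    return n;
}

// Sanity check on in-memory input: "a", "b", "" and "c" are four lines.
void check_count_lines() {
    std::istringstream in("a\nb\n\nc");
    assert(count_lines(in) == 4);
}
```

A main would then call fast_io() first and print count_lines(std::cin), exactly like the original program.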
  • These are completely different functions, they just happen to have the same name. Because unmentionable substances were smoked when they named these. You are comparing apples and oranges. Commented Jun 3, 2021 at 11:59
  • For C++, put ios_base::sync_with_stdio(false); and cin.tie(NULL); as the first two lines of main to decouple cin from stdin. Commented Jun 3, 2021 at 12:13
  • getline() is also not standard C; it is POSIX. Commented Jun 3, 2021 at 12:30
  • @OP Please post the optimizations you used when you built the application. If you are timing "debug" or unoptimized builds, the timings you're showing are meaningless. Commented Jun 3, 2021 at 12:42
  • @Eljay Thank you. These lines alone have sped things up significantly. I am now "only" 2-3x slower than C. Commented Jun 3, 2021 at 12:44
