1

I'm very new to c++, but I do have experience with other object oriented programming languages.

I'm attempting to alphabetically sort the lines in a file I have, which has roughly 5300 lines. I originally wrote the program in c++ for practice, but was curious to see how it would perform against my main language, c#.

To my surprise, the same sorting algorithm which takes my c++ function 18-20 seconds to execute, finishes in less than 3 seconds in c#.

Given that I am very new to c++ (and not a very experienced programmer in general), I am sure this must be an error in the way I wrote something.

With all that being said, I am aware that there are quicker sorting methods to use. However, both programs are using the same algorithm so I don't understand the reason for the large performance gap.

I will note that I have tried converting the data to an array instead of a vector, but sorting the array was only consistently about 3 seconds faster (about 15 seconds total instead of 18).

What am I doing wrong? Any/all help is appreciated!

Below is the c++:

void select_sort_alphabetical(std::vector<std::string> _vector) { std::cout << "<< SORTING... >>" << "\n\n"; int char_index, i, j, size = _vector.size(), loop_iterations = 0; char char1, char2; std::string temp; // Iterate through all lines for (i = 0; i < (size - 1); i++) { for (j = (1 + i); j < size; j++) { char_index = 0; char1 = _vector[i][char_index]; // Getting first character of each line char2 = _vector[j][char_index]; // While the letters to be compared are the same, move onto the next character while (char1 == char2) { char_index++; char1 = _vector[i][char_index]; // Setting chars to the next characters in each line char2 = _vector[j][char_index]; } // Once the characters are different - if line x.ascii_code greater than line x+1.ascii_code... if (_vector[i][char_index] > _vector[j][char_index]) // comparing characters { // Swapping places temp = _vector[i]; _vector[i] = _vector[j]; _vector[j] = temp; } loop_iterations++; } } //print_lines_from_vect(_vector); // Clearing contents of vector and freeing up memory (trying to, anyway) _vector.clear(); _vector.shrink_to_fit(); std::cout << "\nIterations: " << loop_iterations << "\n"; } 

and here is the c#:

public static string[] select_sort_alphabetical(string[] lines, ref int loop_iterations) { Console.WriteLine("<< SORTING... >>"); // Iterate through all lines for (int i = 0; i < (lines.Length - 2); i++) { for (int j = (1 + i); j < (lines.Length); j++) { int char_index = 0; char char1 = lines[i][char_index]; // Getting first character of each line char char2 = lines[j][char_index]; // While the letters to be compared are the same, move onto the next character while (char1 == char2) { char_index++; char1 = lines[i][char_index]; char2 = lines[j][char_index]; } // Once the characters are different - if line x.ascii_code greater than line x+1.ascii_code... if (lines[i][char_index] > lines[j][char_index]) // comparing characters { // Swapping places string temp = lines[i]; lines[i] = lines[j]; lines[j] = temp; } loop_iterations++; } } return lines; } 
10
  • 1
    What optimization settings are you using for that C++ program? (i.e. in VS are you using Debug or Release mode)? Commented Mar 21, 2022 at 5:05
  • 6
    Of course in C++ you'd never actually write that sort. You'd just use std::sort, but I get this is a learning example. Commented Mar 21, 2022 at 5:06
  • 3
    If you're impressed now, wait until you write a sorting algorithm that's actually fast! Your algorithm performs more than 200 times slower than "standard" solutions (e.g. quick sort / merge sort) for this 5300-record data set. Note also you should use std::swap to avoid a copy when swapping your strings. Commented Mar 21, 2022 at 5:33
  • 1
    Note that c# swap does not incur copy while c++ does. A closer equivalent in c++ would have been to use std::vector<std::string*> _vector Commented Mar 21, 2022 at 6:15
  • 1
    debug mode has lots of things added for debugging purpose and the code is not optimized so obvious it's very slow in any language and must not be used for benchmarking or publishing to customers: Running a program in Debug mode is incredible slow, Why is this code running over 100 times slower in Debug mode?, Why is Debug Mode slower than Release in VS?, Why is this code 100 times slower in debug?... Commented Mar 21, 2022 at 6:51

2 Answers 2

3

One reason your algorithm would be slower, without taking in account other differences between languages, is swapping lines:

// Swapping places string temp = lines[i]; lines[i] = lines[j]; lines[j] = temp; 
  • in c#, string temp = original means temp and original point to the same string. modifying each will reflect in the other.
  • in c++, string temp = original means temp is a new string. Modifying one will not modify the other.

C++ provides move class members, which allow new objects to "steal" the resources of the original object.

std:.swap, in order to swap objects, make something like

string temp = steal(lines[i]); lines[i] = steal(lines[j]); lines[j] = steal(temp); 

This is a simplification of the real mechanism.

Btw, if you use swap, you will see far faster debug, because that line swapping is a big cost in your algorithm.

Sign up to request clarification or add additional context in comments.

5 Comments

I'm not sure what's the reason of using steal instead of std::move here?
and std::string has it's own specialization so it probably not using a temp string variable anyway.
@appleapple std::string::swap is more sophisitcated than just a tripple std::move, so representing it as such would give a false impression. Also std::move doesn't guarantee the the other instances move-assignment will actually steal the content, it only allows the receiving end to do so.
@Ext3h well isn't it what I said (specialization)? and yes std::move doesn't guarantee that but steal in this answer doesn't either (you'd only recognize this pattern if you imply it's std::move afaict).
@appleapple I've used the term steal not because it exists as a function, but because this is the terminology used to describe move constructors\assigned. Move constructors typically "steal" the resources held by the argument (e.g. pointers to dynamically-allocated objects, file descriptors, TCP sockets, I/O streams, running threads, etc.) rather than make copies of them, and leave the argument in some valid but otherwise indeterminate state
1

This is nothing but a wild guess from my part somewhat educated by years of experience.

This part of the code:

 char1 = _vector[i][char_index]; // Getting first character of each line char2 = _vector[j][char_index]; // While the letters to be compared are the same, move onto the next character while (char1 == char2) { char_index++; char1 = _vector[i][char_index]; // Setting chars to the next characters in each line char2 = _vector[j][char_index]; } 

has a good chance of making your compiler sad. You're extracting 1 byte from each vector every time, storing them on separate variables defined somewhere else, and testing them for equality inside a while loop, after which both dummy variables will continue to be in scope, possibly confusing the compiler's optimizer down the line.

Under normal circumstances, a code snippet comparing two strings character by character would trigger GCC or any equivalent optimizing compiler to substitute the code with either a call to strcmp() (or perhaps its inlined instructions), which is perfectly capable of comparing AT LEAST 8 bytes per CPU cycle. Something like the for loop below should usually do the trick:

 int c; // save value after the loop for (c=0; _vector[i][c]==_vector[j][c] ; ++c) {} 

Something tells me the way you structured the loop and gave a few dummy variables a bigger scope tripped your C++ compiler which couldn't make the right deductions to optimize the code. The reason why you didn't encounter the same issue with your C# code could be anything from safer strings and pointers leading to stronger type deduction, to any number of optimization flags not being identical across languages, or not being identically set for the optimization level.

Either way I would suggest you check the disassembled object code for the actual instructions emitted by the compiler(s) in both cases and put the question to rest. On linux. objdump will do the trick.

objdump -d main.o > main.asm && vim main.asm 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.