
Editor's clarification: When this was originally posted, there were two issues:

  • Test performance drops by a factor of three when a seemingly inconsequential statement is added
  • Time taken to complete the test appears to vary randomly

The second issue has been solved: the randomness only occurs when running under the debugger.

The remainder of this question should be understood as being about the first bullet point above, and in the context of running in VC++ 2010 Express's Release Mode with optimizations "Maximize Speed" and "favor fast code".

There are still some comments in the comment section about the second point, but they can now be disregarded.


I have a simulation whose main while loop does a lot of calculations (n-body gravity for the solar system, among other things). If I add a simple if statement to that loop, performance drops by about a factor of three, even though the body of the if statement is almost never executed:

if (time - cb_last_orbital_update > 5000000) { cb_last_orbital_update = time; } 

Both time and cb_last_orbital_update are of type double and are defined at the beginning of the main function, which is also where this if statement lives. Usually there are computations I want to run there too, but it makes no difference if I delete them. The if statement as shown above has the same effect on performance.

The variable time is the simulation time, it increases in 0.001 steps in the beginning so it takes a really long time until the if statement is executed for the first time (I also included printing a message to see if it is being executed, but it is not, or at least only when it's supposed to). Regardless, the performance drops by a factor of 3 even in the first minutes of the simulation when it hasn't been executed once yet. If I comment out the line

cb_last_orbital_update = time; 

then it runs faster again, so it's not the check for

time - cb_last_orbital_update > 5000000 

either, it's definitely the simple act of writing current simulation time into this variable.

Also, if I write the current time into another variable instead of cb_last_orbital_update, the performance does not drop. So this might be an issue with assigning a new value to a variable that is used to check if the "if" should be executed? These are all shots in the dark though.
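A minimal sketch of how the two variants could be timed from a Release build run outside the debugger. Everything here is an assumption for illustration: run_loop and seconds_for are hypothetical names, the loop body is a placeholder rather than the real simulation, and std::clock is used only because it is available on older compilers. A toy loop like this is not expected to reproduce the factor-of-three slowdown; it only shows the shape of the measurement.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for the main loop. When write_back is false, the
// assignment inside the if is skipped, mirroring the "fast" variant.
double run_loop(long long iterations, bool write_back) {
    assert(iterations >= 0);
    double time = 0.0;
    const double time_interval = 0.001;
    double cb_last_orbital_update = 0.0;
    double checksum = 0.0;  // keeps the optimizer from discarding the loop

    for (long long i = 0; i < iterations; ++i) {
        checksum += time * 1e-9;  // placeholder for the real per-step work
        if (time - cb_last_orbital_update > 5000000.0) {
            if (write_back) {
                cb_last_orbital_update = time;
            }
        }
        time += time_interval;
    }
    return checksum + cb_last_orbital_update;
}

// Returns the CPU time in seconds used by one run and stores the loop's
// result in *result so the work cannot be optimized away.
double seconds_for(long long iterations, bool write_back, double* result) {
    assert(result != nullptr);
    const std::clock_t start = std::clock();
    *result = run_loop(iterations, write_back);
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Calling seconds_for once with write_back set to true and once with false, using the same iteration count, gives two numbers that can be compared directly.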

Disclaimer: I am pretty new to programming, and sorry for all that text.

I am using Visual C++ 2010 Express, deactivating the stdafx.h precompiled header function didn't make a difference either.

EDIT: Basic structure of the program. Note that nowhere besides at the end of the while loop (time += time_interval;) is time changed. Also, cb_last_orbital_update has only 3 occurrences: Declaration / initialization, plus the two times in the if statement that is causing the problem.

int main(void)
{
    ...
    double time = 0;
    double time_interval = 0.001;
    double cb_last_orbital_update = 0;

    F_Rocket_Preset(time, time_interval, ...);

    while(conditions)
    {
        Rocket[active].Stage[Rocket[active].r_stage].F_Update_Stage_Performance(time, time_interval, ...);
        Rocket[active].F_Calculate_Aerodynamic_Variables(time);
        Rocket[active].F_Calculate_Gravitational_Forces(cb_mu, cb_pos_d, time);
        Rocket[active].F_Update_Rotation(time, time_interval, ...);
        Rocket[active].F_Update_Position_Velocity(time_interval, time, ...);
        Rocket[active].F_Calculate_Orbital_Elements(cb_mu);

        F_Update_Celestial_Bodies(time, time_interval, ...);

        if (time - cb_last_orbital_update > 5000000.0)
        {
            cb_last_orbital_update = time;
        }

        Rocket[active].F_Check_Apoapsis(time, time_interval);
        Rocket[active].F_Status_Check(time, ...);
        Rocket[active].F_Update_Mass (time_interval, time);
        Rocket[active].F_Staging_Check (time, time_interval);

        time += time_interval;

        if (time > 3.1536E8)
        {
            std::cout << "\n\nBreak main loop! Sim Time: " << time << std::endl;
            break;
        }
    }
    ...
}

EDIT 2:

Here is the difference in the assembly code. On the left is the fast code with the line

cb_last_orbital_update = time; 

commented out, on the right the slow code with the line included.

EDIT 4:

So, I found a workaround that seems to work just fine so far:

int cb_orbit_update_counter = 1; // before while loop

if(time - cb_orbit_update_counter * 5E6 > 0)
{
    cb_orbit_update_counter++;
}

EDIT 5:

While that workaround does work, it only works in combination with using __declspec(noinline). I just removed those from the function declarations again to see if that changes anything, and it does.

EDIT 6: Sorry, this is getting confusing. I tracked down the culprit for the lower performance when removing __declspec(noinline) to this function, which is executed inside the if:

__declspec(noinline) std::string F_Get_Body_Name(int r_body)
{
    switch (r_body)
    {
        case 0:  { return ("the Sun"); }
        case 1:  { return ("Mercury"); }
        case 2:  { return ("Venus"); }
        case 3:  { return ("Earth"); }
        case 4:  { return ("Mars"); }
        case 5:  { return ("Jupiter"); }
        case 6:  { return ("Saturn"); }
        case 7:  { return ("Uranus"); }
        case 8:  { return ("Neptune"); }
        case 9:  { return ("Pluto"); }
        case 10: { return ("Ceres"); }
        case 11: { return ("the Moon"); }
        default: { return ("unnamed body"); }
    }
}

The if also now does more than just increase the counter:

if(time - cb_orbit_update_counter * 1E7 > 0)
{
    F_Update_Orbital_Elements_Of_Celestial_Bodies(args);

    std::cout << F_Get_Body_Name(3) << " SMA: " << cb_sma[3]
              << "\tPos Earth: " << cb_pos_d[3][0] << " / " << cb_pos_d[3][1] << " / " << cb_pos_d[3][2]
              << "\tAlt: " << sqrt(pow(cb_pos_d[3][0] - cb_pos_d[0][0],2)
                                 + pow(cb_pos_d[3][1] - cb_pos_d[0][1],2)
                                 + pow(cb_pos_d[3][2] - cb_pos_d[0][2],2))
              << std::endl;
    std::cout << "Time: " << time << "\tcb_o_h[3]: " << cb_o_h[3] << std::endl;

    cb_orbit_update_counter++;
}

If I remove __declspec(noinline) from the function F_Get_Body_Name alone, the code gets slower. Similarly, if I remove the call to this function or add __declspec(noinline) again, the code runs faster. All other functions still have __declspec(noinline).

EDIT 7: So I changed the switch function to

const std::string cb_names[] = {"the Sun","Mercury","Venus","Earth","Mars","Jupiter","Saturn","Uranus","Neptune","Pluto","Ceres","the Moon","unnamed body"}; // global definition
const int cb_number = 12; // global definition

std::string F_Get_Body_Name(int r_body)
{
    if (r_body >= 0 && r_body < cb_number)
    {
        return (cb_names[r_body]);
    }
    else
    {
        return (cb_names[cb_number]);
    }
}

and also made another part of the code slimmer. The program now runs fast without any __declspec(noinline). As ElderBug suggested, could this then be an issue with the CPU instruction cache, i.e. the code getting too big?

6 Answers

20

I'd put my money on Intel's branch predictor. http://en.wikipedia.org/wiki/Branch_predictor

The processor assumes (time - cb_last_orbital_update > 5000000) to be false most of the time and loads up the execution pipeline accordingly.

Once the condition (time - cb_last_orbital_update > 5000000) becomes true, the misprediction penalty hits you. You may lose 10 to 20 cycles.

if (time - cb_last_orbital_update > 5000000) { cb_last_orbital_update = time; } 
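A back-of-the-envelope sketch of how often this branch is actually taken, using the figures from the question (step 0.001, threshold 5000000, loop exit at 3.1536E8). The function and constant names are hypothetical, and the estimate assumes one misprediction per taken branch, which is a simplification rather than a measured value.

```cpp
#include <cassert>

// Figures from the question, expressed in whole simulated seconds so the
// arithmetic stays exact. A step of 0.001 means 1000 steps per second.
const long long kStepsPerSecond = 1000;
const long long kUpdateIntervalSeconds = 5000000;   // threshold in the if
const long long kSimulationEndSeconds = 315360000;  // 3.1536E8

// Loop iterations that pass between two consecutive triggers of the if.
long long iterations_between_triggers() {
    return kUpdateIntervalSeconds * kStepsPerSecond;
}

// Loop iterations over the whole simulated time span.
long long total_iterations() {
    return kSimulationEndSeconds * kStepsPerSecond;
}

// Roughly how many times the if body runs over the whole simulation.
long long total_triggers() {
    return kSimulationEndSeconds / kUpdateIntervalSeconds;
}

// Upper bound on cycles lost, assuming one misprediction per trigger.
long long misprediction_cycles_upper_bound(long long cycles_per_miss) {
    assert(cycles_per_miss >= 0);
    return total_triggers() * cycles_per_miss;
}
```

Under these assumptions the branch is taken about 63 times in roughly 3.15E11 iterations, so even at 20 cycles per miss the total is on the order of a thousand cycles. That is far too little to account for a constant factor-of-three slowdown by itself.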

17 Comments

Hm, but doesn't that mean it should only be slower for those rare occasions it is true? The performance drop is all the time, even when it hasn't even executed once.
True, but this happens only once every 5,000,000 / 0.001 executions. So probably not that big of a deal.
It could also be a rare'ish branch prediction aliasing symptom. One of his subroutines could map to the same entry in the branch prediction logic and thus the branch prediction could be thrashing back and forth based on two different branches at two different instruction memory locations. Fiddling with the code would offset the location of the branch and cause a different branch prediction entry to be used, thus not causing branch prediction thrash.
Try this: msdn.microsoft.com/en-us/library/1b3fsfxw%28VS.80%29.aspx In gcc I'd use likely/unlikely; a quick search revealed that the corresponding feature in MSVC is __assume. It hints to the compiler that a certain condition is unlikely to be true/false. I'm curious about the results! On gcc, in some cases it makes quite a big difference.
@dau_sama: Okay, so I played around a bit more, and it is now working with __assume(cb_last_orbital_update == 0);. I probably just used it wrong before. I still don't really understand how this is a problem, but I am happy to have at least a workaround now, so thank you :)
6

Something is happening that you don't expect.

One candidate is some uninitialised variables hanging around somewhere, which have different values depending on the exact code that you are running. For example, you might have uninitialised memory that is sometimes a denormalised floating point number, and sometimes not.
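A minimal sketch of how one could check state values for denormalised (subnormal) numbers. The helper names are hypothetical. std::fpclassify is a C++11 facility, so an older compiler such as VC++ 2010 may not provide it; there, _fpclass from <float.h> is, as far as I know, the closest equivalent.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// True if x is a subnormal (denormalised) double.
bool is_subnormal(double x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// Counts the subnormal entries in an array, for example a position or
// velocity buffer sampled at some point in the simulation.
std::size_t count_subnormals(const double* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    std::size_t found = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (is_subnormal(values[i])) {
            ++found;
        }
    }
    return found;
}
```

If such a check reports subnormals in one build and none in the other, that would support the uninitialised-memory theory; if it reports none in both, this candidate can probably be ruled out.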

I think it should be clear that your code doesn't do what you expect it to do. So try debugging your code, compile with all warnings enabled, and make sure you use the same compiler options (optimised vs. non-optimised can easily be a factor of 10). Check that you get the same results.

Especially when you say "it runs faster again (this doesn't always work though, but i can't see a pattern). Also worked with changing 5000000 to 5E6 once. It only runs fast once though, recompiling causes the performance to drop again without changing anything. One time it ran slower only after recompiling twice." it looks quite likely that you are using different compiler options.

2 Comments

As I just now noticed, the randomness is an issue with the compiler. If I run the exe directly, there is no randomness. The problem still exists though: if I delete the one line in the if or substitute it with other (heavy) calculations, it runs significantly faster.
@Kenira I think you mean the IDE, not the compiler. The compiler is a program which turns your code into an executable. The compiler doesn't run the executable.
4

I will try another guess. This is hypothetical, and would be mostly due to the compiler.

My guess is that you use a lot of floating point calculations, and the introduction and use of double values in your main makes the compiler run out of XMM registers (the floating point SSE registers). This forces the compiler to use memory instead of registers and induces a lot of swapping between memory and registers, greatly reducing performance. This would happen mainly because the computation functions are inlined, since function calls preserve registers.

The solution would be to add __declspec(noinline) to ALL your computation function declarations.
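A minimal sketch of what such a declaration could look like. __declspec(noinline) is specific to MSVC, so the macro below (SIM_NOINLINE, a made-up name) falls back to the GCC/Clang attribute elsewhere. The function gravitational_accel is a hypothetical placeholder, not one of the functions from the question.

```cpp
#include <cassert>

// Hypothetical portability macro: expands to the compiler's own
// "do not inline" marker, or to nothing on unknown compilers.
#if defined(_MSC_VER)
#  define SIM_NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#  define SIM_NOINLINE __attribute__((noinline))
#else
#  define SIM_NOINLINE
#endif

// Placeholder computation function. Marking it noinline keeps its body
// out of the caller, so its temporaries do not compete with the caller's
// variables for registers.
SIM_NOINLINE double gravitational_accel(double mu, double distance) {
    assert(distance != 0.0);
    return -mu / (distance * distance);
}
```

Whether this helps depends on the compiler's register allocation in the calling function, so it is worth timing the build before and after.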

11 Comments

Sadly, that did not solve the problem; I added it to every function declaration.
@Kenira I'm out of ideas. Maybe you can check the generated assembly and post the differences you can see between the fast and slow code?
You could possibly rule this out as the cause by using a simple integer test (an incrementing counter that triggers after (5000000.0 / step-size) passes through the main loop).
Okay, so one: the randomness is just from the compiler. If I run the exe directly, it's always the same. Still, the program runs much faster when I comment out the line, so the problem still exists. Also, I added a link to the assembler differences at the bottom of the question.
While using the new workaround with the int, I just removed __declspec(noinline) again to see if there is a difference, and there is! The same difference in performance of about 3 to 1. It seems you were right all along, although it did also depend on how this if statement was set up. A question I now have: given that the registers seem to be the problem, will this become a problem again in the future if I extend my code, using more and more doubles (I do need a lot)? Or will the use of __declspec(noinline) free up enough of the XMM registers that that is unlikely?
4

I suggest using the Microsoft Profile Guided Optimizer -- if the compiler is making the wrong assumption for this particular branch it will help, and it will in all likelihood improve speed for the rest of the code as well.

1 Comment

I tried to use that, but I'm not getting far. As usual, Microsoft tutorials are really not noob friendly. I managed to find the /GL and /LTCG:PGINSTRUMENT options, but no .pgd file is created and I have no idea what I'm doing wrong because I'm mostly just guessing what to do anyway. You don't happen to know a good, step-by-step tutorial?
2

Workaround, try 2:

The code is now looking like this:

int cb_orbit_update_counter = 1; // before while loop

if(time - cb_orbit_update_counter * 5E6 > 0)
{
    cb_orbit_update_counter++;
}

So far it runs fast, and the code is executed when it should be, as far as I can tell. Again, this is only a workaround, but if it proves to work all around then I'm satisfied.

After some more testing, seems good.
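A small sketch comparing when the original trigger and the counter-based workaround fire. The function names and the scaled-down numbers in the usage note are assumptions chosen so the result is easy to check by hand; they are not the simulation's real values.

```cpp
#include <cassert>

// Original form: remembers the time of the last update in a double.
int count_original_triggers(double end_time, double step, double interval) {
    assert(step > 0.0 && interval > 0.0);
    double time = 0.0;
    double last_update = 0.0;
    int triggers = 0;
    while (time < end_time) {
        if (time - last_update > interval) {
            last_update = time;
            ++triggers;
        }
        time += step;
    }
    return triggers;
}

// Workaround form: compares against a fixed multiple of the interval and
// only increments an int when the condition is met.
int count_counter_triggers(double end_time, double step, double interval) {
    assert(step > 0.0 && interval > 0.0);
    double time = 0.0;
    int update_counter = 1;
    int triggers = 0;
    while (time < end_time) {
        if (time - update_counter * interval > 0) {
            ++update_counter;
            ++triggers;
        }
        time += step;
    }
    return triggers;
}
```

With end_time 100, step 0.5 and interval 5, the original form fires every 5.5 time units (18 times) because each reset starts from the step after the threshold, while the counter form fires every 5.0 units (19 times). In the actual simulation that drift is a single 0.001 step per 5000000 interval, so the two should behave almost identically.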

1

My guess is that this is because the variable cb_last_orbital_update is otherwise read-only, so when you assign to it inside the if, it destroys some optimizations that the compiler has for read-only variables (e.g. perhaps it's now stored in memory instead of a register).

Something to try (although this might still not work) is to make a third variable that is initialized via cb_last_orbital_update and time depending on whether the condition is true, and using that one instead. Presumably, the compiler would now treat that variable as a constant, but I'm not sure.

1 Comment

I also tried putting the condition into another variable, either only time - cb_last_orbital_update or time - cb_last_orbital_update - 5E6, and then checking that for either >5E6 or >0; neither helped. Only switching to an int, as I mentioned in my workaround, has helped so far.
