Linked Questions
120 questions linked to/from Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?
2 votes
0 answers
370 views
NASM x86 Assembly Optimization [Linear congruential generator] [duplicate]
I have tried writing my first assembly program (NASM x86 assembly). It is a linear congruential generator, or at least the plan was for it to be one. Because of the limitations of my assembly skills I had to change ...
0 votes
0 answers
214 views
Ways to optimize the runtime of x86 assembly subroutine for collatz conjecture [duplicate]
I'm trying to optimize the runtime of the code that I wrote for computing the number of iterations it takes to reach 1 in the Collatz conjecture. Here is my pseudocode for this: int threexplusone(int x){...
644 votes
34 answers
95k views
Performance optimization strategies of last resort [closed]
There are plenty of performance questions on this site already, but it occurs to me that almost all are very problem-specific and fairly narrow. And almost all repeat the advice to avoid premature ...
1656 votes
11 answers
202k views
Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs
I was looking for the fastest way to popcount large arrays of data. I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. ...
763 votes
24 answers
1.4m views
What does the C++ standard say about the size of int, long?
I'm looking for detailed information regarding the size of basic C++ types. I know that it depends on the architecture (16 bits, 32 bits, 64 bits) and the compiler. But are there any standards for C++?...
502 votes
40 answers
154k views
When is assembly faster than C? [closed]
One of the stated reasons for knowing assembler is that, on occasion, it can be employed to write code that will be more performant than writing that code in a higher-level language, C in particular. ...
546 votes
17 answers
569k views
How do you get assembler output from C/C++ source in GCC?
How does one do this? If I want to analyze how something is getting compiled, how would I get the emitted assembly code?
207 votes
21 answers
75k views
Is inline assembly language slower than native C++ code?
I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that adds two arrays of size 2000, 100000 times. Here's the code: #define TIMES 100000 void calcuC(...
89 votes
30 answers
67k views
Why do you program in assembly? [closed]
I have a question for all the hardcore low level hackers out there. I ran across this sentence in a blog. I don't really think the source matters (it's Haack if you really care) because it seems to ...
350 votes
4 answers
51k views
Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs
I've been racking my brain for a week trying to complete this assignment and I'm hoping someone here can lead me toward the right path. Let me start with the instructor's instructions: Your ...
278 votes
5 answers
43k views
Why does GCC use multiplication by a strange number in implementing integer division?
I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C: File division.c #include <stdlib.h> #include <stdio.h> int ...
192 votes
1 answer
95k views
What is the best way to set a register to zero in x86 assembly: xor, mov or and?
All the following instructions do the same thing: set %eax to zero. Which way is optimal (requiring fewest machine cycles)? xorl %eax, %eax mov $0, %eax andl $0, %eax
86 votes
9 answers
8k views
Which is faster: x<<1 or x<<10?
I don't want to optimize anything, I swear, I just want to ask this question out of curiosity. I know that on most hardware there's a bit-shift assembly instruction (e.g. shl, shr), which is a single ...
79 votes
6 answers
9k views
Duplicate code using c++11
I'm currently working on a project and I have the following issue. I have a C++ method that I want to work in two different ways: void MyFunction() { foo(); bar(); foobar(); } void ...
121 votes
3 answers
38k views
How to remove "noise" from GCC/clang assembly output?
I want to inspect the assembly output of applying boost::variant in my code in order to see which intermediate calls are optimized away. When I compile the following example (with GCC 5.3 using g++ -...