Linked Questions
146 questions linked to/from How to get the CPU cycle count in x86_64 from C++?
0 votes
1 answer
120 views
x86_64. How can I avoid memory dereferencing taking 390 processor cycles instead of 3.6 or at most 10 times more (36 cycles) instead of 100 times more
Trying to optimize concurrent linked-list access, I tried to benchmark the average time that dereferencing takes on x86_64 (my specific processor is a Ryzen). While I knew that the nice old days of ...
0 votes
1 answer
56 views
When I test the cycle number of the module, the results of each test are quite different.
When I test the cycle number of the module, the results of each test are quite different: 1781344 (first test), 1264558 (second test), 1388058 (third test). I use __rdtsc() to record cycles,...
3 votes
0 answers
95 views
How to prevent any interference when benchmarking code execution in the linux kernel?
I'm following this whitepaper by Intel to benchmark code execution. It uses cpuid to fence the reads of the timestamp registers, which seems to work alright. I'm more interested in the commands ...
1 vote
1 answer
97 views
How can I make a program run for a given amount of time on the CPU in C++?
"I want to write a program that consumes 5 seconds of CPU time, where the time spent off the CPU due to IO, context switches, etc., is not counted towards this 5-second time quota. Currently, I'...
0 votes
0 answers
90 views
Inconsistent clock cycles with the measurement tools: where exactly do these inconsistent cycles come from?
I spent a lot of time trying to measure the exact clock cycles of given instructions in a portion of code written in C. However, I could never measure exactly how many cycles they take at runtime. I used PAPI, ...
1 vote
0 answers
164 views
Invariant Timestamp Counter is synchronised between cores of the same CPU?
I'm interested in how Invariant TSC behaves on a multi-core CPU, on a classic PC with a single physical CPU. The only thing I could find is that its frequency is constant and the same for all CPU ...
1 vote
1 answer
61 views
Is it preferable to use the total time taken for a canonical workload as a benchmark or count the cycles/time taken by the individual operations?
I'm designing a benchmark for a critical system operation. Ideally the benchmark can be used to detect performance regressions. I'm debating between using the total time for a large workload passed ...
0 votes
0 answers
80 views
Memory fence with std::system_clock::now()
I need to add two memory fences to my code, in order to prevent it from being reordered by either the compiler or the CPU. Like this: rec.time_stamp0 = std::system_clock::now(); std::...
2 votes
0 answers
50 views
Inconsistent Clock Cycles Measurement in Assembly Function for Setting Buffer to Zero
I am currently working on an assembly function that sets a buffer to zero. I am measuring the clock cycles it takes to execute the function. However, I have encountered an issue where the number of ...
0 votes
0 answers
57 views
Measuring Latency of Store Instruction in x86-64: Timing with std::chrono and Retirement of Instructions
I'm measuring the latency of a store instruction on an x86-64 processor and would like to understand the nuances of timing this instruction. Here’s my setup and the specific questions I have: Setup: I ...
0 votes
0 answers
33 views
rdtsc delta to nanosecond conversion [duplicate]
Recently, I have been trying to run some performance analysis on my program. I want to measure the latency of some functions in CPU ticks and later convert the delta to nanoseconds. (I intentionally am ...