
I am writing a C program to measure how long a task takes in CPU cycles. I am deliberately not converting cycles to seconds (time in seconds = cycles / frequency), because the server lowers its CPU frequency under light load to save power.

Program 1 :

    #include <stdint.h>

    // Globals used by the timer (declarations assumed; the original snippet omits them)
    unsigned cycles_high, cycles_low, cycles_high1, cycles_low1;
    uint64_t start, end;

    ///////////////////////// RDTSC Functions /////////////////////////
    static inline void start_rdtsc_rdtscp_ia64(void) {
        // CPUID serializes execution before RDTSC reads the counter
        asm volatile ("CPUID\n\t"
                      "RDTSC\n\t"
                      "mov %%edx, %0\n\t"
                      "mov %%eax, %1\n\t"
                      : "=r" (cycles_high), "=r" (cycles_low)
                      :: "%rax", "%rbx", "%rcx", "%rdx");
    }

    static inline void end_rdtsc_rdtscp_ia64(void) {
        // RDTSCP waits for earlier instructions; CPUID keeps later ones from moving up
        asm volatile ("RDTSCP\n\t"
                      "mov %%edx, %0\n\t"
                      "mov %%eax, %1\n\t"
                      "CPUID\n\t"
                      : "=r" (cycles_high1), "=r" (cycles_low1)
                      :: "%rax", "%rbx", "%rcx", "%rdx");
    }

    static inline void warmup_rdtsc_rdtscp_ia64(void) {
        start_rdtsc_rdtscp_ia64();
        end_rdtsc_rdtscp_ia64();
        start_rdtsc_rdtscp_ia64();
        end_rdtsc_rdtscp_ia64();
        start_rdtsc_rdtscp_ia64();
        end_rdtsc_rdtscp_ia64();
    }

    static inline uint64_t get_start_ia64(void) {
        return (((uint64_t) cycles_high << 32) | cycles_low);
    }

    static inline uint64_t get_end_ia64(void) {
        return (((uint64_t) cycles_high1 << 32) | cycles_low1);
    }

    ///////////////////////// RDTSC Timer Functions /////////////////////////
    static inline void start_timer(void) {
        warmup_rdtsc_rdtscp_ia64();
        start_rdtsc_rdtscp_ia64();
    }

    static inline void end_timer(void) {
        end_rdtsc_rdtscp_ia64();
        start = get_start_ia64();
        end = get_end_ia64();
    }

    static inline uint64_t get_cycles_count(void) {
        return end - start;
    }

    // Measuring time here
    start_timer();
    // perform a task of length K (a larger K means more computation)
    end_timer();
    // time in ticks = get_cycles_count()

Program 2:

    int main() {
        while (1);
    }

I call warmup_rdtsc_rdtscp_ia64() first so that RDTSC and CPUID are warmed up; according to the Intel document, this is required to get correct readings.

Without Program 2 running, I get higher cycle counts, and I cannot find any relationship between execution time and the length K.

With Program 2 running, I get the expected result: the measured cycle count grows with K, so I can correlate execution time with the length K.

My only explanation is that Program 2 prevents the CPU from entering power-saving mode, so the CPU always runs at its highest frequency, whereas without Program 2 the CPU drops to its lowest available frequency to save power.

So my questions are as follows:

  1. Without Program 2, the CPU enters power-saving mode (lower CPU frequency). Even at a lower frequency, I would expect a similar number of clock cycles, which is exactly why I avoid the conversion time in seconds = cycles / frequency. Why am I getting higher cycle counts?

  2. Can anyone explain how the number of clock cycles needed to complete a task relates to the frequency level (power-save, on-demand, and performance modes)?

I am using Linux, with both gcc and g++.

I need help understanding the relationship between the clock cycles required to complete a task and the power mode (power-save, on-demand, performance).

Thanks in advance.

1 Answer


There are many tools that can do this, and you should try to leverage them rather than rolling your own. Here are two of my favorites:

https://perf.wiki.kernel.org/index.php/Main_Page

https://code.google.com/p/likwid/

With regard to your two questions: the time it takes to complete a program is not tied to CPU frequency alone. A commonly used metric is instructions per cycle (IPC), which can vary greatly. On current machines it may be as high as 2 to 4, i.e. the CPU retires more than one instruction per cycle because it can issue several instructions per cycle. The IPC you see for your program depends on at least two things: the amount of instruction-level parallelism the CPU can exploit (you likely have an out-of-order processor), and the amount of locality in your data (more locality means more cache hits and therefore less waiting on memory).

The CPU clock frequency also varies on modern systems. It can be higher or lower depending on 1) the power-saving mode (e.g. a laptop with the power cable unplugged) and 2) the current system load (e.g. you have several CPUs, but if most of them are idle, one CPU can run faster than all four can run simultaneously).
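One further factor may matter when cycles are counted with RDTSC, as in the question. This is an assumption about the hardware rather than something established above: on CPUs with an invariant TSC, the counter advances at a fixed reference rate regardless of the current core frequency, so the same work produces more TSC ticks when the core clock is lower. A minimal sketch of the conversion follows; the helper name `tsc_ticks_to_core_cycles` is hypothetical, and both frequencies must be supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate how many core clock cycles elapsed during a span measured in
 * TSC ticks. Assumes the TSC ran at a fixed rate tsc_hz and the core ran
 * at an average rate core_hz over that span (both are caller-supplied
 * assumptions, not values this code measures).
 */
double tsc_ticks_to_core_cycles(uint64_t tsc_ticks, double tsc_hz, double core_hz)
{
    assert(tsc_hz > 0.0 && core_hz > 0.0);
    /* elapsed seconds = ticks / tsc_hz; core cycles = seconds * core_hz */
    return (double)tsc_ticks * (core_hz / tsc_hz);
}
```

With made-up numbers, 3,000,000 ticks on a 3.0 GHz TSC while the core averaged 1.5 GHz would correspond to roughly 1,500,000 core cycles, which is one way a lower frequency could show up as a higher tick count for the same work.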

What you want, therefore, is the following three things:

  1. the average IPC for your program

  2. the average CPU frequency while your program runs

  3. the number of instructions in your program
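These three quantities combine as time = instructions / (IPC × frequency). A minimal sketch in C; the function names are illustrative, and the inputs would come from a tool such as perf rather than from this code.

```c
#include <assert.h>
#include <stdint.h>

/* Core cycles needed to retire `instructions` at an average IPC. */
double core_cycles(uint64_t instructions, double ipc)
{
    assert(ipc > 0.0);
    return (double)instructions / ipc;
}

/* Execution time in seconds at an average core frequency in Hz. */
double exec_seconds(uint64_t instructions, double ipc, double freq_hz)
{
    assert(freq_hz > 0.0);
    return core_cycles(instructions, ipc) / freq_hz;
}
```

As a hypothetical example, 3 billion instructions at an IPC of 1.5 need 2 billion cycles, which takes about 1 second at 2 GHz and about 2 seconds at 1 GHz. The cycle count stays the same in this model; only the time changes with frequency.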

You can then compute your execution time from these three values. After that, you can use likwid or perf to tune your performance at this low level, and to see what effect the power-saving modes have on CPU frequency.
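To watch the frequency side directly, one option on Linux is to sample the cpufreq files under sysfs while the program runs. The sketch below assumes the cpufreq subsystem exposes a file such as /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq, which holds a value in kHz; the file may be absent on some systems (for example inside some virtual machines). `read_freq_khz` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single integer (a frequency in kHz) from a cpufreq sysfs file.
 * Returns -1 if the file cannot be opened or does not start with a number.
 */
long read_freq_khz(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long khz = -1;
    if (fscanf(f, "%ld", &khz) != 1)
        khz = -1;

    fclose(f);
    return khz;
}
```

Calling `read_freq_khz("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")` before and after the timed region, with and without the busy loop running, would give a rough picture of the frequency the core was running at during the measurement.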

Good luck.


3 Comments

Thanks for your answer. To prevent out-of-order execution I am using CPUID. Another point: when I run Program 3 (which receives the result of Program 1) and Program 1 on the same CPU (I use taskset to set the CPU affinity of each process), I get results similar to running Program 1 and Program 2 on the same CPU. How would you explain that case?
BTW, I am using a Core i3 machine, and likwid only works up to Core 2 Duo.
likwid only works up to Core 2 Duo for disabling/enabling hardware prefetching. Let me check it for frequency scaling.
