
What clock is measured by clock() and clock64() in CUDA?

According to the CUDA documentation, the clock is a 'per-multiprocessor counter'. As I understand it, this refers to the primary GPU clock (not the shader clock).

But when I measure clock counts and convert them to time values using the primary GPU clock frequency, the results are twice as large as the real values (I measure the real values as the kernel execution time from host code, using CUDA events). This suggests that clock() counts shader clock cycles rather than primary GPU clock cycles.

How can I resolve this confusion?

EDIT: I calculated the primary GPU clock frequency by dividing the clock rate I get from cudaGetDeviceProperties by 2. As far as I understand, the value given by cudaGetDeviceProperties is the shader clock frequency.

  • Primary GPU clock (also called graphics core clock, graphics clock, or core clock): the clock rate at which the streaming multiprocessor runs. Shader clock (also called shader core clock, processor clock, or GPU clock): the clock rate at which the execution units (CUDA cores) run; this is twice the primary GPU clock. This is how I have understood it. Commented Nov 21, 2014 at 16:37
  • I can confirm that on Fermi devices, cudaDeviceProp::clockRate is the shader clock rate, that is, double the "primary" GPU clock. On Kepler devices, the two are the same. The answer would be more certain if you told us which device you are using. Not sure about clock() and clock64(), but you are probably right in your assumption. Commented Nov 21, 2014 at 16:47
  • I think @Optimus is referring to the following: on older GPUs (e.g. the Fermi family), the execution units run at twice the clock rate of the rest of the graphics domain (this is sometimes referred to as the "hot clock"). nvidia-smi reports these as "graphics" and "SM" clocks, respectively. For example, on my Fermi-based Quadro 2000, the former is reported as 625 MHz, the latter as 1251 MHz. As far as I know, starting with Kepler all of the non-memory domain of a GPU runs at the same speed, i.e. there is no more SM hot clock. Commented Nov 21, 2014 at 16:53
  • My device is a Quadro 2000D. The clock frequency given by 'cudaDeviceProp::clockRate' is 1251 MHz, which is the shader clock frequency. My confusion comes from the CUDA documentation, which says 'per-multiprocessor counter'; that seems to refer to the primary GPU clock. Commented Nov 21, 2014 at 16:57
  • @njuffa: How did you get 625 MHz? Is it from a datasheet or from a CUDA function? Commented Nov 21, 2014 at 17:00

1 Answer


It's true that the CUDA documentation says clock() and clock64() return a 'per-multiprocessor counter'. But on the Fermi architecture, what clock() and clock64() actually return is the shader clock counter.

The clockRate returned by cudaGetDeviceProperties is the shader clock frequency.

So to compute the elapsed time, divide the clock count from clock() or clock64() by the shader clock frequency reported by cudaGetDeviceProperties (clockRate is given in kHz).
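A minimal sketch of that conversion, cross-checked against CUDA events as in the question. The kernel `timed_kernel`, the helper `cycles_to_ms`, and the spin length are hypothetical names and values chosen for illustration. The clock rate is read with `cudaDeviceGetAttribute(cudaDevAttrClockRate)`, which reports the same kHz figure as `cudaDeviceProp::clockRate`. Error checking is omitted for brevity, and the result is only meaningful if the GPU actually runs at the reported rate during the measurement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The reported clock rate is in kHz, i.e. cycles per millisecond,
// so cycles / kHz gives milliseconds directly.
__host__ __device__ inline double cycles_to_ms(long long cycles, int clock_khz) {
    return static_cast<double>(cycles) / clock_khz;
}

// Hypothetical kernel: one thread spins for at least `spin_cycles` ticks of the
// per-multiprocessor counter and stores how many ticks actually elapsed.
__global__ void timed_kernel(long long spin_cycles, long long *elapsed) {
    long long start = clock64();
    long long now = start;
    while (now - start < spin_cycles) {
        now = clock64();
    }
    *elapsed = now - start;
}

int main() {
    int clock_khz = 0;
    cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, 0);

    long long *d_elapsed = nullptr;
    cudaMalloc(&d_elapsed, sizeof(long long));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time the same kernel from the host with CUDA events.
    cudaEventRecord(start);
    timed_kernel<<<1, 1>>>(10000000LL, d_elapsed);  // arbitrary spin length
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float event_ms = 0.0f;
    cudaEventElapsedTime(&event_ms, start, stop);

    long long h_elapsed = 0;
    cudaMemcpy(&h_elapsed, d_elapsed, sizeof(long long), cudaMemcpyDeviceToHost);

    printf("clock rate      : %d kHz\n", clock_khz);
    printf("clock64() cycles: %lld\n", h_elapsed);
    printf("clock64() time  : %.3f ms\n", cycles_to_ms(h_elapsed, clock_khz));
    printf("event time      : %.3f ms (includes launch overhead)\n", event_ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_elapsed);
    return 0;
}
```

If the two times agree, the counter ticks at the reported rate; if the clock64() time comes out about twice the event time, the frequency used for the division is half the rate at which the counter ticks.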


2 Comments

I would caution against converting clock() or clock64() counts to units of time based on the value of cudaDeviceProp::clockRate, since the underlying clock can change dynamically, due to clock boosting and clock throttling. If I recall correctly, clock throttling to cap power consumption has been around since Fermi, and dynamic clock boost was introduced with Kepler.
Yes, I agree with you. But in my case the values I got were accurate. I compared the results from clock() with the time measurements from CUDA events, and they were nearly the same.
