What is the clock measured by clock() and clock64() in CUDA?

Question

What clock is measured by clock() and clock64() in CUDA?

According to the CUDA documentation, the clock is a 'per-multiprocessor counter'. As I understand it, this refers to the primary GPU clock (not the shader clock).

But when I measure clock counts and convert them to time using the primary GPU clock frequency, the results are twice as large as the real values (I measure the real values as the kernel execution time from host code, using CUDA events). This suggests that clock() counts cycles of the shader clock rather than the primary GPU clock.

How can I resolve this confusion?

EDIT: I calculated the primary GPU clock frequency by dividing the clock rate I get from cudaGetDeviceProperties by 2. As far as I understand, the value given by cudaGetDeviceProperties is the shader clock frequency.

primary GPU clock / Graphics Core Clock / Graphic Clock / Core Clock: the clock rate at which the Streaming Multiprocessor runs.
shader clock / Shader Core Clock / Processor Clock / GPU clock: the clock rate at which the execution units (CUDA cores) run. This is twice the value of the primary GPU clock. This is how I have understood it.
– Optimus, Commented Nov 21, 2014 at 16:37
I can confirm that on Fermi devices, cudaDeviceProp::clockRate is the shader clock rate, that is, double the "primary" GPU clock. On Kepler devices, the two are the same. The answer would be more certain if you told us which device you are using. Not sure about clock() and clock64(); you are probably right in your assumption.
– void_ptr, Commented Nov 21, 2014 at 16:47
I think @Optimus is referring to the following: On older GPUs (e.g. the Fermi family), the execution units run at twice the clock rate of the rest of the graphics domain (this is sometimes referred to as the "hot clock"). nvidia-smi reports these as "graphics" and "SM" clocks, respectively. For example, on my Fermi-based Quadro 2000, the former is reported as 625 MHz, the latter as 1251 MHz. As far as I know, starting with Kepler all of the non-memory domain of a GPU runs at the same speed, i.e. there is no more SM hot clock.
– njuffa, Commented Nov 21, 2014 at 16:53
My device is a Quadro 2000D. The clock frequency given by 'cudaDeviceProp::clockRate' is 1251 MHz, which is the shader clock frequency. The reason for my confusion is that the CUDA documentation says 'per-multiprocessor counter', which I took to refer to the primary GPU clock.
– Optimus, Commented Nov 21, 2014 at 16:57
@njuffa: How did you get 625 MHz? Is it from a datasheet or from a CUDA function?
– Optimus, Commented Nov 21, 2014 at 17:00

Community · Accepted Answer · 2020-06-20 09:12:55Z

5

It's true that the CUDA documentation says clock() and clock64() return a 'per-multiprocessor counter'. But on the Fermi architecture, what clock() and clock64() actually return is the shader clock counter.

The clockRate returned by cudaGetDeviceProperties is the shader clock frequency.

So to compute the time, we have to divide the clock count from clock() or clock64() by the shader clock frequency obtained from cudaGetDeviceProperties.
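A minimal sketch of that conversion, checked against CUDA events as described in the question. The kernel name `timed_spin`, the spin length, and the variable names are made up for illustration. The clock rate is read through `cudaDeviceGetAttribute` with `cudaDevAttrClockRate`, which reports the same kHz value as `cudaDeviceProp::clockRate`. The sketch assumes the SM clock stays at that reported rate for the whole run, which is not guaranteed when the GPU boosts or throttles.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-waits for about `spin_cycles` ticks of the
// per-multiprocessor counter and stores how many ticks actually elapsed.
__global__ void timed_spin(long long spin_cycles, long long *elapsed_cycles)
{
    long long start = clock64();
    while (clock64() - start < spin_cycles) { }
    *elapsed_cycles = clock64() - start;
}

int main()
{
    int device = 0;
    int clock_khz = 0;  // reported clock rate in kHz (assumed constant here)
    cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, device);

    long long *d_cycles = nullptr;
    cudaMalloc(&d_cycles, sizeof(long long));

    cudaEvent_t begin, end;
    cudaEventCreate(&begin);
    cudaEventCreate(&end);

    // Time the same kernel from the host with CUDA events.
    cudaEventRecord(begin);
    timed_spin<<<1, 1>>>(10000000LL, d_cycles);
    cudaEventRecord(end);
    cudaEventSynchronize(end);

    float event_ms = 0.0f;
    cudaEventElapsedTime(&event_ms, begin, end);

    long long cycles = 0;
    cudaMemcpy(&cycles, d_cycles, sizeof(cycles), cudaMemcpyDeviceToHost);

    // cycles / kHz = milliseconds
    double clock_ms = static_cast<double>(cycles) / clock_khz;

    printf("clock64(): %lld cycles -> %.3f ms at %d kHz\n", cycles, clock_ms, clock_khz);
    printf("CUDA events: %.3f ms\n", event_ms);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaEventDestroy(begin);
    cudaEventDestroy(end);
    cudaFree(d_cycles);
    return 0;
}
```

The event time also includes launch overhead, so the two figures should be close rather than equal. A large, consistent ratio between them (such as the factor of 2 in the question) points to the wrong clock frequency being used in the conversion.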

edited Jun 20, 2020 at 9:12

CommunityBot

answered Nov 21, 2014 at 17:37

Optimus

2 Comments

njuffa Over a year ago

I would caution against converting clock() or clock64() counts to units of time based on the value of cudaDeviceProp::clockRate, since the underlying clock can change dynamically, due to clock boosting and clock throttling. If I recall correctly, clock throttling to cap power consumption has been around since Fermi, and dynamic clock boost was introduced with Kepler.

Optimus Over a year ago

Yes, I agree with you. But in my case the values I got were accurate. I compared the results from clock() with the time measurements from CUDA events, and they were nearly identical.
