How does clock_gettime actually work in the kernel?

Question

uint64_t getTimeLatencyNs() { struct timespec ts1; struct timespec ts2; clock_gettime(CLOCK_MONOTONIC_RAW, &ts1); clock_gettime(CLOCK_MONOTONIC_RAW, &ts2); return ((ts2.tv_sec - ts1.tv_sec) * NSEC + ts2.tv_nsec - ts1.tv_nsec); }

The code above and existing questions show we can call clock_gettime() every few nanoseconds and it returns a different value.

The usual explanation is: due to the vDSO, there's no syscall involved and that's why it's quick. To my understanding, the vDSO only eliminates the (significant) overhead stemming from the syscall. But there's more happening in the background.

I want to understand what exactly happens upon calling clock_gettime(), to be able to reason about the temporal behavior of the function call. Assume I already understand (or are able to google) adjacent concepts like time synchronization (PTP, NTP), RTC or different clocks (e.g. CLOCK_MONOTONIC).

Jiffies (how it always worked)

I found a description of the Jiffies timing mechanism by Ingo Molnar¹ from 2006. In summary, there's a value stored in memory, which is incremented every 1/CONFIG_HZ seconds. Calls to gettimeofday()/clock_gettime() simply read the memory address. On typical systems, the update happens every 1 or 4 milliseconds (i.e. clock resolution was a few milliseconds), see man time. But I can observe a nanosecond resolution in the example above. So Jiffies is not how it works on my system.

hrtimer (feature since Kernel 2.x)

Since Linux 2.6.21, CONFIG_HIGH_RES_TIMERS enables hrtimers to achieve a higher resolution, according to man time. On my system the config is set to yes. So I read up on hrtimer.

Thomas Gleixner is often credited for the high-resolution kernel timer subsystem. But hrtimer docs say they are not used for clocks. They say they plan to implement it, though. Currently, I'm still searching for an hrtimer skimming the clock_gettime() kernel source.

While this subsystem does not offer high-resolution clock sources just yet, the hrtimer subsystem can be easily extended with high-resolution clock capabilities,

Baeldung (explanation from 2024)

Searching more, the Baeldung article "Understanding Timekeeping and Clocks in Linux" from 2024 explains vDSO as enabling to call clock_gettime() without the syscall overhead, by memory mapping the function from kernel space into user space. It also has an interesting quote:

the vDSO can introduce latency spikes when the kernel updates shared memory areas with clock counters. This situation is more likely to occur with clocks accelerated by the vDSO.

However, their talking of latency spikes caused by updates to shared memory sounds to me like they are talking about Jiffies. Similarly, this answer claims clock_gettime() outliers to be caused by updates to shared memory, hitting a specific do ... while(unlikely...) case. I'm uncertain about the explanation, because the average call in that question take some nanoseconds, while the outliers lie in the microsecond range. Another run of that loop should only take nanoseconds as well.

TSC register (why hardware is precise)

The question How is the microsecond time of linux gettimeofday() obtained and what is its accuracy? got an answer which says, on recent hardware clock_gettime() reads from the constantly increasing TSC register, amdn also explains why it's constantly increasing. I assume that on my machine this is used in the end to achieve the high resolution, because it's a relatively recent x86 machine. There are similar registers on other platforms, e.g. arm's CCNT, but probably shared code is wrapped around the register access anyway.

There's a question from 2011 that mentions "hpet", but Wikipedia says it has some problems and is not used for clocks anymore, since TSC is running at constant speed in today's processors.

Blog (detailed explanation from 2013)

The most helpful to me is this explanation of the clock_gettime kernel sources from 2013.

From what I gather, there's still a shared memory location used when hrtimer are enabled. Depending on the kernel config, shared memory is either updated regularly via Jiffies (CONFIG_HZ) or via a timer interrupt (CONFIG_NO_HZ). Then vgetsns() is called to increase granularity, which I assume means it reads aforementioned TSC register. On my system NO_HZ is set, but HZ=1000 as well, so I don't fully understand it currently.

$ cat /boot/config | grep _HZ CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set # CONFIG_NO_HZ_IDLE is not set CONFIG_NO_HZ_FULL=y CONFIG_NO_HZ=y # CONFIG_HZ_100 is not set # CONFIG_HZ_250 is not set # CONFIG_HZ_300 is not set CONFIG_HZ_1000=y CONFIG_HZ=1000

Summary

As far as I understand, a call to clock_gettime() normally takes only a few nanoseconds, because of the vDSO mechanism. Still, sometimes there are outliers, which prompted a few questions on the behavior of clock_gettime() and its timing. The answers to those questions explain certain aspects of it and sometimes contradict each other. Hardware-wise the high resolution is achieved by making the TSC register reliably count clock cycles. I couldn't find a recent explanation, of how clock_gettime works conceptually. Before digging into kernel source and/or ftrace, I thought it fair time to ask the community.

So how does clock_gettime() actually work, especially regarding the timing and happenings "behind" the vDSO "curtain"?

Correcting any wrong assumptions of mine is highly welcome. If needed, I can draw a picture of my understanding.

_{¹ In best manner for kernel mailing list, it starts with the words "[Previous mail is] Completely wrong!"}

Hi, so there is a lot of text and blogs and "opinions" and explanations, but no source code. If you want what exactly happens upon calling clock_gettime() why not start reading the source code of clock_gettime? — KamilCuk
– KamilCuk, Commented May 22 at 18:07
Copy the snippet from the kernel and it'll be a lot easier to (reason about and) answer the specific questions you have about it. — Ted Lyngmo
– Ted Lyngmo, Commented May 22 at 18:09
Note the difference between ((ts2.tv_sec - ts1.tv_sec) * NSEC + ts2.tv_nsec - ts1.tv_nsec) and ((ts2.tv_sec - ts1.tv_sec) * NSEC + (ts2.tv_nsec - ts1.tv_nsec)). IMO, the 2nd is better. — chux
– chux, Commented May 22 at 18:30
@KamilCuk I'm doin that currently. But as there seems to be no explanation apart from the actual code, I thought it could be useful to others with the same question to add a question to stackoverflow. — Mo_
– Mo_, Commented May 22 at 18:54
@TedLyngmo You're right, probably got a little carried away with the research. Any suggestions what adds the least value and can be removed? — Mo_
– Mo_, Commented May 22 at 18:58

KamilCuk · Accepted Answer · 2025-05-22 22:50:34Z

Your code calls clock_gettime():

which is defined at https://codebrowser.dev/glibc/glibc/sysdeps/unix/sysv/linux/clock_gettime.c.html#113
which calls _dl_vdso_clock_gettime64 function https://codebrowser.dev/glibc/glibc/sysdeps/unix/sysv/linux/clock_gettime.c.html#38
which is set up in dl-vdso-setup.h https://codebrowser.dev/glibc/glibc/sysdeps/unix/sysv/linux/dl-vdso-setup.h.html#33
- resolved using dl_vdso_sym() https://codebrowser.dev/glibc/glibc/sysdeps/unix/sysv/linux/dl-vdso.h.html#dl_vdso_vsym
- with macro HAVE_CLOCK_GETTIME64_VSYSCALL "__vdso_clock_gettime" https://codebrowser.dev/glibc/glibc/sysdeps/unix/sysv/linux/x86_64/sysdep.h.html#374
- initailized in _dl_non_dynamic_init() https://codebrowser.dev/glibc/glibc/elf/dl-support.c.html#282
- called from _init_first() with __attribute__((constructor)) https://codebrowser.dev/glibc/glibc/csu/init-first.c.html#68
we end up in __vdso_clock_gettime64() in the userspace library to the kernel exposed data https://elixir.bootlin.com/linux/v6.14.7/source/arch/x86/entry/vdso/vclock_gettime.c#L62
- which uses __arch_get_vdso_data() https://elixir.bootlin.com/linux/v6.14.7/source/arch/x86/include/asm/vdso/gettimeofday.h#L278
which calls __cvdso_clock_gettime() https://elixir.bootlin.com/linux/v6.14.7/source/lib/vdso/gettimeofday.c#L283
calls __cvdso_clock_gettime_data() https://elixir.bootlin.com/linux/v6.14.7/source/lib/vdso/gettimeofday.c#L272 (at this point it's just clicking in elixir)
calls __cvdso_clock_gettime_common() https://elixir.bootlin.com/linux/v6.14.7/source/lib/vdso/gettimeofday.c#L245
calls do_hres() https://elixir.bootlin.com/linux/v6.14.7/source/lib/vdso/gettimeofday.c#L130
calls do_hres_timens() https://elixir.bootlin.com/linux/v6.14.7/source/lib/vdso/gettimeofday.c#L72 if a process is using a time namespace https://man7.org/linux/man-pages/man7/time_namespaces.7.html , and either way do_hres or do_hres_timens call:
which calls __arch_get_hw_counter https://elixir.bootlin.com/linux/v6.14.7/source/arch/x86/include/asm/vdso/gettimeofday.h#L255
which finally calls rdtsc_ordered() that reads the TSC counter https://elixir.bootlin.com/linux/v6.14.7/source/arch/x86/include/asm/msr.h#L217

Now the stuff from __arch_get_vdso_data() is where vDSO really happens - the __vdso_clock_gettime64 does not itself "use" time source. It only reads preset values set in vdso_data. The values are set in update_vsyscall() https://elixir.bootlin.com/linux/v6.14.7/source/kernel/time/vsyscall.c#L78 , which is called from timekeeping https://elixir.bootlin.com/linux/v6.14.7/source/kernel/time/timekeeping.c#L687 .

Yes, this is actually an answer to a question I thought wasn't answerable in its current state. Superb digging!
Minor: Isn't __arch_get_vdso_data() used by __cvdso_clock_gettime(), one bullet point lower?
@Mo_ You need to take your queries into a more specific question. You've gotten an answer to the wide question you asked and someone answered in kind. Be good with that.
Why does magic happen there? Sorry, I mean the magic of vdso as I understand. The linux-vdso.so does not read actual hardware registers, instead kernel periodically copies time "state" into the share library for user space to use. the architecture specific vdso_data I do not think it's that much architecture specific elixir.bootlin.com/linux/v6.14.7/source/include/vdso/…
Most of the actual "action" happens in gettimeofday.c, correct? I do think I follow how gettimeofday is related to click_gettime. Let gettimeofday die, it's been deprecated for decades. I think most of the code looks to be in do_hres loop elixir.bootlin.com/linux/v6.14.7/source/lib/vdso/… . I think the rest is my understanding. Really you could plug in a debugger and measure how long each instruction takes, all happens in userspace.

Collectives™ on Stack Overflow

How does clock_gettime actually work in the kernel?

Jiffies (how it always worked)

hrtimer (feature since Kernel 2.x)

Baeldung (explanation from 2024)

TSC register (why hardware is precise)

Blog (detailed explanation from 2013)

Summary

1 Answer 1

10 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

Jiffies (how it always worked)

hrtimer (feature since Kernel 2.x)

Baeldung (explanation from 2024)

TSC register (why hardware is precise)

Blog (detailed explanation from 2013)

Summary

1 Answer 1

10 Comments

Linked

Related