#include <stdint.h>
#include <time.h>

#define NSEC 1000000000ULL

uint64_t getTimeLatencyNs(void) {
    struct timespec ts1;
    struct timespec ts2;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts1);
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts2);
    return ((ts2.tv_sec - ts1.tv_sec) * NSEC + ts2.tv_nsec - ts1.tv_nsec);
}

The code above and existing questions show that we can call clock_gettime() every few nanoseconds and it returns a different value each time.
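To make that measurable, here is a minimal harness around getTimeLatencyNs() (assuming the NSEC constant from above); it shows both the typical few-nanosecond deltas and the occasional outliers:

#include <stdint.h>
#include <stdio.h>

/* Minimal harness, assuming getTimeLatencyNs() from above: sample the
   back-to-back latency a million times and report the extremes. */
int main(void) {
    uint64_t min = UINT64_MAX, max = 0;
    for (int i = 0; i < 1000000; i++) {
        uint64_t d = getTimeLatencyNs();
        if (d < min) min = d;
        if (d > max) max = d;
    }
    printf("min %llu ns, max %llu ns\n",
           (unsigned long long)min, (unsigned long long)max);
    return 0;
}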

The usual explanation is: due to the vDSO, there's no syscall involved and that's why it's quick. To my understanding, the vDSO only eliminates the (significant) overhead stemming from the syscall. But there's more happening in the background.

I want to understand what exactly happens upon calling clock_gettime(), to be able to reason about the temporal behavior of the function call. Assume I already understand (or am able to google) adjacent concepts like time synchronization (PTP, NTP), the RTC, or the different clocks (e.g. CLOCK_MONOTONIC).
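To see the vDSO's effect directly, the kernel entry can be forced with syscall(2); a minimal sketch (assuming glibc routes the plain call through the vDSO):

#define _GNU_SOURCE      /* for syscall() */
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#define NSEC 1000000000ULL

/* Sketch: average cost of the vDSO path (plain library call) versus a
   forced kernel entry via syscall(SYS_clock_gettime, ...). */
static uint64_t diff_ns(struct timespec a, struct timespec b) {
    return (uint64_t)(b.tv_sec - a.tv_sec) * NSEC + (b.tv_nsec - a.tv_nsec);
}

int main(void) {
    enum { N = 100000 };
    struct timespec t0, t1, ts;

    clock_gettime(CLOCK_MONOTONIC_RAW, &t0);
    for (int i = 0; i < N; i++)
        clock_gettime(CLOCK_MONOTONIC_RAW, &ts);              /* vDSO path */
    clock_gettime(CLOCK_MONOTONIC_RAW, &t1);
    printf("vDSO:    ~%llu ns/call\n",
           (unsigned long long)(diff_ns(t0, t1) / N));

    clock_gettime(CLOCK_MONOTONIC_RAW, &t0);
    for (int i = 0; i < N; i++)
        syscall(SYS_clock_gettime, CLOCK_MONOTONIC_RAW, &ts); /* kernel entry */
    clock_gettime(CLOCK_MONOTONIC_RAW, &t1);
    printf("syscall: ~%llu ns/call\n",
           (unsigned long long)(diff_ns(t0, t1) / N));
    return 0;
}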

Jiffies (how it always worked)

I found a description of the Jiffies timing mechanism by Ingo Molnar1 from 2006. In summary, there's a value stored in memory which is incremented every 1/CONFIG_HZ seconds. Calls to gettimeofday()/clock_gettime() simply read that memory address. On typical systems, the update happens every 1 or 4 milliseconds (i.e. the clock resolution was a few milliseconds); see man time. But I can observe nanosecond resolution in the example above, so Jiffies is not how it works on my system.
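As a conceptual model of that scheme (illustrative C only, not actual kernel code; names are made up):

#include <stdint.h>

/* Conceptual model of the Jiffies scheme: a periodic tick increments a
   counter HZ times per second, and readers merely scale it, so the
   resolution is 1/HZ. */
#define HZ 250
static volatile uint64_t jiffies;               /* bumped by the tick */

static void tick_interrupt(void) { jiffies++; } /* fires every 1/HZ s */

static uint64_t jiffies_clock_ns(void) {
    return jiffies * (1000000000ULL / HZ);      /* 4 ms steps at HZ=250 */
}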

hrtimer (feature since Kernel 2.x)

Since Linux 2.6.21, CONFIG_HIGH_RES_TIMERS enables hrtimers to achieve a higher resolution, according to man time. On my system the config is set to yes. So I read up on hrtimer.

Thomas Gleixner is often credited for the high-resolution kernel timer subsystem. But the hrtimer docs say hrtimers are not used for clocks; they plan to implement that, though. Currently, I'm still searching for an hrtimer while skimming the clock_gettime() kernel sources.

While this subsystem does not offer high-resolution clock sources just yet, the hrtimer subsystem can be easily extended with high-resolution clock capabilities,

Baeldung (explanation from 2024)

Searching further, the Baeldung article "Understanding Timekeeping and Clocks in Linux" from 2024 explains that the vDSO enables calling clock_gettime() without the syscall overhead by memory-mapping the function from kernel space into user space. It also has an interesting quote:

the vDSO can introduce latency spikes when the kernel updates shared memory areas with clock counters. This situation is more likely to occur with clocks accelerated by the vDSO.

However, their talk of latency spikes caused by updates to shared memory sounds to me like they are describing Jiffies. Similarly, this answer claims that clock_gettime() outliers are caused by updates to shared memory, hitting a specific do ... while(unlikely(...)) case. I'm uncertain about that explanation, because the average call in that question takes a few nanoseconds, while the outliers lie in the microsecond range; another run of that retry loop should only take nanoseconds as well.
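For reference, that do ... while(unlikely(...)) is, as far as I can tell, a seqcount read loop; a simplified sketch of the reader side (struct/field names are made up; the real code also inserts read barriers around the seq loads):

#include <stdint.h>

/* Simplified seqcount reader: if the kernel is writing (odd seq), or
   the seq changed while we read, we retry. A retry or a bouncing cache
   line is one plausible source of an occasional slow call. */
struct vdso_snapshot { uint64_t base_ns, cycle_last; uint32_t mult, shift; };

struct vdso_page {
    volatile uint32_t seq;          /* odd => kernel update in progress */
    struct vdso_snapshot snap;
};

static struct vdso_snapshot read_snapshot(const struct vdso_page *vd) {
    struct vdso_snapshot s;
    uint32_t seq;
    do {
        while ((seq = vd->seq) & 1) /* writer active: wait */
            ;
        s = vd->snap;               /* copy the published values */
    } while (seq != vd->seq);       /* changed underneath us: retry */
    return s;
}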

TSC register (why hardware is precise)

The question "How is the microsecond time of linux gettimeofday() obtained and what is its accuracy?" got an answer by amdn which says that, on recent hardware, clock_gettime() reads from the constantly increasing TSC register, and which also explains why the register increases at a constant rate. I assume that on my machine this is what is ultimately used to achieve the high resolution, because it's a relatively recent x86 machine. There are similar registers on other platforms, e.g. ARM's CCNT, but presumably shared code is wrapped around the register access anyway.
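On x86 the register can be read directly from user space via a compiler intrinsic; a minimal sketch:

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>  /* __rdtsc() */

/* Reading the TSC directly from user space (x86-only). Two back-to-back
   reads differ by only a handful of cycles. */
int main(void) {
    uint64_t t1 = __rdtsc();
    uint64_t t2 = __rdtsc();
    printf("TSC delta: %llu cycles\n", (unsigned long long)(t2 - t1));
    return 0;
}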

There's a question from 2011 that mentions HPET, but Wikipedia says it has some problems and is no longer used for clocks, since the TSC runs at constant speed in today's processors.

Blog (detailed explanation from 2013)

The most helpful to me is this explanation of the clock_gettime kernel sources from 2013.

From what I gather, there's still a shared memory location in use when hrtimers are enabled. Depending on the kernel config, the shared memory is updated either regularly via Jiffies (CONFIG_HZ) or via a timer interrupt (CONFIG_NO_HZ). Then vgetsns() is called to increase the granularity, which I assume means it reads the aforementioned TSC register (see the sketch after the config listing below). On my system NO_HZ is set, but HZ=1000 as well, so I don't fully understand it currently:

$ cat /boot/config | grep _HZ
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
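To make the vgetsns() step concrete, here is a sketch of the computation I presume it performs (simplified, x86-only, parameter names assumed): the TSC delta since the kernel's last update of the shared page is scaled to nanoseconds with a fixed-point multiplier and added to the published base value.

#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc() */

/* Presumed vgetsns()-style step: cycles since the last kernel update,
   converted to ns via fixed-point mult/shift, on top of the base. */
static uint64_t hres_now_ns(uint64_t base_ns, uint64_t cycle_last,
                            uint32_t mult, uint32_t shift) {
    uint64_t delta = __rdtsc() - cycle_last;     /* cycles since update */
    return base_ns + ((delta * mult) >> shift);  /* cycles -> ns */
}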

Summary

As far as I understand, a call to clock_gettime() normally takes only a few nanoseconds because of the vDSO mechanism. Still, there are sometimes outliers, which has prompted a few questions on the behavior of clock_gettime() and its timing. The answers to those questions explain certain aspects of it and sometimes contradict each other. Hardware-wise, the high resolution is achieved by making the TSC register reliably count clock cycles. I couldn't find a recent explanation of how clock_gettime() works conceptually. Before digging into the kernel source and/or ftrace, I thought it fair to ask the community.

So how does clock_gettime() actually work, especially regarding the timing and happenings "behind" the vDSO "curtain"?


Correcting any wrong assumptions of mine is highly welcome. If needed, I can draw a picture of my understanding.

1 In the best kernel-mailing-list manner, it starts with the words "[Previous mail is] Completely wrong!"

  • Hi, so there is a lot of text and blogs and "opinions" and explanations, but no source code. If you want to know what exactly happens upon calling clock_gettime(), why not start reading the source code of clock_gettime? Commented May 22 at 18:07
  • Copy the snippet from the kernel and it'll be a lot easier to (reason about and) answer the specific questions you have about it. Commented May 22 at 18:09
  • Note the difference between ((ts2.tv_sec - ts1.tv_sec) * NSEC + ts2.tv_nsec - ts1.tv_nsec) and ((ts2.tv_sec - ts1.tv_sec) * NSEC + (ts2.tv_nsec - ts1.tv_nsec)). IMO, the 2nd is better. Commented May 22 at 18:30
  • @KamilCuk I'm doing that currently. But as there seems to be no explanation apart from the actual code, I thought it could be useful to others with the same question to add a question to Stack Overflow. Commented May 22 at 18:54
  • @TedLyngmo You're right, I probably got a little carried away with the research. Any suggestions what adds the least value and can be removed? Commented May 22 at 18:58

1 Answer


Your code calls clock_gettime(), which (roughly) goes through this chain in the vDSO:

  • the libc clock_gettime() dispatches to __vdso_clock_gettime64, exported by the vDSO
  • __vdso_clock_gettime64 obtains the shared data page via __arch_get_vdso_data()
  • the generic __cvdso_clock_gettime64() in lib/vdso/gettimeofday.c then reads that data in its do_hres() loop

Now the stuff from __arch_get_vdso_data() is where the vDSO magic really happens: __vdso_clock_gettime64 does not itself "use" a time source. It only reads preset values in vdso_data. Those values are set in update_vsyscall() (https://elixir.bootlin.com/linux/v6.14.7/source/kernel/time/vsyscall.c#L78), which is called from the timekeeping code (https://elixir.bootlin.com/linux/v6.14.7/source/kernel/time/timekeeping.c#L687).
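A sketch of what that writer side does conceptually (simplified; names are made up and the real update_vsyscall() uses proper write barriers and many more fields):

#include <stdint.h>

/* Writer side, loosely modeled on update_vsyscall(): bump seq to odd,
   publish fresh values, bump seq back to even, so a reader that raced
   the update notices and retries. */
struct vdso_snapshot { uint64_t base_ns, cycle_last; uint32_t mult, shift; };
struct vdso_page { volatile uint32_t seq; struct vdso_snapshot snap; };

static void update_vsyscall_sketch(struct vdso_page *vd,
                                   const struct vdso_snapshot *fresh) {
    vd->seq++;          /* odd: readers spin/retry from here on */
    vd->snap = *fresh;  /* publish the new timekeeping state */
    vd->seq++;          /* even again: readers may proceed */
}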


10 Comments

Yes, this is actually an answer to a question I thought wasn't answerable in its current state. Superb digging!
Minor: Isn't __arch_get_vdso_data() used by __cvdso_clock_gettime(), one bullet point lower?
@Mo_ You need to take your queries into a more specific question. You've gotten an answer to the wide question you asked and someone answered in kind. Be good with that.
"Why does magic happen there?" Sorry, I mean the magic of the vDSO as I understand it: linux-vdso.so does not read actual hardware registers; instead, the kernel periodically copies the time "state" into the shared library for user space to use. As for "the architecture specific vdso_data", I don't think it's that much architecture-specific: elixir.bootlin.com/linux/v6.14.7/source/include/vdso/…
"Most of the actual 'action' happens in gettimeofday.c, correct?" I do think I follow how gettimeofday is related to clock_gettime. Let gettimeofday die, it's been deprecated for decades. I think most of the code is in the do_hres loop: elixir.bootlin.com/linux/v6.14.7/source/lib/vdso/… . I think the rest is my understanding. Really, you could plug in a debugger and measure how long each instruction takes; it all happens in userspace.
