
Using the code below I would have expected cnt to be close to 0, but on Windows I only see values above 500,000.

```python
from time import perf_counter_ns

def test_time(f, c):
    cnt = 0
    for i in range(c):
        ps, ts = f(), f()
        if not ps - ts:
            cnt += 1
    return cnt

if __name__ == '__main__':
    res = test_time(perf_counter_ns, 1_000_000)
    print(res)  # usually prints a count of over 500k on Windows
```

On Linux this does not happen. I understand that the output resolution on Windows is limited to 100 ns increments. My question is whether I am missing something here, or whether there is a way this can be made to work on Windows.
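The advertised resolution of the clock can be checked with time.get_clock_info, and the effective granularity can be probed empirically. A quick sketch (the printed values are examples, not guarantees; they vary by platform and hardware):

```python
import time

# time.get_clock_info reports what the platform claims for each clock;
# on Windows, perf_counter is backed by QueryPerformanceCounter.
info = time.get_clock_info('perf_counter')
print(info.implementation)  # e.g. 'QueryPerformanceCounter()' on Windows
print(info.resolution)      # advertised resolution in seconds, e.g. 1e-07

# Probe the effective granularity: the smallest nonzero difference
# between two consecutive readings.
deltas = []
for _ in range(100_000):
    a, b = time.perf_counter_ns(), time.perf_counter_ns()
    if b - a > 0:
        deltas.append(b - a)
print(min(deltas))  # typically a multiple of 100 on Windows, smaller on Linux
```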

EDIT Others might find High-precision clock in Python helpful, as suggested by @JonSG. It gives a good overview of precise time measurement with Python, but does not address the narrower question of why consecutive calls to perf_counter_ns can yield the same value on Windows and not on Linux.

EDIT I've tested this behaviour with 3.11, 3.12 and 3.13.

EDIT I've tested restricting the Python process on Windows to a single core and altering the process priority. Neither made an apparent impact on the number of collisions.

  • This question is similar to: High-precision clock in Python. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Aug 18 at 13:39
  • Out of curiosity: Why do you want to make this work? What would you use it for? Commented Aug 18 at 14:31
  • @JonSG I had read that question, and while the question itself is focused on POSIX, some of the answers therein address Windows, but do not answer my question why consecutive calls to time.perf_counter_ns() can yield the same value on Windows and not on Linux. One of the answers in the question you linked shows that Windows has a flat 100 ns timing for perf_counter_ns and Linux has ~70 ns. Surely those 30 ns aren't the reason why Linux doesn't have value collisions on consecutive calls and Windows does? Commented Aug 19 at 9:07
  • Ok, makes sense for that use case. About "Surely those 30 ns aren't the reason": why not? If the time it takes your computer to execute one f() is, let's say, 80 ns, then if the system's value changes every 70 ns, you will never have duplicates, and if it changes every 100 ns, you will have many duplicates. Two measurements 80 ns apart can easily fall within the same 100 ns window but not within the same 70 ns window. Commented Aug 19 at 10:49
  • Unfortunately I only have access to one of the better Linux boxes. There I got no collisions; min is 30 ns, max is 193774 ns, with 10044 vals >= 1000 ns and a cluster around 1400 ns. For Windows (on a very strong box) I got 647733 collisions, so min is 0 ns, max is 1691700 ns, with 1160 vals >= 1000 ns. The Windows values are in 100 ns increments, like I said in the post, and for Linux all 10 digits appear in the last position of the Linux vals (i.e. the val seems to be reported ns-precise). Commented Aug 19 at 21:39
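The effect described in the comment above (80 ns call spacing vs. 70 ns and 100 ns tick sizes) can be reproduced with a toy model. This is an idealized sketch, not a measurement; the numbers are illustrative:

```python
def count_collisions(call_gap_ns, tick_ns, n=1_000_000):
    """Simulate pairs of clock reads `call_gap_ns` apart on a clock
    that only advances in `tick_ns` increments; count pairs that
    land in the same tick window."""
    collisions = 0
    t = 0
    for _ in range(n):
        first = (t // tick_ns) * tick_ns                    # quantized first read
        second = ((t + call_gap_ns) // tick_ns) * tick_ns   # quantized second read
        if first == second:
            collisions += 1
        t += 2 * call_gap_ns  # advance to the next pair of reads
    return collisions

# With 80 ns between reads: a 100 ns tick produces frequent duplicate
# pairs, while a 70 ns tick (smaller than the call gap) produces none.
print(count_collisions(80, 100))  # -> 200000
print(count_collisions(80, 70))   # -> 0
```

Since the call gap (80 ns) exceeds the 70 ns tick, every second read is guaranteed to cross a tick boundary; with a 100 ns tick, both reads regularly fall into the same window.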

1 Answer


We need to dig a bit through the various docs:

Here is the clock used by Python: https://docs.python.org/3/library/time.html#time.monotonic

First, based on the clock frequency of the computer, there is a limit to the precision that you can achieve https://learn.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps#low-level-hardware-clock-characteristics (see example 1)

These docs also explain how the time is obtained: https://docs.python.org/3/library/time.html#time.sleep and https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/high-resolution-timers

In all cases, this is through system interrupts.

Linux sends all the IRQs to a single core: http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux and has a single IRQ entry for syscalls https://linux-kernel-labs.github.io/refs/heads/master/lectures/interrupts.html so IRQs are likely to be handled in sequence.

For Windows we don't have much information, but I could find this https://codemachine.com/articles/interrupt_dispatching.html# which shows that there can be as many as 7 software IRQ entries. If they are dispatched to several cores, and since cores share the same clock, this could explain your situation. How IRQs are dispatched to cores is up to the OS, not to the application, so even if your application is single-threaded, as long as there are several cores available, IRQs could be dispatched to several of them.

To test this hypothesis, you can try disabling CPU cores and simultaneous multithreading so that only one core stays active. If your CPU has a mix of high-performance and efficient cores, disable the high-performance cores and keep only one efficient core: high-performance cores have multithreading and efficient ones don't. Depending on your machine, maybe this can help: https://www.asus.com/support/faq/1054283/#:~:text=off%20E%2Dcore)-,1.,to%20turn%20off%20E%2DCore or this https://www.intel.com/content/www/us/en/support/articles/000056742/processors.html#primary-content

If you have an old AMD processor, it may not support multithreading, so that could be easier to test: just disable all cores but one.

Contrary to what I said before, trying a VM running on a single core may not work, because a VM is still software and thus may still have its IRQs handled by multiple cores.


3 Comments

I guess I didn't say this explicitly, but this part of my application runs single threaded. I've tested this behaviour with 3.11, 3.12 and 3.13. Since 3.10 the clock used on Windows is the same for all processes see docs.python.org/3/library/time.html#time.perf_counter. The clock for time.monotonic is only used for perf_counter since 3.13 (see same link) and it does not seem to make an appreciable difference. As for unpacking making a difference, it's 'just' syntactic sugar, but for sake of completeness I tried the 2-liner variant and got the same result.
ok, I did a bit more digging (this is an interesting topic!) and clarified my answer quite a bit. Also, the fact that your application is multithreaded or not does not matter for IRQs.
I've tested this via Task Manager -> details -> set affinity for the python process to use only one core. That seemed like an easier way to test this. In any case it did not change the outcome. I used this chance to also test process priority (lowest and realtime) with no apparent impact on the number of collisions.
