I might have gone a bit far with the microbenchmarking here, but I've stripped out the irrelevant parts of your functions above, giving:
```python
def only_dict(d, i):
    d[i] = 0
    return d[i]

def without_walrus(d, i):
    r = 0
    d[i] = r
    return r

def with_walrus(d, i):
    d[i] = (r := 0)
    return r
```
i.e. just write the number zero into the dictionary instead of complicating things by also running sum(range(10)). note that as soon as your code does anything even as complicated as sum(range(10)) (i.e. almost certainly) the time of that will dominate and none of this matters
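as a rough sanity check of that claim (the numbers in the comments are illustrative and machine-dependent, not measurements from this answer):

```python
from timeit import timeit

# timeit runs the statement 1,000,000 times by default, so the result in
# seconds reads directly as microseconds per call
timeit('sum(range(10))')      # typically a few tenths of a microsecond per call
timeit('d[0] = 0', 'd = {}')  # typically an order of magnitude quicker
```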
I've also written a special version, which appears below as patched_walrus, which is like with_walrus but eliminates the store to r. it's similar to:
```python
def patched_walrus(d, i):
    return (d[i] := 0)
```
AFAIK this can't be expressed in Python code, but the bytecode allows it and it was an interesting one for me to include
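if you want to see what the patch removes, disassembling with_walrus shows the extra work (exact opcodes vary between CPython versions):

```python
import dis

dis.dis(with_walrus)
# the walrus version stores the value into r (STORE_FAST) and then loads
# it straight back (LOAD_FAST) before returning; the patched bytecode
# instead duplicates the value on the stack (DUP_TOP, or COPY on newer
# CPythons), does the STORE_SUBSCR, and returns the duplicate directly,
# skipping the round trip through the local variable
```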
it's important to reduce variance as much as possible, and because the code we're benchmarking is so small I'm using Timer objects directly:
```python
from timeit import Timer

functions = [only_dict, without_walrus, with_walrus, patched_walrus]

timers = [
    Timer('fn(d, 0)', 'fn = func; d = {}', globals=dict(func=fn))
    for fn in functions
]

out = [
    tuple(t.timeit() for t in timers)
    for _ in range(101)
]
```
I ignore the first few runs as these tend to be slower due to things like warming up the cache, e.g. your first two runs are noticeably slower because of this. using Timer directly helps because it compiles the code once (rather than every time you call timeit) and the compiled code can then stay hot in the cache.
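for example, the analysis below just slices off the first few rows before doing anything else (the cutoff of 3 is an arbitrary choice of mine):

```python
import numpy as np

# rows are benchmark repeats, columns are the four functions;
# drop the first few warm-up rows
data = np.array(out)[3:]
```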
next we can plot these in order:
[plot: timing of each run per function, with outliers drawn as dots; right panel: KDEs of the non-outlier distributions]
which helps to show whether your machine was busy, as that can bias the results. I've drawn outliers as dots and connected the rest. the small plot on the right has KDEs (kernel density estimates) of the non-outlier distributions.
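a minimal sketch of how you could draw such a plot with matplotlib and scipy, continuing from the data array above; the 1.5×IQR outlier rule and the styling here are my assumptions, not necessarily what produced the figure:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

names = [fn.__name__ for fn in functions]

fig, (ax_runs, ax_kde) = plt.subplots(
    1, 2, figsize=(10, 4), sharey=True,
    gridspec_kw={'width_ratios': [3, 1]},
)
for col, name in zip(data.T, names):
    # classify outliers with a simple 1.5*IQR rule (an assumption)
    q1, q3 = np.percentile(col, [25, 75])
    outlier = (col < q1 - 1.5 * (q3 - q1)) | (col > q3 + 1.5 * (q3 - q1))
    # connect the non-outliers, draw the outliers as dots
    line, = ax_runs.plot(np.where(outlier, np.nan, col), label=name)
    ax_runs.plot(np.where(outlier, col, np.nan), '.', color=line.get_color())
    # KDE of the non-outlier distribution, rotated to share the y axis
    xs = np.linspace(col[~outlier].min(), col[~outlier].max(), 200)
    ax_kde.plot(gaussian_kde(col[~outlier])(xs), xs, color=line.get_color())

ax_runs.set_xlabel('run')
ax_runs.set_ylabel('seconds per 1,000,000 calls')
ax_runs.legend()
ax_kde.set_xlabel('density')
plt.show()
```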
we can see that:
- only_dict is about 10 nanoseconds per invocation faster, i.e. a tiny difference, but we can now reliably measure it
- without_walrus and with_walrus are still basically the same
- my special patched_walrus is a measurable 2 nanoseconds faster, but it's so fiddly to create that it's almost certainly not worth it. you'd be better off writing a CPython extension module directly if you really care about performance