Basically, the expression 0.4 * a is consistently, and surprisingly, significantly faster than a * 0.4, where a is an integer. And I have no idea why.
I speculated that it is a case of the LOAD_CONST LOAD_FAST bytecode pair being "more specialized" than LOAD_FAST LOAD_CONST, and I would be entirely satisfied with that explanation, except that this quirk seems to apply only to multiplications where the types of the operands differ. (By the way, I can no longer find the link to the "bytecode instruction pair popularity ranking" I once found on GitHub. Does anyone have a link?)
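The operand-order difference the speculation rests on is easy to see with the stdlib dis module. This is a quick check of my own, not from the benchmark setup; compiled as a bare expression the variable shows up as LOAD_NAME rather than the LOAD_FAST you get inside a function, but the constant/variable ordering is the same:

```python
import dis

# Compare the bytecode for both operand orders. Compiled as standalone
# expressions, "a" appears as LOAD_NAME (inside a function body it would
# be LOAD_FAST instead, as in the pyperf runs).
for expr in ("a * 0.4", "0.4 * a"):
    print(expr)
    dis.dis(compile(expr, "<expr>", "eval"))
```

On CPython 3.10 this prints LOAD_NAME/LOAD_CONST followed by BINARY_MULTIPLY for the first expression and the loads in the reverse order for the second.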
Anyway, here are the micro benchmarks:
$ python3.10 -m pyperf timeit -s"a = 9" "a * 0.4"
Mean +- std dev: 34.2 ns +- 0.2 ns
$ python3.10 -m pyperf timeit -s"a = 9" "0.4 * a"
Mean +- std dev: 30.8 ns +- 0.1 ns
$ python3.10 -m pyperf timeit -s"a = 0.4" "a * 9"
Mean +- std dev: 30.3 ns +- 0.3 ns
$ python3.10 -m pyperf timeit -s"a = 0.4" "9 * a"
Mean +- std dev: 33.6 ns +- 0.3 ns

As you can see, in the runs where the float comes first (2nd and 3rd) it is faster.
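For anyone without pyperf installed, here is a rough stdlib-only reproduction using timeit; the numbers will be noisier than pyperf's, but the ordering effect should still be visible on CPython 3.10:

```python
import timeit

# Time both operand orders with "a" bound to an int, mirroring the
# first two pyperf runs above.
for expr in ("a * 0.4", "0.4 * a"):
    # repeat() returns total seconds per run; take the minimum as the
    # least-noisy estimate and convert to nanoseconds per operation.
    best = min(timeit.repeat(expr, setup="a = 9",
                             number=1_000_000, repeat=5))
    print(f"{expr}: {best / 1_000_000 * 1e9:.1f} ns")
```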
So my question is: where does this behavior come from? I'm 90% sure it is an implementation detail of CPython, but I'm not familiar enough with its low-level instructions to say so for certain.
float.__mul__ immediately converts the integer to a float, whereas int.__mul__ returns NotImplemented, forcing float.__rmul__ to be called. (The same applies to addition via float.__add__, int.__add__, and float.__radd__.)
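That dispatch order can be observed directly. Below is a small sketch of my own using a hypothetical LoggingFloat subclass (not part of the original answer) that records which special method ends up handling the multiplication:

```python
# Record which special method handles each multiplication.
calls = []

class LoggingFloat(float):
    """Hypothetical float subclass that logs __mul__/__rmul__ dispatch."""

    def __mul__(self, other):
        calls.append("__mul__")
        return float.__mul__(self, other)

    def __rmul__(self, other):
        calls.append("__rmul__")
        return float.__rmul__(self, other)

# int.__mul__ refuses a float operand outright:
print((9).__mul__(0.4))   # NotImplemented

LoggingFloat(0.4) * 9     # float on the left: __mul__ handles it directly
9 * LoggingFloat(0.4)     # int on the left: int.__mul__ returns
                          # NotImplemented, so __rmul__ runs as a fallback
print(calls)              # ['__mul__', '__rmul__']
```

So with the float on the left there is a single dispatch, while with the int on the left the interpreter tries and rejects the int's slot first, which plausibly accounts for the few extra nanoseconds.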