
Basically, the expression 0.4 * a is consistently, and surprisingly, significantly faster than a * 0.4, where a is an integer. I have no idea why.

I speculated that a LOAD_CONST LOAD_FAST bytecode pair is "more specialized" than a LOAD_FAST LOAD_CONST pair, and I would be entirely satisfied with this explanation, except that this quirk seems to apply only to multiplications where the operand types differ. (By the way, I can no longer find the link to the "bytecode instruction pair popularity ranking" I once found on GitHub. Does anyone have a link?)

Anyway, here are the micro benchmarks:

$ python3.10 -m pyperf timeit -s"a = 9" "a * 0.4"
Mean +- std dev: 34.2 ns +- 0.2 ns
$ python3.10 -m pyperf timeit -s"a = 9" "0.4 * a"
Mean +- std dev: 30.8 ns +- 0.1 ns
$ python3.10 -m pyperf timeit -s"a = 0.4" "a * 9"
Mean +- std dev: 30.3 ns +- 0.3 ns
$ python3.10 -m pyperf timeit -s"a = 0.4" "9 * a"
Mean +- std dev: 33.6 ns +- 0.3 ns

As you can see, the runs where the float comes first (the 2nd and 3rd) are faster.
So my question is: where does this behavior come from? I'm 90% sure that it is an implementation detail of CPython, but I'm not familiar enough with low-level instructions to state that for sure.

My guess: float.__add__ immediately converts the integer to a float, whereas int.__add__ returns NotImplemented, forcing float.__radd__ to be called. Commented Aug 11, 2022 at 20:54

1 Answer

It's CPython's implementation of the BINARY_MULTIPLY opcode. It has no idea what the types are at compile-time, so everything has to be figured out at run-time. Regardless of what a and b may be, BINARY_MULTIPLY ends up invoking a.__mul__(b).
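To see that the compiler emits the same untyped multiply instruction no matter what a turns out to be, you can disassemble the expression. This is only a sketch: the opcode name depends on the CPython version (BINARY_MULTIPLY on 3.10, the more general BINARY_OP on 3.11 and later), and the variable names here are mine.

```python
import dis

# Compile the expression without running it. `a` is only a name at this
# point, so the compiler cannot know whether it will hold an int or a float.
code = compile("a * 0.4", "<example>", "eval")

opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

# The multiply instruction is BINARY_MULTIPLY on CPython 3.10 and BINARY_OP
# on 3.11+; either way it carries no information about operand types.
multiply_ops = [name for name in opnames if name.startswith("BINARY_")]
print(multiply_ops)
```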

When a is of int type, int.__mul__(a, b) has no idea what to do unless b is also of int type. It returns Py_NotImplemented (the C-level NotImplemented singleton) by way of the Py_RETURN_NOTIMPLEMENTED macro. This is in longobject.c's CHECK_BINOP macro. The interpreter sees that, and effectively says "OK, a.__mul__ has no idea what to do, so let's give b.__rmul__ a shot at it". None of that is free - it all takes time.

float.__mul__(b, a) (which is what float.__rmul__ amounts to here) does know what to do with an int (it converts it to float first), so that succeeds.
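You can observe both halves of this from Python by calling the methods directly. This is just an illustration at the Python level, not the C path the interpreter actually takes; the variable names are mine.

```python
a_int, b_float = 9, 0.4

# int's multiply does not handle a float operand: it returns the
# NotImplemented singleton instead of raising.
int_result = int.__mul__(a_int, b_float)
print(int_result)  # NotImplemented

# float's reflected multiply accepts an int and converts it to float.
float_result = float.__rmul__(b_float, a_int)
print(float_result)

# The forward float multiply gives the same value.
forward_result = float.__mul__(b_float, a_int)
print(forward_result == float_result)  # True
```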

But when a is of float type to begin with, we go to float.__mul__ first, and that's the end of it. No time burned figuring out that the int type doesn't know what to do.
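As a rough model of the difference, here is a pure-Python sketch that records which methods get tried. The multiply helper and its trace argument are hypothetical names of my own; the sketch ignores the subclass-priority rule and all of the C-level slot machinery, so treat it as a picture of the extra step, not as CPython's implementation.

```python
def multiply(left, right, trace):
    """Simplified model of how `left * right` is resolved (illustration only)."""
    # Always try the left operand's __mul__ first.
    trace.append(f"{type(left).__name__}.__mul__")
    result = type(left).__mul__(left, right)

    # Fall back to the right operand's reflected method only if needed.
    if result is NotImplemented and type(left) is not type(right):
        trace.append(f"{type(right).__name__}.__rmul__")
        result = type(right).__rmul__(right, left)

    if result is NotImplemented:
        raise TypeError("unsupported operand types for *")
    return result


int_first, float_first = [], []
multiply(9, 0.4, int_first)
multiply(0.4, 9, float_first)

print(int_first)    # ['int.__mul__', 'float.__rmul__']
print(float_first)  # ['float.__mul__']
```

The int-first case makes two attempts and the float-first case makes one, which matches the direction of the timing difference in the question.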

The actual code is quite a bit more involved than the above pretends, but that's the gist of it.
