[Computed gotos](https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables) make virtual machines faster at dispatching instructions.
Consider the "normal" way of dispatching via a looped `switch` statement:
```
while (true) {
    switch (code[pc++]) { // dispatch next opcode
    case OP_ADD:
        add();
        break;
    case OP_SUB:
        sub();
        break;
    case OP_MUL:
        mul();
        break;
    case OP_DIV:
        div();
        break;
    }
}
```
This iterates over a program counter within the user's bytecode and invokes arithmetic functions based on the observed opcode.
The `switch` statement is, ultimately, the single point of dispatch for all opcodes. That means the CPU's branch predictor has only one indirect branch to learn from: over the past few handled instructions within a user's bytecode, that single branch will have jumped to many different opcode handlers. The prediction for the next `switch` target cannot converge on any single opcode, which means there will be no (correct) branch prediction!
Fortunately, there is a non-standard extension to C and C++, implemented by compilers such as GCC and Clang, called *labels as values*. The resulting *computed gotos* let programs take the address of a label (with the `&&` operator) and jump to it indirectly:
```
// opcodes must be in consecutive order for this array to be valid
static void* table[] = { &&op_add, &&op_sub, &&op_mul, &&op_div };

// start the first dispatch
goto *table[code[pc++]];

op_add:
    add();
    goto *table[code[pc++]]; // dispatch next opcode
op_sub:
    sub();
    goto *table[code[pc++]]; // dispatch next opcode
op_mul:
    mul();
    goto *table[code[pc++]]; // dispatch next opcode
op_div:
    div();
    goto *table[code[pc++]]; // dispatch next opcode
```
Here there are multiple points of dispatch, because each opcode handler jumps directly to the next opcode's handler, and each of those indirect branches accumulates its own prediction history. If the user's bytecode is part of a tight loop, each handler will tend to jump to the same next handler on every iteration, so the predictor can learn the pattern (the current opcode determines the next opcode), which means there will be a (likely correct) branch prediction!
This technique is sometimes referred to as [direct threading](https://en.wikipedia.org/wiki/Threaded_code#Direct_threading) (not to be confused with multithreading).
CPython's virtual machine will [use computed gotos](https://github.com/python/cpython/blob/main/Python/ceval_macros.h) if possible, noting:
> At the time of this writing, the "threaded code" version is up to 15-20% faster than the normal "switch" version, depending on the compiler and the CPU architecture.