[Computed gotos](https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables) make virtual machines faster at dispatching instructions.
Consider the "normal" way of dispatching via a looped `switch` statement:
```
while (true) {
    switch (code[pc++]) { // dispatch next opcode
    case OP_ADD:
        add();
        break;
    case OP_SUB:
        sub();
        break;
    case OP_MUL:
        mul();
        break;
    case OP_DIV:
        div();
        break;
    }
}
```
This iterates over a program counter within the user's bytecode and invokes arithmetic functions based on the observed opcode.
The `switch` statement is, ultimately, the single point of dispatch for all opcodes. That means the CPU's branch predictor has only one indirect branch to learn from: over the past few handled instructions within a user's bytecode, that single branch will have jumped to many different opcode handlers. The prediction for the next `switch` target cannot converge on any single opcode, which means there will be no (correct) branch prediction!
Fortunately, there is a non-standard extension to C and C++, implemented by compilers such as GCC and Clang, called *labels as values*. The resulting *computed gotos* let programs take the address of a label (with the `&&` operator) and jump to it indirectly:
```
// opcodes must be in consecutive order for this array to be valid
static void* table[] = { &&op_add, &&op_sub, &&op_mul, &&op_div };

// start the first dispatch
goto *table[code[pc++]];

op_add:
    add();
    goto *table[code[pc++]]; // dispatch next opcode
op_sub:
    sub();
    goto *table[code[pc++]]; // dispatch next opcode
op_mul:
    mul();
    goto *table[code[pc++]]; // dispatch next opcode
op_div:
    div();
    goto *table[code[pc++]]; // dispatch next opcode
```
Here there are multiple points of dispatch, because each opcode handler jumps directly to the next opcode's handler, and each of those indirect branches accumulates its own prediction history. If the user's bytecode is part of a tight loop, each handler will tend to jump to the same next handler on every iteration, so the predictor can learn the pattern (the current opcode determines the next opcode), which means there will be a (likely correct) branch prediction!
This technique is sometimes referred to as [direct threading](https://en.wikipedia.org/wiki/Threaded_code#Direct_threading) (not to be confused with multithreading).
CPython's virtual machine will [use computed gotos](https://github.com/python/cpython/blob/main/Python/ceval_macros.h) if possible, noting:
> At the time of this writing, the "threaded code" version is up to 15-20% faster than the normal "switch" version, depending on the compiler and the CPU architecture.