My first thought was a 6809, as it is a true 8-bit microprocessor, but you want "best performance" as measured in wall-clock time. A better choice would be a 68008: it has an 8-bit data bus but 32-bit registers, so it could easily handle a 16x16 multiply and 32-bit adds/subtracts.
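To show why register width matters here, this is a rough sketch of what a CPU limited to an 8x8 multiply (the 6809's MUL is 8x8 giving a 16-bit result) has to do for a 16x16 multiply: four partial products, shifted and summed. The function names `mul8` and `mul16_from_8bit` are made up for illustration, and this is plain Python modelling the arithmetic, not actual 6809 code.

```python
def mul8(a, b):
    """Model of an 8x8 -> 16-bit unsigned multiply (hypothetical helper)."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b


def mul16_from_8bit(x, y):
    """16x16 -> 32-bit unsigned multiply built from four 8x8 partial products."""
    assert 0 <= x <= 0xFFFF and 0 <= y <= 0xFFFF
    xh, xl = x >> 8, x & 0xFF   # split each operand into high and low bytes
    yh, yl = y >> 8, y & 0xFF

    result = mul8(xl, yl)            # low * low, weight 2^0
    result += mul8(xl, yh) << 8      # cross term, weight 2^8
    result += mul8(xh, yl) << 8      # cross term, weight 2^8
    result += mul8(xh, yh) << 16     # high * high, weight 2^16
    return result & 0xFFFFFFFF       # keep the result to 32 bits


product = mul16_from_8bit(1234, 5678)
```

On a real 8-bit part each of those shifts and adds also has to be done a byte or two at a time with carries, which is where the extra cycles go.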
I think of "best performance" in terms of the number of clock cycles needed to do the job. If you only measure "performance" by the wall clock, then the faster the clock, the less time it will take. You could put a 68008 core into an FPGA and crank up the clock speed so that the operation takes less wall-clock time.
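The relationship between the two measures is just time = cycles / clock frequency. A minimal sketch, where the cycle count and both clock speeds are made-up numbers for illustration only, not measured figures for a 68008 or any FPGA core:

```python
def wall_clock_seconds(cycles, clock_hz):
    """Wall-clock time for a job that takes `cycles` clock cycles at `clock_hz`."""
    return cycles / clock_hz


# Hypothetical values, chosen only to show the scaling.
HYPOTHETICAL_CYCLES = 100          # assumed cycle count for the whole operation
ORIGINAL_CLOCK_HZ = 8_000_000      # assumed clock of the original chip
FPGA_CLOCK_HZ = 50_000_000         # assumed clock of the same core in an FPGA

t_original = wall_clock_seconds(HYPOTHETICAL_CYCLES, ORIGINAL_CLOCK_HZ)
t_fpga = wall_clock_seconds(HYPOTHETICAL_CYCLES, FPGA_CLOCK_HZ)
speedup = t_original / t_fpga      # equals the ratio of the two clock speeds
```

The cycle count is the same in both cases; only the clock changes, which is why I prefer to compare designs by cycles.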
It could be argued that a core in an FPGA is not a real microprocessor, but it would behave the same as the real thing, only a whole lot faster.
If you broaden the definition of "microprocessor" to include other devices that can have an 8-bit data bus, then you open it up to the use of CPLDs, FPGAs, or DSPs.