TIL this varies (a lot). Here are some results using the GNU compiler. (BTW, I also checked by compiling on the machines themselves: GNU g++ 5.4 from xenial is a hell of a lot faster than 4.6.3 from Linaro on precise.)
Intel i7 4700MQ xenial
type       add        sub        mul        div
short      0.822491   0.832757   1.007533   3.459642
long       0.824088   0.867495   1.017164   5.662498
long long  0.873705   0.873177   1.019648   5.657374
float      1.137084   1.140690   1.410767   2.093982
double     1.139156   1.146221   1.405541   2.093173
Intel i3 2370M has similar results
type       add        sub        mul        div
short      1.369983   1.235122   1.345993   4.198790
long       1.224552   1.223314   1.346309   7.275912
long long  1.235526   1.223865   1.346409   7.271491
float      1.507352   1.506573   2.006751   2.762262
double     1.507561   1.506817   1.843164   2.877484
Intel(R) Celeron(R) 2955U (Acer C720 Chromebook running xenial)
type       add        sub        mul        div
short      1.999639   1.919501   2.292759   7.801453
long       1.987842   1.933746   2.292715   12.797286
long long  1.920429   1.987339   2.292952   12.795385
float      2.580141   2.579344   3.152459   4.716983
double     2.579279   2.579290   3.152649   4.691226
DigitalOcean 1GB Droplet Intel(R) Xeon(R) CPU E5-2630L v2 (running trusty)
type       add        sub        mul        div
short      1.094323   1.095886   1.356369   4.256722
long       1.111328   1.079420   1.356105   7.422517
long long  1.057854   1.099414   1.368913   7.424180
float      1.516550   1.544005   1.879592   2.798318
double     1.534624   1.533405   1.866442   2.777649
AMD Opteron(tm) Processor 4122 (precise)
type       add        sub        mul        div
short      3.396932   3.530665   3.524118   15.226630
long       3.522978   3.439746   5.051004   15.125845
long long  4.008773   4.138124   5.090263   14.769520
float      6.357209   6.393084   6.303037   17.541792
double     6.415921   6.342832   6.321899   15.362536
This uses the code from http://pastebin.com/Kx8WGUfg, saved as benchmark-pc.c and compiled with:
g++ -fpermissive -O3 -o benchmark-pc benchmark-pc.c
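The pastebin code isn't reproduced here, so this is only a hypothetical sketch of how a benchmark of this shape can time one operation per type (time_op, bench_type, print_type and OpTimes are made-up names, and the output format merely mimics the lines above). Each iteration feeds its result into the next, so it measures latency, and the operand is re-read from a volatile so that -O3 can't fold the loop into a closed form.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>

// Hypothetical sketch, not the pastebin benchmark-pc.c.
// Times `iters` dependent applications of `op` on type T; returns seconds.
template <typename T, typename Op>
double time_op(Op op, T operand, long iters) {
    // Re-read every iteration, so the compiler cannot turn the loop into a
    // closed-form expression (e.g. acc + iters * operand) at -O3.
    volatile T v_operand = operand;
    T acc = v_operand;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) {
        // Each result feeds the next iteration: this measures latency.
        acc = op(acc, static_cast<T>(v_operand));
    }
    auto stop = std::chrono::steady_clock::now();
    volatile T sink = acc;  // keep the final value observable
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}

struct OpTimes { double add, sub, mul, div; };

template <typename T>
OpTimes bench_type(long iters) {
    // Operand 1: add/sub stay in int range for fewer than ~2e9 iterations,
    // and mul/div leave the value unchanged. short results wrap when
    // converted back (implementation-defined before C++20, modular on g++).
    const T one = T(1);
    OpTimes t = {time_op(std::plus<T>(), one, iters),
                 time_op(std::minus<T>(), one, iters),
                 time_op(std::multiplies<T>(), one, iters),
                 time_op(std::divides<T>(), one, iters)};
    return t;
}

template <typename T>
void print_type(const char* name, long iters) {
    OpTimes t = bench_type<T>(iters);
    std::printf("%s add: %f %s sub: %f %s mul: %f %s div: %f\n",
                name, t.add, name, t.sub, name, t.mul, name, t.div);
}
```

For example, call print_type<short>("short", 100000000L); from main and build with the same g++ -O3 line (g++ older than 6 also needs -std=c++11). Integer division latency can depend on the operand values, so a divide-by-1 loop like this may not match the pastebin numbers exactly.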
I've run multiple passes, and the general picture stays the same.
One notable difference seems to be integer (ALU) mul vs floating-point (FPU) mul. Addition and subtraction differ only trivially.
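One way to make "multiple passes" concrete (my assumption; the pastebin code may handle it differently) is to repeat each measurement and keep the fastest and slowest pass. The minimum is the run least disturbed by other processes or CPU frequency changes, and a small min/max spread suggests the numbers are stable. PassStats, run_passes and spread are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

struct PassStats {
    double min;  // fastest pass: least disturbed by noise
    double max;  // slowest pass
};

// Hypothetical helper: calls `run_once` (which returns a duration in
// seconds) `passes` times and records the fastest and slowest result.
template <typename F>
PassStats run_passes(F run_once, int passes) {
    PassStats s = {std::numeric_limits<double>::infinity(), 0.0};
    for (int p = 0; p < passes; ++p) {
        double t = run_once();
        s.min = std::min(s.min, t);
        s.max = std::max(s.max, t);
    }
    return s;
}

// Relative spread between slowest and fastest pass (0.05 means 5%).
double spread(const PassStats& s) {
    return (s.max - s.min) / s.min;
}
```

run_once can be any callable that times one benchmark loop and returns seconds.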
Here are the benchmark results in chart form (lower is faster and preferable):

Update to accommodate @Peter Cordes (updated benchmark code, which also adds int):
https://gist.github.com/Lewiscowles1986/90191c59c9aedf3d08bf0b129065cccc
i7 4700MQ Linux Ubuntu Xenial 64-bit (all patches to 2018-03-13 applied)
type       add        sub        mul        div
short      0.773049   0.789793   0.960152   3.273668
int        0.837695   0.804066   0.960840   3.281113
long       0.829946   0.829168   0.960717   5.363420
long long  0.828654   0.805897   0.964164   5.359342
float      1.081649   1.080351   1.323401   1.984582
double     1.081079   1.082572   1.323857   1.968488
AMD Opteron(tm) Processor 4122 (precise, DreamHost shared-hosting)
type       add        sub        mul        div
short      1.235603   1.235017   1.280661   5.535520
int        1.233110   1.232561   1.280593   5.350998
long       1.281022   1.251045   1.834241   5.350325
long long  1.279738   1.249189   1.841852   5.351960
float      2.307852   2.305122   2.298346   4.833562
double     2.305454   2.307195   2.302797   5.485736
Intel Xeon E5-2630L v2 @ 2.4GHz (Trusty 64-bit, DigitalOcean VPS)
type       add        sub        mul        div
short      1.040745   0.998255   1.240751   3.900671
int        1.054430   1.000328   1.250496   3.904415
long       0.995786   1.021743   1.335557   7.693886
long long  1.139643   1.103039   1.409939   7.652080
float      1.572640   1.532714   1.864489   2.825330
double     1.535827   1.535055   1.881584   2.777245
Apple Mac Mini M1
type       add        sub        mul        div
short      0.794701   0.752165   1.002816   1.510412
long       0.704235   0.704065   0.891701   1.391481
long long  0.703971   0.704361   0.890722   1.392378
float      1.376483   1.377145   1.377523   1.754344
double     1.378830   1.380009   1.378437   2.005511
Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz
type       add        sub        mul        div
short      0.625791   0.612076   0.808043   3.223206
long       0.598402   0.594910   0.783385   4.568725
long long  0.594657   0.597185   0.778999   4.467567
float      0.972729   0.963480   0.968124   1.767378
double     0.973561   0.968600   0.976119   1.967776
Apple MacBook Air M2
type       add        sub        mul        div
short      0.761225   0.738152   0.832800   1.407643
long       0.278027   0.278680   0.469060   0.971469
long long  0.278614   0.277795   0.469232   0.972268
float      1.378481   1.389127   1.392117   1.722389
double     1.386530   1.395797   1.389992   1.969165
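Since these machines run at different clock speeds, one way to compare the shape of the tables (my own normalization, not part of the benchmark) is to divide every timing by the same type's add time. This sketch hard-codes the first i7 4700MQ (xenial) table; Row, kI7_4700MQ, relative_to_add and print_relative are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>

struct Row {
    const char* type;
    double add, sub, mul, div;  // timings as printed by the benchmark
};

// First i7 4700MQ (xenial) table from this answer.
const Row kI7_4700MQ[] = {
    {"short",     0.822491, 0.832757, 1.007533, 3.459642},
    {"long",      0.824088, 0.867495, 1.017164, 5.662498},
    {"long long", 0.873705, 0.873177, 1.019648, 5.657374},
    {"float",     1.137084, 1.140690, 1.410767, 2.093982},
    {"double",    1.139156, 1.146221, 1.405541, 2.093173},
};

// Expresses every operation as a multiple of the same type's add time,
// which cancels out the machine's overall speed.
Row relative_to_add(const Row& r) {
    Row rel = {r.type, 1.0, r.sub / r.add, r.mul / r.add, r.div / r.add};
    return rel;
}

void print_relative(const Row* rows, int n) {
    for (int i = 0; i < n; ++i) {
        Row rel = relative_to_add(rows[i]);
        std::printf("%-10s add: %.2f sub: %.2f mul: %.2f div: %.2f\n",
                    rel.type, rel.add, rel.sub, rel.mul, rel.div);
    }
}
```

On that table, long div comes out at roughly 6.87x long add, while double div is only roughly 1.84x double add, so integer division is the clear outlier there.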
addl replaced with fadd, for example). The only way to really get a good measurement is to take a core part of your real program and profile different versions of it. Unfortunately, that can be pretty hard without a lot of effort. Telling us the target hardware and your compiler would at least help people share their existing experience. About your integer use, I suspect you could make a sort of fixed_point template class that would ease such work tremendously. float gets the speed boost, but usually double doesn't.
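As a rough illustration of the fixed_point idea (the class name comes from the text above; the Q-format layout, the int32_t storage and all member names are my assumptions), values are stored as integers scaled by 2^FracBits, so add and sub are plain integer operations, while mul and div need a 64-bit intermediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed_point sketch: stores value * 2^FracBits in an int32_t.
template <int FracBits>
class fixed_point {
    static_assert(FracBits > 0 && FracBits < 31, "FracBits out of range");

public:
    fixed_point() : raw_(0) {}

    static fixed_point from_raw(std::int32_t r) {
        fixed_point f;
        f.raw_ = r;
        return f;
    }
    // Overflows (undefined behavior) if |v| >= 2^(31 - FracBits).
    static fixed_point from_int(std::int32_t v) {
        return from_raw(v * (std::int32_t(1) << FracBits));
    }
    // Truncates toward zero; v must fit in the representable range.
    static fixed_point from_double(double v) {
        return from_raw(static_cast<std::int32_t>(v * (1 << FracBits)));
    }
    double to_double() const {
        return static_cast<double>(raw_) / (1 << FracBits);
    }

    // add/sub: plain integer operations on the scaled values.
    fixed_point operator+(fixed_point o) const { return from_raw(raw_ + o.raw_); }
    fixed_point operator-(fixed_point o) const { return from_raw(raw_ - o.raw_); }

    // mul: widen to 64 bits, then drop the extra FracBits. Right shift of a
    // negative value is implementation-defined before C++20 (arithmetic on g++).
    fixed_point operator*(fixed_point o) const {
        return from_raw(static_cast<std::int32_t>(
            (std::int64_t(raw_) * o.raw_) >> FracBits));
    }
    // div: pre-scale the dividend so the quotient keeps FracBits fraction
    // bits. o must be non-zero.
    fixed_point operator/(fixed_point o) const {
        return from_raw(static_cast<std::int32_t>(
            (std::int64_t(raw_) * (std::int64_t(1) << FracBits)) / o.raw_));
    }

private:
    std::int32_t raw_;
};
```

Only add/sub end up as cheap as the integer rows in the tables; mul needs a widening multiply plus a shift, and div a 64-bit division, which on the x86 machines above is the slowest integer operation measured.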