You do not need to go all the way down to assembly in order to get that speed. You can do direct port access from your .ino file:
```cpp
void setup() {
    DDRA |= _BV(PA0);  // pin 22 = PA0 as output
    for (;;) {
        PINA |= _BV(PA0);  // toggle PA0
    }
}

void loop(){}
```

This compiles to something that is almost equivalent to your assembly code. Actually, it is a bit faster, as it uses `rjmp` instead of the slower `jmp` instruction.
Edit: A few notes
- As pointed out by timemage in a comment, you can save another CPU cycle by writing `PINA =` instead of `PINA |=`.

- This code, as well as your two examples, will exhibit a glitch every 1,024 µs. This is caused by the periodic timer interrupt used by the Arduino core for timekeeping (`millis()`, `micros()` and `delay()`). You can avoid the glitch by disabling interrupts before going into the tight loop. Alternatively, if you do not use the Arduino core at all, you can define a function called `main()` instead of `setup()` and `loop()`: this will completely remove the Arduino core initialization from your compiled program.

- `arduino-cli` is useful for sparing you the complexity of the Arduino build system (automatic installation of the cores and libraries, libraries in multiple places that depend on the core you use...). If you do not use the Arduino core, `arduino-cli` is of little use: a very simple Makefile that calls `avr-gcc` and `avrdude` is all you need for basic AVR development.
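Putting the first two notes together, the sketch could look roughly like this. It is only a sketch, assuming the same board as above (pin 22 = PA0) and using `cli()` from avr-libc to disable interrupts; the explicit includes are redundant in an .ino file but make the snippet self-contained:

```cpp
#include <avr/io.h>         // DDRA, PINA, PA0, _BV()
#include <avr/interrupt.h>  // cli()

void setup() {
    DDRA |= _BV(PA0);  // pin 22 = PA0 as output
    cli();             // disable interrupts: the timer ISR can no longer stall the loop
    for (;;) {
        PINA = _BV(PA0);  // writing 1 to a PINx bit toggles the pin, no read-modify-write
    }
}

void loop() {}  // never reached
```

Keep in mind that with interrupts disabled, `millis()` and `micros()` stop advancing and `delay()` no longer works, which does not matter here since the program never leaves the tight loop.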