Note: this is an attempt to answer the question as asked. This answer is unlikely to be of any use to the original poster, who presumably asked the wrong question. I am writing this only as a way to explore the limits on how fast a modest AVR can sample a port. For an answer that genuinely attempts to address the OP’s problem, see Majenko’s answer.
I read the question as follows: can we sample a digital port at 2 MHz on an Arduino Nano clocked at 8 MHz? Can we do so while storing the values in a RAM-based buffer?
The answer is yes, but it is non-trivial, and it requires some assembly. To see the problem, let’s start by trying to do it in C++:
    uint8_t buffer[1024];

    void fill_buffer()
    {
        cli();
        for (size_t i = 0; i < sizeof buffer; i++)
            buffer[i] = PINB;
        sei();
    }
Note that the loop runs with interrupts disabled; otherwise the timer interrupt would wreak havoc with the loop timing. This is translated by gcc into an assembly equivalent to this:
        cli
        ldi  r30, lo8(buffer)        ; load the buffer address into pointer Z
        ldi  r31, hi8(buffer)        ; ditto
    0:  in   r24, 0x03               ; read the port
        st   Z+, r24                 ; store into buffer, increment the pointer
        ldi  r24, hi8(buffer+1024)   ; save (buffer+1024)>>8 in r24
        cpi  r30, lo8(buffer+1024)   ; compare the pointer with buffer+1024
        cpc  r31, r24                ; ditto
        brne 0b                      ; loop back
        sei
        ret
The loop takes 8 cycles per iteration. With an 8 MHz clock, that would be one reading per microsecond: too slow by a factor of two.
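As a sanity check on that count, here is a minimal host-side C++ sketch that adds up the per-instruction cycle costs of the loop above. The cycle counts are the ones the AVR instruction set manual gives for classic cores such as the ATmega328P; the constant and function names are mine, purely for illustration.

```cpp
#include <cstdint>

// Cycles per instruction on a classic AVR core (assumed: ATmega328P).
constexpr unsigned IN_CYCLES   = 1;  // in   r24, 0x03
constexpr unsigned ST_CYCLES   = 2;  // st   Z+, r24
constexpr unsigned LDI_CYCLES  = 1;  // ldi  r24, hi8(buffer+1024)
constexpr unsigned CPI_CYCLES  = 1;  // cpi  r30, lo8(buffer+1024)
constexpr unsigned CPC_CYCLES  = 1;  // cpc  r31, r24
constexpr unsigned BRNE_CYCLES = 2;  // brne 0b (2 cycles when the branch is taken)

// Total cost of one pass through the compiler-generated loop.
constexpr unsigned loop_cycles = IN_CYCLES + ST_CYCLES + LDI_CYCLES
                               + CPI_CYCLES + CPC_CYCLES + BRNE_CYCLES;

// Sample period in nanoseconds for a given clock and cycles per sample.
constexpr std::uint32_t period_ns(std::uint32_t f_cpu_hz, unsigned cycles)
{
    return static_cast<std::uint32_t>(1000000000ULL * cycles / f_cpu_hz);
}

static_assert(loop_cycles == 8, "the compiled loop costs 8 cycles");
static_assert(period_ns(8000000UL, loop_cycles) == 1000, "1 us per sample at 8 MHz");
```

The target is 500 ns per sample, which at 8 MHz leaves a budget of only 4 cycles per iteration.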
One could save one cycle by using different registers for the port data and for the end-of-loop condition, which allows moving the third ldi out of the loop. Another cycle could be saved by testing only the high byte of the Z pointer, but that would require aligning the buffer to a 256-byte boundary. With those two optimizations, we still need 6 CPU cycles per iteration, i.e. 0.75 µs at 8 MHz.
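To see why the high-byte-only test needs the alignment, here is a small host-side simulation (not AVR code) of a loop that exits as soon as the high byte of the pointer matches the high byte of the end address. The function name and the start addresses are hypothetical, chosen only to illustrate the point.

```cpp
#include <cstddef>
#include <cstdint>

// Count how many samples get stored by a loop of the form
//     0: in   rN, PINB        ; 1 cycle
//        st   Z+, rN          ; 2 cycles
//        cpi  r31, hi8(end)   ; 1 cycle
//        brne 0b              ; 2 cycles when taken
// i.e. a loop that compares only the high byte of Z with the end address.
// `start` is a hypothetical SRAM address, `size` the buffer size in bytes.
inline std::size_t samples_with_high_byte_test(std::uint16_t start,
                                               std::uint16_t size)
{
    const std::uint8_t end_hi = static_cast<std::uint8_t>((start + size) >> 8);
    std::uint16_t z = start;
    std::size_t count = 0;
    do {
        ++z;      // st Z+: store one sample, post-increment the pointer
        ++count;
    } while (static_cast<std::uint8_t>(z >> 8) != end_hi);
    return count;
}
```

With a buffer starting at 0x0200, `samples_with_high_byte_test(0x0200, 1024)` gives the full 1024 samples, whereas a buffer starting at 0x0280 makes the loop exit after only 896 samples, because the high byte already matches when Z reaches 0x0600. On the real target, something like `alignas(256)` or `__attribute__((aligned(256)))` on the buffer declaration should provide the alignment, though I have not checked how much RAM the padding costs.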
In order to make this faster, the only solution is to unroll the loop. This can be done in assembly by using the .rept (meaning “repeat”) directive:
    void fill_buffer()
    {
        uint8_t *p = buffer;  // Z is advanced by the asm, so it is a read-write operand
        cli();
        asm volatile(
            ".rept %[count]\n"      // repeat (count) times:
            "in r0, %[pin]\n"       //   read the port input register
            "st Z+, r0\n"           //   store in RAM
            "nop\n"                 //   1 cycle delay
            ".endr"
            : "+z" (p)
            : [count] "i" (sizeof buffer),
              [pin]   "I" (_SFR_IO_ADDR(PINB))
            : "r0", "memory"        // the asm writes to the buffer
        );
        sei();
    }
This takes 4 cycles, or 0.5 µs per iteration. Note that a delay cycle had to be introduced, otherwise the sampling would be too fast: 3 cycles, or 0.375 µs, per iteration.
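Unrolling is not free: the loop body is repeated in flash once per sample. The following host-side sketch estimates the sample rate, the time spent with interrupts disabled, and the flash footprint of the unrolled code. It assumes that `in`, `st` and `nop` each occupy one 16-bit word, as documented in the AVR instruction set manual; the struct and function names are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

struct UnrolledCapture {
    std::uint32_t sample_rate_hz;  // samples per second
    std::uint32_t duration_us;     // time spent with interrupts disabled
    std::size_t   flash_bytes;     // code size of the unrolled body
};

// Estimate the cost of an unrolled capture of `samples` readings.
constexpr UnrolledCapture unrolled_capture(std::uint32_t f_cpu_hz,
                                           std::size_t samples,
                                           unsigned cycles_per_sample,
                                           unsigned instr_per_sample)
{
    return UnrolledCapture{
        f_cpu_hz / cycles_per_sample,
        static_cast<std::uint32_t>(
            1000000ULL * samples * cycles_per_sample / f_cpu_hz),
        samples * instr_per_sample * 2  // 2 bytes per single-word instruction
    };
}

// in + st + nop: 4 cycles and 3 instructions per sample, at 8 MHz.
constexpr UnrolledCapture with_nop = unrolled_capture(8000000UL, 1024, 4, 3);
static_assert(with_nop.sample_rate_hz == 2000000UL, "2 MHz sampling");
static_assert(with_nop.duration_us == 512, "512 us with interrupts off");
static_assert(with_nop.flash_bytes == 6144, "6 KiB of flash");
```

So the 1024-sample version costs about 6 KiB of flash. Dropping the `nop` (3 cycles, 2 instructions per sample) would bring this down to 4 KiB, at a sample rate of F_CPU/3, i.e. about 2.67 MHz with an 8 MHz clock.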
This is not the fastest one can get. It is possible to take one sample per CPU cycle with something like this:
    in r0, 0x03
    in r1, 0x03
    in r2, 0x03
    ...
However, this technique is limited to burst readings of at most 32 samples, since each sample occupies one of the 32 CPU registers.
In summary, you can sample a port at up to F_CPU/3 while storing the data in an array, but:

1. This needs assembly programming, and
2. It is very likely not what you want to achieve.