Why does the 6502 read from the stack before writing it?

Question

I am building a W65C02S-based computer for fun, and I am trying to build basically all the tools myself, including the assembler (yes, I know assemblers exist; I just want to make everything from scratch myself, for fun).

When working on the JSR opcode (absolute addressing mode), I find that before it puts the program counter on the stack, it reads one byte off the stack first (program counter is set to 80 2e at this point, the stack pointer is 01b6):

READ  address: 80 2e data: 20 JSR ; Read the opcode
READ  address: 80 2f data: 14 14  ; Read the LO byte of operand
READ  address: 01 b6 data: 74 't' ; Read the stack (why?!)
WRITE address: 01 b6 data: 80 --- ; Write HI byte of PC
WRITE address: 01 b5 data: 30 '0' ; Write LO byte of PC
READ  address: 80 30 data: 80 80  ; Read HI byte of operand
READ  address: 80 14 data: 8d STA ; Read opcode at `80 14` (the JSR operand)

It works OK, but what is the reason it reads from the stack first?

(To explain the table: this is from my own tool. I am using an Arduino to look at the data and address buses to see what is happening there. It is a hardware CPU running only my own code.)

Maybe it is because your tracker reads the data and address buses to detect and save any change. That way it may read data during a transition of the control bus (R/W, for instance). Since your tracker board is faster than the 6502, try adding timestamps or also saving the control bus status.
– Paul Ghobril, Commented Feb 13, 2021 at 20:57
@PaulGhobril the tracker reads on every clock pulse. And this happens only at this point in the code. — Bart Friederichs
– Bart Friederichs, Commented Feb 13, 2021 at 21:11
The WDC instruction timing chart confirms your observation and labels it an internal operation (and mentions the original 6502 is in a different order). Since it has to read or write something every cycle, the easiest option would be PC+1 again, or S. — Kelvin Sherlock
– Kelvin Sherlock, Commented Feb 13, 2021 at 21:12
@Raffzahn o, it will be half ass. It will only implement the operations I will be needing. — Bart Friederichs
– Bart Friederichs, Commented Feb 13, 2021 at 23:12
@Jean-FrançoisFabre no. It has to stop somewhere. But for good measure, I use the highest level language I know: Python. I did build the first steps in a hex editor though. — Bart Friederichs
– Bart Friederichs, Commented Feb 13, 2021 at 23:14

Raffzahn · Accepted Answer · 2021-02-13 22:15:42Z

[This question is kind of duplicate to

TL;DR:

It's one of the 'dead' or internal cycles of a 6500. On a WDC65C02 they are externally all turned into 'harmless' read cycles.

Background

Unlike other CPUs, the 6500 series has no purely internal cycles without memory access; each and every cycle is also a memory cycle. Whenever an internal operation cannot overlap with a needed memory cycle, a dummy access is inserted, usually to the last address used.

On NMOS, it depends on what the next (intended) cycle would be, read or write.
On CMOS, this was changed so that the internal cycle is always a read.

This behaviour can also be seen in RMW instructions and in all carry-processing cycles inserted on page crossings.
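To make the "every cycle is a bus cycle" point concrete, here is a minimal Python sketch that replays the bus accesses of JSR abs in the order shown in the question's W65C02S trace. The function name `jsr_bus_cycles` and the dict-based memory are assumptions made for illustration; the cycle order is taken from that trace, not from a datasheet.

```python
def jsr_bus_cycles(mem, pc, s):
    """Return (bus_cycles, new_pc, new_s) for a JSR abs located at pc.

    mem is a plain dict {address: byte}; this is a hypothetical model that
    only mirrors the access order seen in the question's trace.
    """
    cycles = []

    def read(addr):
        data = mem.get(addr, 0x00)
        cycles.append(("READ", addr, data))
        return data

    def write(addr, data):
        mem[addr] = data
        cycles.append(("WRITE", addr, data))

    read(pc)                        # 1: opcode fetch
    adl = read((pc + 1) & 0xFFFF)   # 2: low byte of the target address
    read(0x0100 | s)                # 3: internal cycle, visible as a dummy stack read
    ret = (pc + 2) & 0xFFFF         # address of the last byte of the JSR
    write(0x0100 | s, ret >> 8)     # 4: push high byte of PC
    s = (s - 1) & 0xFF
    write(0x0100 | s, ret & 0xFF)   # 5: push low byte of PC
    s = (s - 1) & 0xFF
    adh = read(ret)                 # 6: high byte of the target address
    return cycles, (adh << 8) | adl, s


# Values taken from the trace in the question.
memory = {0x802E: 0x20, 0x802F: 0x14, 0x8030: 0x80, 0x01B6: 0x74}
cycles, new_pc, new_s = jsr_bus_cycles(memory, pc=0x802E, s=0xB6)
for kind, addr, data in cycles:
    print(f"{kind:5} address: {addr >> 8:02x} {addr & 0xFF:02x} data: {data:02x}")
```

The third entry is the read at 01 b6 that prompted the question: nothing is done with the value, the access is only there because the bus cannot sit idle for a cycle.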

The Finer Details

The detailed workings are described on page 107 of the 1976 MCS6500 Microcomputer Family Programming Manual.

Notable here is that cycle 3 stores the just-read ADL, the low byte of the target address, which then gets overwritten in cycle 4 by PCL. This is, of course, how the NMOS version works.

As mentioned, when the CMOS version was designed one goal was to remove all unintended write operations (*1). In this case the third cycle writing ADL was replaced by a dummy read.

P.S.: If you're starting from scratch, it might be a good start to first read both 1976 manuals end to end :)

*1 - While it doesn't really matter when looking at RAM, unintended writes may screw up the workings of I/O ports. With the 6500 being mainly meant as an embedded CPU family, this counts as an important improvement.

The unintended writes on read-modify-write instructions were never a problem. What had sometimes been problematic but was sometimes relied upon were unintended reads. Some code to write the Apple II floppy drive relied upon the instruction after a STA abs,X instruction starting two cycles after the first access to $C0XF, but some relied upon the instruction after a STA abs executing on the cycle following the access. Subsequent stores need to put data on the bus at multiples of exactly 4 cycles after the first access to $C0XF. — supercat
– supercat, Commented Sep 30 at 18:20
Incidentally, I suspect that the reason some read-modify-write instruction timings were altered on the 65C02 and others weren't is that some programs relied upon the ability of read-modify-write operations to perform four consecutive accesses to an arbitrary address, without caring about the value written. The easiest way to interface a typical LCD controller to a 2 MHz 6502, for example, would ignore the data bus, and use the low-order address bits to feed data to the LCD controller. The controller requires its data to be held a long time, but the address pins on the 6502 would be stable...
– supercat, Commented Oct 1 at 16:54

cyco130 · Accepted Answer · 2025-09-30 13:04:57Z

Cycle #2 (I'll use zero-based cycle numbers, so it corresponds to cycle #3 in your chart) of JSR is a throwaway read because the 6502 uses a very weird trick to save some die space on the chip.

Here you can see how the internal state of the CPU changes on each half-cycle while it's executing a JSR $1234 instruction according to the transistor-level emulation of Visual 6502: http://www.visual6502.org/JSSim/expert.html?graphics=f&loglevel=0&a=0000&d=203412&steps=14&logmore=adh,adl,alu

cycle ab db rw Fetch pc a x y s p adh adl alu 0 0000 20 1 JSR Abs 0000 aa 00 00 fd nv-BdIZc 00 00 00 0000 20 1 JSR Abs 0000 aa 00 00 fd nv-BdIZc 01 ff ff 1 0001 34 1 0001 aa 00 00 fd nv-BdIZc 00 01 ff 0001 34 1 0001 aa 00 00 fd nv-BdIZc 01 fd fe 2 01fd 00 1 0002 aa 00 00 34 nv-BdIZc 01 fd fe 01fd 00 1 0002 aa 00 00 34 nv-BdIZc ff fd fd 3 01fd 00 0 0002 aa 00 00 34 nv-BdIZc ff fd fd 01fd 00 0 0002 aa 00 00 34 nv-BdIZc ff fc fc 4 01fc 00 0 0002 aa 00 00 34 nv-BdIZc ff fc fc 01fc 02 0 0002 aa 00 00 34 nv-BdIZc 00 02 fb 5 0002 12 1 0002 aa 00 00 34 nv-BdIZc 00 02 fb 0002 12 1 0002 aa 00 00 34 nv-BdIZc ff 34 fb 6 1234 00 1 BRK 1234 aa 00 00 fb nv-BdIZc 12 34 fb 1234 00 1 BRK 1234 aa 00 00 fb nv-BdIZc 12 35 12

As you can see, on the second half of cycle #1, it puts the value in the stack register (S) into the address register (ADH:ADL). So far so good.

Then, on cycle #2, it puts the value that it read on cycle #1 (low byte of the target address) into S! It uses the stack pointer as a temporary! It cannot put it into PCL yet because it will need the old value later to push it on the stack and also to fetch the high byte of the target address later.

While doing this internally, it reads (and throws away) a byte from the address pointed to by the address register (which now contains the old value of the stack pointer) because the 6502 has to perform either a read or a write on every cycle (and a read is safer in the presence of memory mapped devices).

Then, on cycles #3 and #4, it proceeds to push the old (still not overwritten) value of PCH and PCL, in that order, just like in your chart.

Finally, on cycle #5, it reads the high byte of the target address and puts it in PCH, while simultaneously copying S into PCL and copying the updated stack pointer value back into S. That updated value ends up in the ALU register because of the decrement operations during the push cycles.
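The register shuffling described above can be sketched in a few lines of Python. This is only a simplified model of the steps as I described them, not a transistor-level simulation: the function name `jsr_internal`, the dict-based memory and the single `alu` variable are assumptions made for illustration, and half-cycle detail is ignored.

```python
def jsr_internal(mem, pc=0x0000, s=0xFD):
    """Model JSR abs with S borrowed as a temporary for the low target byte."""
    bus = []  # entries are (cycle, "R" or "W", address, data)

    def read(cycle, addr):
        data = mem.get(addr, 0x00)
        bus.append((cycle, "R", addr, data))
        return data

    def write(cycle, addr, data):
        mem[addr] = data
        bus.append((cycle, "W", addr, data))

    read(0, pc)                        # cycle 0: opcode fetch
    pc = (pc + 1) & 0xFFFF
    adl = read(1, pc)                  # cycle 1: low byte of the target
    pc = (pc + 1) & 0xFFFF             # PC now points at the high byte
    addr = 0x0100 | s                  # address register takes 01:S

    s = adl                            # cycle 2: S holds ADL as a temporary
    read(2, addr)                      # throwaway read at the old stack top
    s_during_push = s

    write(3, addr, pc >> 8)            # cycle 3: push PCH
    alu = (addr - 1) & 0xFF            # decremented stack low byte lives in the ALU
    write(4, 0x0100 | alu, pc & 0xFF)  # cycle 4: push PCL
    alu = (alu - 1) & 0xFF

    adh = read(5, pc)                  # cycle 5: high byte of the target
    pc = (adh << 8) | s                # PCH from the bus, PCL from S (the saved ADL)
    s = alu                            # S gets the real stack pointer back
    return bus, pc, s, s_during_push


memory = {0x0000: 0x20, 0x0001: 0x34, 0x0002: 0x12}  # JSR $1234 at $0000
bus, pc, s, s_tmp = jsr_internal(memory)
print(f"pc={pc:04x} s={s:02x} s during push={s_tmp:02x}")
```

With the same starting state as the Visual 6502 run, the model ends with the program counter at 1234 and S at fb, and S reads 34 while the return address is being pushed, which matches the s column in the table.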

Truly crazy design.

"and a read is safer in the presence of memory mapped devices". Yet not completely safe, esp. when the read would acknowledge the interrupt, for example. — lvd
– lvd, Commented Sep 30 at 12:58
True, no bus access is ever safe from side effects on 6502. But due to the design limitation of "exactly one memory access per cycle", it has to do spurious reads and writes for some instructions. — cyco130
– cyco130, Commented Sep 30 at 13:01
It looks more like a limitation of the following kind: no 'bus address valid' signal, unlike the 6800, which had such a signal.
– lvd, Commented Sep 30 at 13:23
Does the 6502 always put the operand byte into S and reload S from ADL, or is that behavior unique to JSR? Because the 6502 was implemented with a single-metal NMOS process, the amount of space taken up by single-input single-output latches is relatively slight compared with the space taken up by signal routing. Where's the program counter stored? I wonder how much it would have cost to add transparent latches between the bus and the inputs to the program counter halves to allow JSR to latch both bytes of the new program counter address and then push the old one? That could have shaved... — supercat
– supercat, Commented Sep 30 at 18:42
...a cycle off both JSR and RTS, and made their operation more intuitive (with JSR storing the address of its following instruction, rather than the address of its last byte). — supercat
– supercat, Commented Sep 30 at 18:47
