10

Simple processors (microprocessors or otherwise, especially older ones) often have an accumulator register that serves as the implicit source/destination register for instructions. In a microprocessor like the 6502, the accumulator has a very crucial role as the only register that can be used with arithmetic/logic instructions like adc. Conversely for the Z80 with its much larger register file, the accumulator's raison d'être is to keep the instruction size small, since specifying two registers à la ld r, r takes up a quarter of the single-byte opcode space.

However, one thing these processors are consistent on is that the accumulator is always the destination rather than the source. Did any architecture reverse this such that the accumulator is the source? That is, instead of a = a + thing, the architecture performed operations of the form thing = thing + a.

For architectures like the Z80, reversing the order of the operands would decrease the size of common load/operation/store patterns. For instance, this style of code:

    ld a, d         ; 11 bytes
    add a, (hl)
    ld d, a
    ld a, (ix+5)
    xor 7
    ld (ix+5), a

could be replaced with the more compact and efficient code:

    ld a, (hl)      ; 7 bytes
    add d, a
    ld a, 7
    xor (ix + 5), a

Conversely, this would have a disadvantage for repeated operations on the same value, especially on strict accumulator/memory architectures like the 6502, since code like this:

    lda foo         ; 10 bytes
    adc #7
    eor bar,y
    lsr a
    sta foo

would become slower code that performs repeated read/writes on the same memory location:

    lda #7          ; 11 bytes
    adc foo
    lda bar,y
    eor foo
    lsr foo
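To put a rough number on the "repeated read/writes" point, here is a minimal Python sketch that runs both 6502-style sequences against a memory model that counts data accesses. The `CountingMemory` class, the function names and the sample values are all made up for illustration, carry is assumed clear, and instruction fetches are not counted.

```python
class CountingMemory:
    """Hypothetical byte memory that counts data reads and writes."""

    def __init__(self, contents):
        self.cells = dict(contents)
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.writes += 1
        self.cells[addr] = value & 0xFF


def accumulator_as_destination(mem, y):
    # lda foo / adc #7 / eor bar,y / lsr a / sta foo  (carry assumed clear)
    a = mem.read("foo")
    a = (a + 7) & 0xFF
    a ^= mem.read(("bar", y))
    a >>= 1
    mem.write("foo", a)


def accumulator_as_source(mem, y):
    # lda #7 / adc foo / lda bar,y / eor foo / lsr foo
    # Every operation reads foo and writes the result straight back to it.
    a = 7
    mem.write("foo", mem.read("foo") + a)
    a = mem.read(("bar", y))
    mem.write("foo", mem.read("foo") ^ a)
    mem.write("foo", mem.read("foo") >> 1)


def run(routine):
    mem = CountingMemory({"foo": 0x21, ("bar", 3): 0x0F})
    routine(mem, 3)
    return mem.cells["foo"], mem.reads, mem.writes


forward = run(accumulator_as_destination)  # (result, reads, writes)
reverse = run(accumulator_as_source)
```

Both routines leave the same value in `foo`, but in this model the reversed form makes 4 reads and 3 writes where the conventional form makes 2 reads and 1 write.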

For the Z80, this disadvantage would be less pronounced since you could load foo into another register, although the code would still end up using more instructions.

Granted, either way has tradeoffs, but it's surprising to me that I've never encountered a processor that chose the reverse order, especially since it could potentially have benefits in a lot of common code, depending on what types of code the architecture was designed for. Did any processor ever do this or anything similar to it?

8
  • 5
    Register write back is very complex and takes a lot of time. Modern processors have a lot of forwarding circuitry to make the result available to the next instruction even though it hasn't reached the register file. This goes against the very reason to use an accumulator model: simplicity. Commented Oct 28 at 4:09
  • I don't buy that argument. Both the Z80 and the 6502 had instructions that could use non-accumulator registers and memory as combined source/destinations, such as inc d or inc (hl) for the Z80. Extremely basic accumulator machines may not have this sort of functionality, sure, but most that I've seen do. Commented Oct 28 at 4:51
  • "Have there ever been programming languages that used a COMEFROM rather than GOTO construct?" Commented Oct 28 at 7:29
  • 1
    @PatrickSchlüter — precisely :D Now ask yourself: What kind of language is Intercal? Commented Oct 28 at 12:04
  • 3
    The majority of PDP-6 / PDP-10 operations go both ways, implemented as distinct instructions. ADD: add memory to accumulator; ADDM: add accumulator to memory. Of course, you can debate whether those machines have 16 'accumulators' (as per the doc) or 16 'registers'. Commented Oct 28 at 12:54

5 Answers

13

Side note: It reminds me of the COME FROM instruction; however, this question is apparently meant seriously. No offence.

The PIC16 series of microcontrollers, which I happen to know to a certain degree, has a d flag on its ALU operations involving the accumulator, W, and a byte in memory (a so-called file register). If d is set to 1, the result is stored not in W but in the file register, leaving W unchanged. This Wikipedia page has more details.

However, the instructions with literal values only target W, because there are not enough bits in the instruction word to hold address and value.
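As a rough illustration of the d flag, here is a minimal Python sketch of `ADDWF f,d` next to the literal form `ADDLW k`. The `Pic16Sketch` class is a toy model invented for this example: status flags, banking and the real register map are ignored.

```python
class Pic16Sketch:
    """Toy model: one working register W plus a small file-register array."""

    def __init__(self):
        self.w = 0
        self.file = [0] * 128

    def addwf(self, f, d):
        """ADDWF f,d: add W to file register f; d selects the destination."""
        result = (self.w + self.file[f]) & 0xFF
        if d == 0:
            self.w = result          # d = 0: the sum goes to W
        else:
            self.file[f] = result    # d = 1: the sum goes to the file register, W unchanged
        return result

    def addlw(self, k):
        """ADDLW k: the literal form has no d bit and always targets W."""
        self.w = (self.w + k) & 0xFF


cpu = Pic16Sketch()
cpu.w = 5
cpu.file[0x20] = 10
cpu.addwf(0x20, 1)   # file[0x20] becomes 15, W stays 5
cpu.addwf(0x20, 0)   # W becomes 20, file[0x20] stays 15
```

With d = 1, W behaves as a pure source operand, which is the reversed form the question asks about.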

I have not looked at the other PIC series controllers.

5
  • One should probably add that basically all registers are memory-mapped, and memory is small, so "store to memory" doubles as "store to some register". Commented Oct 28 at 15:22
  • 1
    Interesting, so it's a single-accumulator system that allows both directions. I imagine that combines the best of both worlds (at the expense of slightly wider instructions, but PICs already had "strange" instruction sizes). Commented Oct 28 at 16:17
  • 1
    @v-rob: One thing I wish had been done on the PIC18F series would have been to include an instruction that would override the W source operand for the following instruction, so that an operation like "add contents of memory location X to memory location Y" would take two words and two cycles, but wouldn't have to disturb W. For many tasks, a single-accumulator design works just fine, but some tasks can't be performed without a lot of operations to save/restore the accumulator. Allowing operations to bypass the accumulator at the expense of another instruction word and execution cycle... Commented Oct 28 at 17:18
  • ...would offer almost as much benefit as having an opcode bit select between using two accumulators, but would be much cheaper to implement. Commented Oct 28 at 17:19
  • @supercat The 6809 CPU has two accumulators. The opcodes have a bit selecting the used one, but the mnemonics include its name, like adda vs addb. Commented Oct 31 at 12:30
9

I would say the Sharp ESR-H 61860 pocket computer CPU uses A more as a source than as a destination. To give some examples, here are all the ALU instructions using A:

    Mnemonic  Bytes  Opcode  Cycles  Flags  Operation
    ADB       1      14      5       cz     (P)+A→(P), (P+1)+B+c→(P+1), P+1→P
    ADCM      1      C4      3       cz     (P)+A+c→(P)
    ADIA n    2      74 XX   4       cz     A+n→A
    ADM       1      44      3       cz     (P)+A→(P)
    ADN       1      0C      7+3*I   cz     (P)+A→(P)..(P-I)+c→(P-I), P-I-1→P, BCD
    ANIA n    2      64 XX   4       .z     A&n→A
    ANMA      1      46      3       .z     (P)&A→(P)
    DECA      1      43      4       cz     A-1→A, 2→Q
    INCA      1      42      4       cz     A+1→A, 2→Q
    ORIA n    2      65 XX   4       .z     A|n→A
    ORMA      1      47      3       .z     (P)|A→(P)
    SBB       1      15      5       cz     (P)-A→(P), (P+1)-B-c→(P+1), P+1→P
    SBCM      1      C5      3       cz     (P)-A-c→(P)
    SBIA n    2      75 XX   4       cz     A-n→A
    SBM       1      45      3       cz     (P)-A→(P)
    SBN       1      0D      7+3*I   cz     (P)-A→(P), (P-I)-c→(P-I), P-I-1→P, BCD
    SWP       1      58      2       ..     [A>>4|A<<4]→A
    TSIA n    2      66      4       .z     A&n
    TSMA²     1      C6      3       .z     A&(P)

The 61860 has only five real registers: P, Q and R (the stack pointer), which are 7-bit pointers into the 96 bytes of internal RAM, and DP and PC, which are the 16-bit memory pointer and the program counter. The CPU also has several working registers that live at internal RAM addresses 0 to 11 and are used by different instructions. The accumulator A is at address 2 and the second accumulator B is at address 3.
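To make the direction of the data flow concrete, here is a minimal Python sketch of `ADM` ((P)+A→(P)) next to `ADIA n` (A+n→A), with A modelled as internal RAM address 2 as described above. The `Sc61860Sketch` class and its method names are invented for this illustration, and only the carry flag is modelled.

```python
A_ADDR = 2  # accumulator A lives at internal RAM address 2 (B would be at 3)


class Sc61860Sketch:
    """Toy model of the internal RAM and the P pointer only."""

    def __init__(self):
        self.ram = [0] * 96   # 96 bytes of internal RAM
        self.p = 0            # 7-bit pointer into internal RAM

    def adm(self):
        """ADM: (P) + A -> (P). A is the source, the byte at P is the destination."""
        total = self.ram[self.p] + self.ram[A_ADDR]
        self.ram[self.p] = total & 0xFF
        return total > 0xFF   # carry out

    def adia(self, n):
        """ADIA n: A + n -> A. The immediate form still targets A."""
        total = self.ram[A_ADDR] + n
        self.ram[A_ADDR] = total & 0xFF
        return total > 0xFF   # carry out


cpu = Sc61860Sketch()
cpu.ram[A_ADDR] = 3
cpu.p = 0x10
cpu.ram[0x10] = 0xFE
carry = cpu.adm()   # RAM[0x10] wraps to 0x01 with carry set; A is still 3
```

Only the immediate forms write back to A in this model, which matches the pattern in the table: the memory forms all end in →(P).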

The fill byte instructions use A explicitly as the source:

    Mnemonic  Bytes  Opcode  Cycles  Flags  Operation
    FILD      1      1F      4+3*I   ..     A→(DP)..(DP+I), DP+I→DP
    FILM      1      1E      5+I     ..     A→(P)..(P+I), P+I+1→P, A→H
8

No, Not Really.

It is the basic feature of an accumulator design that its accumulator will always be the destination of all ops. An ISA that allows all data instructions to operate with memory as target is already a two operand ISA (usually memory-to-memory), thus not needing an accumulator at all.

6502, Z80 and almost all early micros are clean accumulator machines.

There are of course exceptions with reverse accumulator designs, but they are all notably slower as a final memory write cycle is needed, or special controller designs where memory is integrated into the CPU core like a register file.

[See below for one interesting variation]


Notes

It's important to keep in mind that this is all and only about data instructions, not other types (program flow, housekeeping, etc).

Basic Accumulator-Machine Operation

Accumulator machines, also known as one-operand architectures, are implemented around the idea that an instruction always carries only one (additional) address, and that this address is (mostly) the source address. Having the accumulator always on the receiving side is what makes this the simple implementation it is. The basic cycle is always structured as

  • fetch data (from memory *1)
  • operate on data (and accumulator)
  • hold result in accumulator

It is always only one memory cycle, always a read, and always at the start of an operation. That simple principle can cover all operations, including loading the accumulator, as that is just a NOP operation after the fetch. All that is left is to add store-accumulator handling to complete the data processing part of a CPU. Add PC handling (jumps & branches) and maybe housekeeping (interrupts, flags, etc.) and we have a quite capable CPU design with minimal effort, ready to add as many useful data operations as wanted (*2).
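The fetch/operate/hold cycle described above can be sketched in a few lines of Python. This is a toy interpreter, not a model of any real CPU: the operation names, the `(op, addr)` program format and the `run` function are assumptions made for this example.

```python
import operator

# Every data instruction is "fetch one operand, combine it with A, keep the result in A".
OPERATIONS = {
    "LOAD": lambda a, m: m,              # a load is just a pass-through after the fetch
    "ADD": lambda a, m: (a + m) & 0xFF,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}


def run(program, memory):
    """Execute a list of (op, addr) pairs on a one-accumulator machine."""
    a = 0
    for op, addr in program:
        if op == "STORE":
            memory[addr] = a             # the only instruction that writes to memory
            continue
        data = memory[addr]              # 1. fetch data: exactly one memory read
        a = OPERATIONS[op](a, data)      # 2. operate on data and accumulator
                                         # 3. the result simply stays in the accumulator
    return a


memory = {0: 5, 1: 3, 2: 0}
result = run([("LOAD", 0), ("ADD", 1), ("XOR", 0), ("STORE", 2)], memory)
```

Adding a new data operation here means adding one entry to the table, which is the "as many useful data operations as wanted" point: the cycle itself never changes.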

Other (Data) Register Instructions

Having additional instructions to load, store and manipulate secondary registers - which are not a general accumulator - does not break that rule.

In fact, they also operate with the same basic cycle as before, which in theory would allow all ALU instructions to work on data held in those registers as well.

Except that comes with a hefty cost. In case of the 6502 there are

  • two additional registers (X, Y)
  • five ALU operations (OR, AND, EOR, ADC, SBC (*3))
  • eight addressing modes

which would add up to 80 additional opcodes (2 × 5 × 8). Numbers for the 8080 would be similar.
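The count is just the product of the three lists above. A quick Python enumeration, assuming X and Y would get the same eight addressing modes the accumulator ALU instructions have (the mode list below is the accumulator's; reusing it unchanged for X and Y is an assumption of this sketch):

```python
from itertools import product

registers = ["X", "Y"]
alu_ops = ["ORA", "AND", "EOR", "ADC", "SBC"]
# The eight addressing modes the 6502 offers for its accumulator ALU instructions.
addressing_modes = ["#imm", "zp", "zp,X", "abs", "abs,X", "abs,Y", "(zp,X)", "(zp),Y"]

# Every (register, operation, mode) combination would need an opcode of its own.
new_opcodes = list(product(registers, alu_ops, addressing_modes))
extra = len(new_opcodes)
```

That is 80 new entries in a 256-entry opcode map, of which the documented 6502 instruction set already uses 151.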

Unrelated Cases

Nor does having multiple accumulators change the principle of the very simple processing cycle above, although, in a strict sense, it would make the design a limited two-operand machine.

Note that the 8080 does not fall here, as none of the other registers can act as an accumulator. They are only an additional source address space.

Additional Real World Requirements for Data

While the basic cycle does handle all needs, real-world applications require one more type of data handling instruction: RMW (read-modify-write) instructions. That is, an item of data can be read from memory, manipulated and written back in a single uninterruptible instruction, which is a necessity for parallel processing and interrupt handling. Both CPUs mentioned have them in the form of increment/decrement on memory, which is usually the simplest implementation. The 6502 additionally implements several shift/rotate instructions as RMW (*4).

Couldn't RMW be Used?

Since RMW is already present, one could just as well use it for all ALU ops, much like the 6502 already does for its memory shift/rotate instructions. Except that this comes at a double cost:

  • A basic ALU op with memory reference costs at least 3 clocks (by using ZP), while the same op as RMW would cost 5. Using an accumulator op followed by a store would cost 6. So the general savings are rather small, if not negligible (*5).

  • It also needs, like above, 40 additional opcodes.
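The cycle argument can be tabulated with a short Python sketch. The ADC, STA and INC figures are the documented NMOS 6502 cycle counts; treating a hypothetical RMW ALU op as costing the same as the existing INC in each mode is an assumption of this sketch.

```python
# (ADC, STA, INC) cycle counts per addressing mode on the NMOS 6502.
# INC stands in for a hypothetical RMW ALU op, assumed to cost the same.
CYCLES = {
    "zp":    (3, 3, 5),
    "zp,X":  (4, 4, 6),
    "abs":   (4, 4, 6),
    "abs,X": (4, 5, 7),   # ADC abs,X takes 5 cycles when a page boundary is crossed
}


def saving(mode):
    """Cycles saved by one RMW op versus an accumulator op followed by a store."""
    alu, store, rmw = CYCLES[mode]
    return (alu + store) - rmw


savings = {mode: saving(mode) for mode in CYCLES}
```

Under these assumptions the saving is 1 cycle in zero page and 2 cycles in the other modes listed, before counting the extra load needed to get the operand into the accumulator first.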

There is Only One Real Use Case

Let's be clear: storing the result of an accumulator operation somewhere other than the accumulator has only one real use case:

  • Holding a constant value to be applied on multiple data items.

That leaves only very limited real-world uses, like masking a string, which seem to be rather rare. Sure, they would get a good speedup in a tight loop, but that may not have much impact on overall performance.

Other operations are already ruled out by the 8-bit nature of those CPUs. Any meaningful data item, such as for a matrix operation, would most likely be a multi-byte item (16 or more bits). Those don't fit an 8-bit accumulator, which means it has to be loaded and reloaded anyway.

Not the Same But Close

Then there is one single real-world design that comes somewhat close to the idea, and that's the Renesas (Mitsubishi) 740 series. It is even more interesting because it's a 6500 descendant.

The family features an 'X Modified Operation Mode', more colloquially 'T-Mode', in which many ALU operations no longer operate on the accumulator but on a (ZP) memory location addressed by X.

It essentially enables any ZP location to act as the accumulator. Besides adding many more accumulators, it enables efficient multi-byte handling in ZP by using X as a pointer. On the downside, all instructions incur a 1..3 cycle penalty:

  • 1 for read only, like CMP
  • 2 for read/store, like LDA (*6)
  • 3 for read/modify/store, like all ALU ops

A further downside is that the accumulator is no longer fully usable when T is set (*7). T-Mode is activated by setting bit 5 of the status register, which adds even more overhead code.

Overall, it is a very nice idea and certainly very handy for embedded use (*8), but it has limited usability.
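A minimal Python sketch of how T-Mode redirects an ALU operation, using ADC as the example. The `M740Sketch` class is a toy model invented for this illustration: it covers only the choice of left operand and destination, not timing, decimal mode or the other flags.

```python
class M740Sketch:
    """Toy model of A, X, the T and carry flags, and the zero page."""

    def __init__(self):
        self.a = 0
        self.x = 0
        self.t = False        # bit 5 of the status register
        self.carry = False
        self.zp = [0] * 256   # zero page

    def adc(self, operand):
        """ADC: works on A when T is clear, on the zero-page byte at X when T is set."""
        left = self.zp[self.x] if self.t else self.a
        total = left + operand + int(self.carry)
        self.carry = total > 0xFF
        result = total & 0xFF
        if self.t:
            self.zp[self.x] = result   # the ZP location acts as the accumulator
        else:
            self.a = result
        return result


cpu = M740Sketch()
cpu.t = True
cpu.a = 0x11
cpu.x = 0x40
cpu.zp[0x40] = 0xF0   # low byte of a 16-bit value kept in ZP
cpu.zp[0x41] = 0x01   # high byte
cpu.adc(0x20)         # low byte: 0xF0 + 0x20 = 0x10 with carry out
cpu.x += 1            # the INX step of a multi-byte loop
cpu.adc(0x00)         # high byte: 0x01 + 0x00 + carry = 0x02
```

The two-byte addition runs entirely in zero page with X as the pointer, and A is left untouched throughout.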


*1 - The 6500 is quite basic on that, while the 8080 adds usage of a register file as second operand, which does reduce code size. Then again, the 6500's zero page addressing is of the same purpose. Neither breaks the structure of them being one-operand machines.

*2 - A kind of hardware that has quite some similarities to a BASIC interpreter. It needs only a comparably small core for basic instruction handling (often less than a KiB) which can be augmented with many, many instructions and functions.

*3 - The other 3 ALU ops as well as load/store/compare are already available for X and Y.

*4 - Rockwell's 6500 CMOS design also incorporates similar bit manipulation instructions, further improving usability for embedded applications.

*5 - Of course this ratio gets better with more complex addressing modes.

*6 - Note that neither STA nor PHA/PLA are supported by T-Mode. So while string to ZP can be nicely done by LDA/INX, the other way around is not possible.

*7 - In fact it gets a bit weird, as many accumulator instructions, like shift/rotate or BIT, still work on the built-in one.

*8 - 740 based controllers have their I/O Register file within ZP.

8

The SDS/XDS Sigma 7 had an AWM (Add Word to Memory) instruction. It wasn't a single accumulator machine like the 6502 and Z80. Any of the 16 general registers could be used as an accumulator, and that was how most of the arithmetic instructions worked.

With core memory, read/modify/rewrite was a natural thing to do. Read was destructive, so rewrite was necessary, might as well write something else if useful. There were a bunch of Sigma 7 instructions that took advantage of this even though most ordinary arithmetic instructions were memory to register.
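A rough Python sketch of why read/modify/rewrite comes almost for free with core memory. The `CoreMemory` class and its cycle accounting are simplifications made up for this illustration, and the AWM model ignores condition codes and overflow.

```python
class CoreMemory:
    """Toy core store: every access is a destructive read followed by a rewrite."""

    def __init__(self, size):
        self.words = [0] * size
        self.cycles = 0

    def cycle(self, addr, modify=lambda word: word):
        """One full memory cycle; modify() chooses what the rewrite half stores."""
        word = self.words[addr]
        self.words[addr] = 0                           # the read wipes the cores
        self.words[addr] = modify(word) & 0xFFFFFFFF   # rewrite: same word, or a new one
        self.cycles += 1
        return word


def load_word(mem, addr):
    """Ordinary read: the rewrite half just restores the original word."""
    return mem.cycle(addr)


def add_word_to_memory(mem, addr, register):
    """AWM-style operation: the rewrite half stores the sum instead."""
    mem.cycle(addr, lambda word: word + register)


mem = CoreMemory(16)
mem.words[5] = 10
add_word_to_memory(mem, 5, 7)   # memory word becomes 17
value = load_word(mem, 5)       # reads 17 and restores it
```

In this model the add-to-memory operation uses the same single memory cycle as a plain load, because the rewrite has to happen either way.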

I presume the other members of the Sigma series also had AWM, but I haven't checked. The Sigma 7 is the one I programmed back in the day.

1
  • 1
    Similarly the PDP-6, as I commented above. Commented Oct 28 at 12:56
6

The PIC architecture allows most instructions to specify that they operate as either MEM op W -> MEM or as MEM op W -> REG. Interestingly, in the assembler, the form with MEM as the destination is the default, and the subtract instruction computes MEM - W, rather than W - MEM, regardless of which register is the destination.
