How does the program counter in register 15 expose the pipeline?

Question

I have a question assigned to me that states:

"The ARM puts the program counter in register r15, making it visible to the programmer. Someone writing about the ARM stated that this exposed the ARM's pipeline. What did he mean, and why?"

We haven't talked about pipelines in class yet so I don't know what that is and I'm having a hard time understanding the material online. Can someone help me out with either answering the question or helping me understand so I can form my own answer?

Thanks!

Sean Houlihane · Accepted Answer · 2019-02-08 18:00:37Z

An exposed pipeline means one where the programmer needs to consider the pipeline, and I would disagree that the r15 value offset is anything other than an encoding constant.

By making the PC visible to the programmer, yes, some fragment of the early implementation detail has been 'exposed' as an architectural oddity which needs to be maintained by future implementations.

This wouldn't have been worthy of comment if the offset which has been designed into the architecture were zero - there would have been no optimisation possible for simple 3 stage pipelines, and everyone would have been none the wiser.

There is nothing 'exported' from the pipeline, not in the way that trace or debug allow you to snoop on timing behaviour as code is running - this feature is just part of the illusion that the processor hardware presents to a programmer (similar to each instruction executing in program order).

A problem with novel tricks like this is that people like to write questions about them, and those questions can easily be poorly phrased. They also neglect the fact that even if the pipeline is 3 stage, it only takes a single 'special case' to require the gates for an offset calculation (even if these gates don't consume power for typical operation).

Having PC relative instructions is rather common. Having implementation optimised encodings for how the offset is calculated is also common - for example, the IBM 650.

Retrocomputing.SE is an interesting place to learn about some of the things that relate to the evolution of modern computers.

halfer · Accepted Answer · 2022-05-08 11:52:38Z

It doesn't really; what they are probably talking about is that the program counter is two instructions ahead of the instruction being executed. But that doesn't mean it's a two- or three-deep pipe, if it ever was. It exposes nothing at this point, in the same way that the branch shadow in MIPS exposes nothing. There is the textbook MIPS and there is reality.

There is nothing magical about a pipeline; it is a computer version of an assembly line. You can build a car in place and bring the engine, the doors, the wheels etc. to the car. Or you could move the car through an assembly line and have a station for doors, a station for the wheels and so on. You have many cars being built at once, and a car comes out of the building every few minutes, but that doesn't mean a new car takes a few minutes to build. It just means the slowest step takes a few minutes; front to back, each car takes roughly the same amount of time as before.
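To put numbers on the assembly line analogy, here is a minimal sketch. The function name, the four 3-minute stations and the 100 cars are all made up for illustration, and it assumes the line advances in lockstep at the pace of the slowest station.

```python
def assembly_line(stage_minutes, cars):
    """Return (minutes for one car front to back, minutes until all cars are done)."""
    slowest = max(stage_minutes)
    # Assumption: the line moves in lockstep, so every station effectively
    # takes as long as the slowest one.
    latency = slowest * len(stage_minutes)
    # The first car needs the full latency; every later car rolls off
    # one step behind the car in front of it.
    total = latency + (cars - 1) * slowest
    return latency, total


latency, total = assembly_line([3, 3, 3, 3], cars=100)
print(latency, total)    # 12 309
print(latency * 100)     # 1200 minutes if each car were built in place, one after another
```

Each car still takes 12 minutes front to back, but a finished car appears every 3 minutes once the line is full.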

An instruction has several steps that should be obvious: an add requires that you get the instruction, decode it, gather up the operands, feed them to an adder (alu), then take the result and store it. Other instructions have similar steps and a similar number of them.

Your textbook will use terms like fetch, decode and execute. Say you are fetching an instruction at 0x1000, then one at 0x1004 and then one at 0x1008, hoping that the code runs linearly with no branches. When 0x1004 is being fetched, 0x1000 is being decoded; when 0x1008 is being fetched, 0x1004 is being decoded and 0x1000 might be up to execution, depending on the design. So one might think: when 0x1000 is being executed the program counter is fetching 0x1008, so that tells me how the pipeline works. Well, it doesn't. I could have a 10000-deep pipeline and have the program counter that the instruction sees be any address I like relative to the address of that instruction. I could have it be 0x1000 for the instruction at 0x1000 and have a 12345-deep pipeline. It's just a definition; it might at some point in history have been put in place because of a real design and a real pipe, or it could have always just been defined that way.
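As a rough illustration of that walk-through, the sketch below traces which address sits in which stage on each clock for straight-line code. The stage names and helper functions are hypothetical, not taken from any ARM documentation; the only architectural fact used is that ARM-state code reads r15 as its own address plus 8.

```python
INSTR_SIZE = 4  # bytes per instruction, as in the 0x1000 / 0x1004 / 0x1008 example


def pipeline_trace(start, count, stages=("fetch", "decode", "execute")):
    """Return one dict per clock: stage name -> address in that stage (None if empty)."""
    addrs = [start + i * INSTR_SIZE for i in range(count)]
    trace = []
    for cycle in range(count + len(stages) - 1):
        row = {}
        for depth, name in enumerate(stages):
            idx = cycle - depth  # the instruction that entered fetch `depth` clocks ago
            row[name] = addrs[idx] if 0 <= idx < count else None
        trace.append(row)
    return trace


def fetch_address_when_executing(addr, stages=("fetch", "decode", "execute")):
    """Address in the fetch stage while `addr` executes (straight-line code only)."""
    return addr + INSTR_SIZE * stages.index("execute")


def architectural_pc(addr):
    """Value ARM-state code reads from r15: defined as the instruction's address + 8."""
    return addr + 8


for row in pipeline_trace(0x1000, 3):
    print({name: (hex(a) if a is not None else None) for name, a in row.items()})

deep = ("fetch", "s1", "s2", "s3", "execute")  # a made-up 5-stage layout
print(hex(fetch_address_when_executing(0x1000)))        # 0x1008
print(hex(fetch_address_when_executing(0x1000, deep)))  # 0x1010
print(hex(architectural_pc(0x1000)))                    # 0x1008 for either layout
```

With three stages the fetch address and the value read from r15 happen to coincide; with the made-up deeper layout they do not, yet the value the program sees stays the same, because it is a definition and not a live view of the fetch stage.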

What does matter is that the definition is stated and supported by the instruction set. If they say that the pc is the instruction address plus some offset, then it needs to always be that, or the exceptions need to be documented, and implementations need to match those definitions. Done. Programmers can then program, compilers can be written, etc.
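This is the part an assembler or compiler actually relies on. A minimal sketch, covering ARM state only, where reading the pc gives the instruction's address plus 8; the function names are invented for this example.

```python
ARM_PC_READ_OFFSET = 8  # ARM state: r15 reads as the instruction's own address + 8


def pc_relative_target(instr_addr, offset):
    """Address reached by an ARM-state 'ldr rX, [pc, #offset]' located at instr_addr."""
    return instr_addr + ARM_PC_READ_OFFSET + offset


def offset_to_encode(instr_addr, target):
    """Offset a tool would have to place in the instruction to reach `target`."""
    return target - (instr_addr + ARM_PC_READ_OFFSET)


print(hex(pc_relative_target(0x1000, 0)))     # 0x1008
print(hex(offset_to_encode(0x1000, 0x1100)))  # 0xf8
```

The tool only needs the constant; it never needs to know how many stages the core has.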

A textbook problem with pipelines (not saying it isn't real) goes like this. Say we run out of v8 engines while we have 12 trucks in the assembly line; the plan was to build trucks on that line for some period of time and then build cars with v6's. The engines are coming by slow boat once they finish building them, but we have car parts ready now, so let's move the trucks off the line and start the line over. For N steps in the assembly line no cars come out the other end of the building; once the first car makes it to the end, a new car comes out every few minutes. We had to flush the assembly line. The same goes for running instructions 0x1000, 0x1004, 0x1008, etc. If you can keep the pipeline moving it's more efficient, but what if 0x1004 is a branch to 0x1100? We might have 0x1008's instruction and 0x100c's in the pipe, so we have to flush the pipe and start fetching from 0x1100, and it takes some number of clock cycles before we are completing instructions again. Then ideally we complete one per clock after that until another branch. So if you read the classic textbook on the subject, where mips or an educational predecessor is used, they have this notion of a branch shadow or some other similar term: the instruction after the branch is always executed.
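The flush can be sketched with a toy simulation. It assumes a textbook-style pipe in which a taken branch is only recognised in the last stage, and it has no branch shadow; the `run` function and the program layout are made up for illustration and do not model any real core.

```python
def run(program, start, stages=3, max_cycles=1000):
    """Toy pipeline. `program` maps address -> None (plain instruction)
    or an int (always-taken branch to that address).
    Returns (addresses completed in order, clock cycles used)."""
    pipe = [None] * stages  # pipe[0] is fetch, pipe[-1] is the last (execute) stage
    pc = start              # next address to fetch
    done = []
    cycles = 0
    while cycles < max_cycles:
        # Clock tick: everything moves one stage along, a new fetch enters.
        fetched = pc if pc in program else None
        pipe = [fetched] + pipe[:-1]
        if fetched is not None:
            pc += 4
        cycles += 1
        executing = pipe[-1]
        if executing is not None:
            done.append(executing)
            target = program[executing]
            if target is not None:
                # Taken branch: throw away the younger instructions and refetch.
                pipe = [None] * (stages - 1) + [executing]
                pc = target
        if pc not in program and all(slot is None for slot in pipe[:-1]):
            break  # nothing left to fetch and nothing behind the last stage
    return done, cycles


straight = {0x1000: None, 0x1004: None, 0x1008: None, 0x100C: None}
branchy = {0x1000: None, 0x1004: 0x1100, 0x1008: None, 0x100C: None,
           0x1100: None, 0x1104: None}

print(run(straight, 0x1000)[1])  # 6 cycles for 4 instructions
done, cycles = run(branchy, 0x1000)
print([hex(a) for a in done], cycles)  # 4 instructions complete, but in 8 cycles
```

In this model the instructions at 0x1008 and 0x100c are fetched and then discarded, which costs two clocks compared with the straight-line run.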

So instead of flushing N instructions out of the pipe you flush N-1 instructions and you get an extra clock to get the next instruction after the branch into the pipeline. And that's how mips works by default but when you buy a real core you can turn that off and have it not execute the instruction after the branch. It's a great textbook illustration and was probably real and probably is real for "let's build a mips" classes in computer engineering. But the pipelines in use today don't wait that long to see that the pipe is going to be empty and can start fetching early and sometimes gain more than one clock rather than flush the whole pipe. If it ever did, it does not currently give MIPS any kind of advantage over other designs nor does it give us exposure to their pipeline.
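The N versus N-1 trade can be written as simple arithmetic. This is only a sketch under the same textbook assumption as the discussion (the branch is resolved in the last stage, so stages minus one younger instructions are behind it); `cycles_needed` is a hypothetical helper, and as noted above real cores do not behave this simply.

```python
def cycles_needed(instructions, taken_branches, stages=3, delay_slots=0):
    """Clock cycles to complete `instructions` in the toy model.
    `delay_slots` is how many instructions after a branch always execute."""
    younger = stages - 1  # instructions behind the branch when it resolves
    if not 0 <= delay_slots <= younger:
        raise ValueError("delay_slots must be between 0 and stages - 1")
    bubbles_per_branch = younger - delay_slots
    # Fill the pipe once, complete one instruction per clock, pay for each flush.
    return instructions + (stages - 1) + taken_branches * bubbles_per_branch


# 0x1000, branch at 0x1004, then 0x1100 and 0x1104: four instructions, full flush.
print(cycles_needed(4, taken_branches=1))                 # 8
# Same code, but the instruction at 0x1008 also runs in the branch shadow.
print(cycles_needed(5, taken_branches=1, delay_slots=1))  # 8
```

Same number of clocks, one more instruction completed, but only if something useful can be placed after the branch.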

With the original ARM26 implementations, this was an artefact of the pipeline - and literally where the CPU was fetching the next instruction from. It saved gates - the original cores were famously small, having no cache and only low tens of thousands of gates. The pipeline was 3-stage. As PC-relative addressing used r15 as an operand, any CPUs with longer pipelines (Intel's StrongARM was an example - IIRC a 5-stage pipeline) had to maintain this relationship, but actually fetch instructions at PC+offset.
Sure, the armv4 carries a lot of compatibility with those arm1/2/3 (armv1/v2) chips from when the A stood for Acorn. And one would assume that the two ahead "exposed the pipeline" for the armv1/v2. But even with the armv4/arm7 (where the A stood for Advanced, and they are now ip cores, not chips) one assumes it is just a legacy compatibility thing, not exposing any depth or design details.
