$\begingroup$

In Prof. Onur Mutlu's slides on prefetching, this example is shown as software prefetching:

    for (i = 0; i < N; i++) {
        __prefetch(a[i + 8]);
        __prefetch(b[i + 8]);
        sum += a[i] * b[i];
    }

Wikipedia says that the number of elements to prefetch ahead is calculated as $miss\_penalty / cycles\_per\_iteration$. My question is: if the processor is in-order, what is the point of software prefetching? Even if I choose the prefetch distance with the formula above, the CPU still has to stall, and sum += a[i] * b[i]; cannot get past the memory access stage. The overall number of memory accesses and of sum += a[i] * b[i]; operations will also be the same with or without prefetching.

I understand that if my CPU is out-of-order, then in the 8th iteration sum += a[8] * b[8]; can go straight through the memory access stage, because a[8] and b[8] will be cache hits. It will take just 8 cycles, and because the prefetch distance k = 8, the next 7 sum instructions will also go through without a problem.

$\endgroup$

1 Answer

$\begingroup$

"In-order" processors only issue instructions in order. Completion is out of order even on most processors that are called "in-order". "In-order" just means that if the processor has to stall the issue of the next instruction because of a RAW, WAW, or WAR dependence, it cannot issue any other instruction during the stall.

In the example you are showing, sum += a[i] * b[i] has no dependence on the immediately preceding __prefetch() instructions, so it can issue immediately.
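
To make the loop concrete, here is a minimal sketch in C. It assumes a GCC- or Clang-compatible compiler, where `__builtin_prefetch` stands in for the `__prefetch` pseudo-call from the slides. The function name `dot_prefetch` and the macro `PREFETCH_DISTANCE` are hypothetical names chosen for illustration, and the distance of 8 is simply the value from the example rather than something tuned with the miss-penalty formula.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements. The rule of thumb is
   miss_penalty / cycles_per_iteration; 8 is the value from the example. */
#define PREFETCH_DISTANCE 8

/* Dot product with software prefetch hints (illustrative sketch only). */
double dot_prefetch(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the arrays. */
        if (i + PREFETCH_DISTANCE < n) {
            /* Hints only: they produce no register result, so no later
               instruction has a data dependence on them. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE]);
            __builtin_prefetch(&b[i + PREFETCH_DISTANCE]);
        }
        /* Depends only on a[i] and b[i], which the hints issued
           PREFETCH_DISTANCE iterations earlier were meant to bring in. */
        sum += a[i] * b[i];
    }
    return sum;
}
```

The multiply-accumulate reads a[i] and b[i], not the elements being prefetched in the same iteration, so the hints add no dependence that could stall issue. Whether the data actually arrives in time depends on the real miss penalty and loop cost of the target machine.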

$\endgroup$
