As far as I know, memory barriers are used to avoid out-of-order execution. However, memory barriers are also often mentioned when discussing cache coherence. I'm not sure how the two concepts are connected, since, according to my findings, cache coherence should already be guaranteed at the hardware level by protocols such as MESI. Is preventing out-of-order execution with memory barriers another way to (manually) ensure cache coherence?
- The simple answer, without getting into implementation details, is that an out-of-order CPU can access the cache in an order that's not the same as program order. The cache coherency protocol can't put those accesses back into program order, but memory barriers can stop them from being out of program order to begin with. – Ross Ridge, Nov 22, 2019 at 19:01
1 Answer
On modern CPUs, stores first go into a store buffer. Only when a store leaves the store buffer and is committed to the cache line does the cache coherence protocol get involved.
While the store is pending in the store buffer, the CPU which made the store can read it back from the store buffer (store-to-load forwarding), but other CPUs cannot observe the effects of the store just yet.
Memory barriers such as x86 MFENCE make later loads and stores wait until the store buffer has drained. Intel's description of MFENCE:
Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction. The MFENCE instruction is ordered with respect to all load and store instructions, other MFENCE instructions, any LFENCE and SFENCE instructions, and any serializing instructions (such as the CPUID instruction). MFENCE does not serialize the instruction stream.
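To make the store-buffer effect concrete, here is a minimal sketch of the classic "store buffer" litmus test in portable C++. It assumes `std::atomic_thread_fence(std::memory_order_seq_cst)` as a stand-in for MFENCE (on x86, compilers typically implement it with MFENCE or a LOCK-prefixed instruction, but that is a compiler choice). The function names `both_read_zero_once` and `count_both_zero` are made up for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One round of the store-buffer litmus test.
// Each thread stores to its own flag, then loads the other thread's flag.
// Returns true if BOTH threads read 0, i.e. neither saw the other's store.
bool both_read_zero_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;  // written by the threads, read after join()

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        // Full barrier: the store above must be globally visible
        // before the load below is performed.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often both threads read 0.
int count_both_zero(int rounds) {
    assert(rounds >= 0);
    int hits = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_read_zero_once()) ++hits;
    }
    return hits;
}
```

With the two fences in place, the C++ memory model forbids the outcome where both threads read 0, so `count_both_zero` should return 0 for any number of rounds. If the fences are removed, that outcome becomes permitted: each store can still be sitting in its core's store buffer while the other core's load reads the old value from a perfectly coherent cache. It may still be rare in this sketch, because thread start-up dominates the timing.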
See Memory Barriers: a Hardware View for Software Hackers for more details.
2 Comments
sfence is not relevant. sfence doesn't stop StoreLoad reordering, only StoreStore, which is already prohibited by x86 in general except for NT stores. mfence would be a better example: it stops later load instructions from reading cache until the store buffer drains. sfence does not. Notice that "It is not ordered with respect to memory loads." It doesn't block execution until the store buffer drains; it only blocks commit of later stores until the store buffer drains. You can think of it as setting up a "fence" in the store buffer that nothing can cross.
mfence is the right example. Corrected.