The answer to this question depends a lot on what part of the computer you're looking at and what exactly you mean by “simultaneously”.
From the perspective of the programmer, yes, reads and writes to memory regions larger than one byte do generally occur simultaneously. Mainstream CPU architectures like AMD64 and ARMv8 even guarantee that such accesses will occur atomically, as long as:
- the size of the value is between 1 byte and one machine word (e.g. 64 bits)
- the address is aligned, i.e. a multiple of the size
Here, “atomically” means that the access will be performed in its entirety without interruption. It is impossible to observe a half-written state. In theory, a CPU could implement this by doing reads or writes sequentially one byte at a time, and preventing concurrent access with a lock. But in practice, the atomicity is implemented on the hardware level.
Note that “atomically” does not mean that a written value becomes immediately visible to processes running on other CPUs: they might see old values, or might have conflicting writes of their own. Thus, different CPUs can write to the same memory address simultaneously, but any process will observe one value or the other in its entirety, never a mixture of both. What level of cache coherency is guaranteed depends on the specific CPU architecture: whereas AMD64 has a strong memory model that rules out most reordering effects, ARMv8 uses a weaker but more efficient memory model that requires special barrier instructions to enforce ordering.
CPUs often have instructions that read or write larger memory regions, e.g. SIMD instructions. However, these are generally not guaranteed to be atomic in their entirety, only for each word-sized sub-access that satisfies the conditions above.
Since memory accesses can take a very long time to complete (hundreds of CPU cycles), CPUs don't usually wait for completion but carry on with the next instructions. This means that multiple memory accesses from the same CPU can be “in flight” at the same time. This does not affect the atomicity of each individual access.
Under the hood, memory is very complex. There are multiple caches in the CPU, and finally the main memory.
Caches might be shared between CPUs, or might belong to a particular CPU. The caches are not managed in terms of bytes but in terms of cache lines, typically 64 bytes. So when I read or write a memory location, the cache line containing it (the surrounding 64 bytes, depending on alignment) will be loaded from the lower cache level. While this appears atomic, it might consist of multiple sequential transfers in the underlying cache-coherency protocol.
When there's an interaction with DDR RAM, the electrical protocol transfers only one bus width of data (typically 8 bytes) at a time. Nevertheless, DDR RAM accesses occur in bursts of 8 such transfers, covering a whole 64-byte cache line, so the burst appears as a single access from the outside.
What does an application developer need to know about this? Nothing, really. The programmer should use the language's atomic types if atomic accesses are required, specifying the proper (but machine-independent) memory ordering as needed. (Using a volatile specifier is usually wrong, though.) But details about atomicity and caching can be super relevant in systems programming when you're the one implementing those atomic types, or when you're building lock-free data structures. For example, the smallest unit of atomic modification is one byte, meaning that you can't flip a single bit atomically; instead, you have to do a compare-and-swap on the whole byte. And since you can modify at most one word/pointer atomically, you can't change two pointers at the “same” time.