The answer to this question depends a lot on what part of the computer you're looking at and what exactly you mean by “simultaneously”.
From the perspective of the programmer, yes, reads and writes to memory regions larger than one byte do generally occur simultaneously. Mainstream CPU architectures like AMD64 and ARMv8 even guarantee that such accesses will occur atomically, as long as:
- the size of the value is between 1 byte and one machine word (e.g. 64 bits)
- the address is aligned, i.e. a multiple of the size
Here, “atomically” means that the access will be performed in its entirety without interruption. It is impossible to observe a half-written state. In theory, a CPU could implement this by doing reads or writes sequentially one byte at a time, and preventing concurrent access with a lock. But in practice, the atomicity is implemented on the hardware level.
Note that “atomically” does not mean that a written value becomes immediately visible to processes running on other CPUs: they might see old values, or might have conflicting writes of their own. Thus, different CPUs can write to the same memory address simultaneously, but any process will observe one value or the other in its entirety, never a mixture of both. What level of cache coherency is guaranteed depends on the specific CPU architecture: whereas AMD64 has a strong memory model that rules out most reordering effects, ARMv8 uses a weaker but more efficient memory model that requires special barrier instructions to enforce ordering.
CPUs often have instructions that read or write larger memory regions, e.g. SIMD instructions. However, these are generally not guaranteed to be atomic in their entirety, only for each word-sized sub-access that satisfies the conditions above.
Since memory accesses can take a very long time to complete (hundreds of CPU cycles), CPUs don't usually wait for completion but carry on with the next instructions. This means that multiple memory accesses from the same CPU can be “in flight” at the same time. This does not affect the atomicity of each individual access.
Under the hood, memory is very complex. There are multiple caches in the CPU, and finally the main memory.
Caches might be shared between CPUs, or might belong to a particular CPU. The caches are not managed in terms of bytes but in terms of cache lines, typically 64 bytes. So when I read or write a memory location, the cache line containing it (the surrounding 64 bytes, depending on alignment) will be loaded from the lower cache level. While this appears atomic, it might consist of multiple sequential transfers in the underlying cache-coherency protocol.
When there's an interaction with DDR RAM, the electrical protocol transfers only one bus width of data (typically 8 bytes) at a time. Nevertheless, DDR RAM accesses occur in bursts of 8 such transfers, covering a whole 64-byte cache line, so the burst appears as a single access from the outside.
What does an application developer need to know about this? Nothing, really. The programmer should use the language's atomic types if atomic accesses are required, specifying the proper (but machine-independent) memory ordering as needed. (Using a volatile specifier is usually wrong, though.) But details about atomicity and caching can be super relevant in systems programming when you're the one implementing those atomic types, or when you're building lock-free data structures. For example, the smallest unit of atomic modification is one byte, meaning that you can't flip a single bit atomically; instead, you have to do a compare-and-swap on the whole byte. And since you can modify at most one word/pointer atomically, you can't change two pointers at the “same” time.