Install using:
pip install py-sharedmemory Python's standard multiprocessing.Queue relies on _winapi.CreateFile for inter-process communication (IPC), introducing significant I/O overhead. This can become a performance bottleneck in demanding applications like Reinforcement Learning, scientific computing, or distributed systems that transfer large amounts of data (e.g. NumPy arrays or tensors) between processes (actors, replay buffers, trainers, etc.).
py-sharedmemory provides an alternative that utilizes multiprocessing.shared_memory (and therefore _winapi.CreateFileMapping) for the main data and sends only lightweight metadata through the queues. This eliminates most inter-process I/O, reducing system load and latency. If you're hitting performance limits with standard queues, py-sharedmemory may help.
import multiprocessing as mp from memory import create_shared_memory_pair, SharedMemorySender, SharedMemoryReceiver def producer_sm(sender:SharedMemorySender): your_data = "your data" sender.put(your_data) # blocks until space is available sender.put(your_data, timeout=3) # raises queue.Full exception after 3s sender.put(your_data, block=False) # raises queue.Full exception if no space available sender.put_nowait(your_data) # ^ equivalent to above # ... # wait for all data to be received before closing the sender # to properly close all shared memory objects sender.wait_for_all_ack() def consumer_sm(receiver:SharedMemoryReceiver): data = receiver.get() # blocks data = receiver.get(timeout=3) # raises queue.Empty exception after 3s data = receiver.get(block=False) # raises queue.Empty exception if no data available data = receiver.get_nowait() # ^ equivalent to above # ... if __name__ == '__main__': sender, receiver = create_shared_memory_pair(capacity=5) mp.Process(target=producer_sm, args=(sender,)).start() mp.Process(target=consumer_sm, args=(receiver,)).start()There is a certain overhead to allocating shared memory which is especially noticable for smaller objects. Use the following heuristic depending on the size of the data you are handling:
| 10B | 100B | 1KB | 10KB | 100KB | 1MB | 10MB | 100MB | 1GB | 10GB | |
|---|---|---|---|---|---|---|---|---|---|---|
mp.Queue() | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
py-sharedmemory | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
I benchmarked data transfer performance using both the standard multiprocessing.Queue and my py-sharedmemory implementation:
Starting around 1MB per message, py-sharedmemory matches or slightly trails the standard queue in speed. However, the key advantage is that it avoids generating I/O, which becomes critical at larger data sizes. Notably, the standard implementation fails with 10GB messages, while py-sharedmemory handles them reliably.
Here’s the I/O load on my Windows system using the standard queue:
And here’s the I/O load using py-sharedmemory:
In practice, py-sharedmemory delivers smoother and more stable performance, with consistent put/get times and no slowdowns, especially under high data throughput.


