High Throughput Concurrent Map Access and Periodic Updates Causing Contention and Latency Spikes

Question

I am working on a Go application where two concurrent maps, products and productCatalog, are accessed by numerous threads in live traffic to retrieve data at high throughput. These maps are populated with data during server startup and are periodically updated with delta changes every 30 minutes from Kafka.

products *cmap.ConcurrentMap[int64, *definitions.ProductData]
productCatalog *cmap.ConcurrentMap[int64, *cmap.ConcurrentMap[int64, struct{}]]

The concurrent map implementation used is from this library.

Problem: The periodic updates write to the same concurrent maps that the production servers are reading from at high throughput. This simultaneous reading and writing causes contention and latency spikes in our system.

Requirement: I need to maintain the original data and only update it with deltas, so a simple swap of maps won't work. I am looking for a way to update the maps periodically without those writes disturbing read traffic.

Considered Solution: I am considering maintaining a secondary map to store the deltas. During reads, I would check both the original map and the delta map. I would then periodically merge the delta map into the original map during low traffic periods.

Questions:

Is the considered solution optimal, or is there a better approach to handle high throughput reads and periodic writes without causing contention?
How can I optimize the merge operation to be as fast as possible and ensure that the most recent data is accessed first during reads?
Would using more granular locks or sharding during the merge operation reduce contention during reads and writes?

Any insights or suggestions would be greatly appreciated, especially specific approaches to try or measurements to make to optimize this use case.

Update

Thanks for your insight; it's been invaluable. I've implemented the approach you recommended and will be running tests soon. One point I'm pondering is whether to opt for a deep copy or if a shallow copy would suffice in my scenario.

To provide a bit more context:

Upon startup, I populate the products and productCatalog maps.
These maps then undergo periodic updates every 30-40 minutes.
The majority of the operations are read-heavy, with getters that retrieve data from these maps. For instance, with a specific product ID, the getter fetches the ProductData struct from the products map. Similarly, for the other map, there's a getter that returns a list of product ids given a catalogId.

It's crucial to note that after retrieving these values, I don't modify them. They're treated as read-only: I only use the values returned by the getter methods.

Given this workflow, I'm trying to gauge if a deep copy is truly necessary, or if a shallow copy would cater to my needs without any unforeseen repercussions. Your thoughts?

Bart van Ingen Schenau · Accepted Answer · 2023-10-02 11:02:10Z

Is the considered solution optimal, or is there a better approach to handle high throughput reads and periodic writes without causing contention?

Assuming you have a fairly constant read-load on the data structure, but changes only happen in batches with a 30 minute interval, then I don't consider your solution to be optimal.

If you have enough memory for two copies of the data structure, I would use a non-concurrent map (a regular map/dictionary that makes no provision for concurrent access, other than that concurrent reads are safe without locking primitives) and handle the updates in this way:

1. When a batch of updates arrives, create a deep copy of the map.
2. Apply the updates to the copy.
3. Update the request handler to look at the updated copy for the next request. This is where locking might be needed if that update cannot be made atomically, or if multiple maps are involved and you cannot order the updates such that seeing a partial change is harmless.
4. Discard the old maps as soon as running requests no longer reference them.

This is basically the swap of maps that you discounted, but with the delta updates applied to the copy before the swap.
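The copy, update, publish sequence described above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: `ProductData`, `Snapshot`, `Store`, and `ApplyDelta` are hypothetical names, `ProductData` is assumed to contain only value fields, there is a single updater goroutine, and `atomic.Pointer` requires Go 1.19 or later. Bundling both maps into one struct is one way to publish them with a single atomic swap, so readers never see one map updated without the other.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ProductData is a hypothetical stand-in for definitions.ProductData.
// It is assumed to hold only value fields, so `*p` copies it completely.
type ProductData struct {
	Name  string
	Price int64
}

// Snapshot bundles both maps so one pointer swap publishes them together.
type Snapshot struct {
	Products map[int64]*ProductData
	Catalog  map[int64]map[int64]struct{}
}

// deepCopy duplicates the maps and every value they point to.
func (s *Snapshot) deepCopy() *Snapshot {
	next := &Snapshot{
		Products: make(map[int64]*ProductData, len(s.Products)),
		Catalog:  make(map[int64]map[int64]struct{}, len(s.Catalog)),
	}
	for id, p := range s.Products {
		cp := *p
		next.Products[id] = &cp
	}
	for catalogID, ids := range s.Catalog {
		inner := make(map[int64]struct{}, len(ids))
		for id := range ids {
			inner[id] = struct{}{}
		}
		next.Catalog[catalogID] = inner
	}
	return next
}

// Store holds the currently published snapshot.
type Store struct {
	current atomic.Pointer[Snapshot]
}

// Load is the read path: one atomic load, then plain map reads.
func (s *Store) Load() *Snapshot { return s.current.Load() }

// ApplyDelta copies the live snapshot, updates the copy, then publishes it.
// Assumes it is only ever called from one updater goroutine.
func (s *Store) ApplyDelta(delta map[int64]ProductData) {
	next := s.current.Load().deepCopy()
	for id, p := range delta {
		p := p // fresh variable per entry
		next.Products[id] = &p
	}
	s.current.Store(next)
}

func main() {
	store := &Store{}
	store.current.Store(&Snapshot{
		Products: map[int64]*ProductData{1: {Name: "widget", Price: 100}},
		Catalog:  map[int64]map[int64]struct{}{10: {1: {}}},
	})
	before := store.Load() // a request that started before the update
	store.ApplyDelta(map[int64]ProductData{1: {Name: "widget", Price: 120}})
	fmt.Println(before.Products[1].Price, store.Load().Products[1].Price) // 100 120
}
```

A request should call `Load` once and use that snapshot for its whole lifetime. In Go the last step (discarding the old maps) needs no code: the garbage collector reclaims the old snapshot once no request still references it.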

If locks cannot be avoided in step 3, then you should arrange your code such that the read-lock is held only for the time it takes to copy the current map address into a local variable, and the rest of the actions are performed outside of the lock. That ensures the lock is held for the shortest time possible.
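If the swap cannot be made atomic, the short read-lock described above might look like the following Go sketch. The `Store` type and its `snapshot`, `Product`, and `publish` methods are hypothetical names chosen for illustration, and the sketch assumes a published map is never modified afterwards.

```go
package main

import (
	"fmt"
	"sync"
)

// ProductData is a hypothetical stand-in for definitions.ProductData.
type ProductData struct{ Name string }

// Store guards only the map reference, not the map contents.
type Store struct {
	mu       sync.RWMutex
	products map[int64]*ProductData // replaced wholesale, never mutated after publish
}

// snapshot holds the read lock just long enough to copy the map reference.
func (s *Store) snapshot() map[int64]*ProductData {
	s.mu.RLock()
	m := s.products
	s.mu.RUnlock()
	return m
}

// Product does the actual lookup outside the lock.
func (s *Store) Product(id int64) (*ProductData, bool) {
	m := s.snapshot()
	p, ok := m[id]
	return p, ok
}

// publish swaps in a fully built map; the write lock covers one assignment.
func (s *Store) publish(next map[int64]*ProductData) {
	s.mu.Lock()
	s.products = next
	s.mu.Unlock()
}

func main() {
	s := &Store{}
	s.publish(map[int64]*ProductData{1: {Name: "old"}})
	held := s.snapshot() // an in-flight request keeps using this map
	s.publish(map[int64]*ProductData{1: {Name: "new"}})
	p, _ := s.Product(1)
	fmt.Println(held[1].Name, p.Name) // old new
}
```

Both critical sections contain a single pointer-sized copy, so readers can only block for the duration of that one assignment, regardless of how large the update batch is.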

There is no need to provide locks on the individual elements, because the ones being written are not read (at least not by other threads) and the ones being read by multiple threads are not being written. Therefore, there are no race conditions within the data access and no locks are needed.

Great answer. I'm not sure if I can call this 'double-buffering' but at least conceptually it's the same basic idea.
– JimmyJames, Commented Oct 2, 2023 at 15:21
Do you want me to use a non-concurrent map just for the updates part or for everything in my scenario?
– dragons, Commented Oct 2, 2023 at 15:54
@dragons If you follow this answer, you should not need a concurrent map at all. The idea is you make all the updates required before exposing it to other threads. Concurrent reads should not be an issue unless there's a design flaw with the non-concurrent map.
– JimmyJames, Commented Oct 2, 2023 at 17:53
Thank you for clarifying. I've incorporated your solution into my project and am gearing up to test it. However, I'm grappling with a decision: whether to opt for a deep copy or if a shallow copy suffices for my scenario. I have updated my question to add more details. Can you help me understand whether shallow copy will be sufficient for my use case?
– dragons, Commented Oct 2, 2023 at 23:05
@dragons, with a shallow copy, you are still sharing data between reading and writing threads, with the risk of seeing partially updated data, which can be as bad as half-written integers or other primitive types.
– Bart van Ingen Schenau, Commented Oct 3, 2023 at 6:54
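
To make the sharing mentioned in the last comment concrete, here is a small single-threaded Go sketch. `ProductData`, `shallowCopy`, and `deepCopy` are hypothetical names, and `ProductData` is assumed to hold only value fields. With a map of pointers, a shallow copy duplicates the map but not the structs it points to, so a write through the copy is visible through the map that readers are still using.

```go
package main

import "fmt"

// ProductData is a hypothetical stand-in for definitions.ProductData.
type ProductData struct {
	Name  string
	Price int64
}

// shallowCopy duplicates the map, but both maps point at the same structs.
func shallowCopy(src map[int64]*ProductData) map[int64]*ProductData {
	dst := make(map[int64]*ProductData, len(src))
	for id, p := range src {
		dst[id] = p
	}
	return dst
}

// deepCopy duplicates the map and every struct it points to.
func deepCopy(src map[int64]*ProductData) map[int64]*ProductData {
	dst := make(map[int64]*ProductData, len(src))
	for id, p := range src {
		cp := *p
		dst[id] = &cp
	}
	return dst
}

func main() {
	live := map[int64]*ProductData{1: {Name: "widget", Price: 100}}

	s := shallowCopy(live)
	s[1].Price = 200 // writes to the struct that readers of `live` also see
	fmt.Println(live[1].Price) // 200

	d := deepCopy(live)
	d[1].Price = 300 // touches only the private copy
	fmt.Println(live[1].Price) // 200
}
```

In a concurrent program the first write would be a data race with any reader of `live`. If the updater only ever replaces map entries with freshly allocated values and never writes through an existing pointer, the shared structs stay immutable and a shallow copy of the map shares nothing that changes; whether that holds depends entirely on how the update code is written.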
