I am working on a Go application where two concurrent maps, products and productCatalog, are read by many goroutines at high throughput under live traffic. These maps are populated during server startup and are then updated with delta changes from Kafka every 30 minutes.
products *cmap.ConcurrentMap[int64, *definitions.ProductData]
productCatalog *cmap.ConcurrentMap[int64, *cmap.ConcurrentMap[int64, struct{}]]
The concurrent map implementation used is from this library.
Problem: The periodic updates write to the same concurrent maps that production servers are reading from at high throughput. This simultaneous reading and writing causes contention and latency spikes in our system.
Requirement: I need to keep the original data and apply only the deltas to it, so simply swapping in a new map built from the deltas alone won't work. I am looking for a way to update the maps periodically without those writes disturbing read traffic.
Considered Solution: I am considering maintaining a secondary map to store the deltas. During reads, I would check both the original map and the delta map. I would then periodically merge the delta map into the original map during low traffic periods.
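To make the considered solution concrete, here is a minimal sketch of the read path and the merge step. It uses plain Go maps guarded by a sync.RWMutex rather than the cmap library, and the names OverlayStore, ApplyDelta, Merge and the ProductData fields are hypothetical placeholders, not part of my real code. Note that in this simple form the merge still holds the write lock for its whole duration, so it only illustrates the lookup order (delta first, then base), not a contention-free design.

```go
package main

import (
	"fmt"
	"sync"
)

// ProductData is a hypothetical stand-in for definitions.ProductData.
type ProductData struct {
	Name string
}

// OverlayStore (hypothetical) keeps the original data in base and
// recent changes in delta.
type OverlayStore struct {
	mu    sync.RWMutex
	base  map[int64]*ProductData
	delta map[int64]*ProductData
}

func NewOverlayStore(initial map[int64]*ProductData) *OverlayStore {
	if initial == nil {
		initial = make(map[int64]*ProductData)
	}
	return &OverlayStore{base: initial, delta: make(map[int64]*ProductData)}
}

// Get checks the delta map first so the most recent value wins,
// then falls back to the base map.
func (s *OverlayStore) Get(id int64) (*ProductData, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if p, ok := s.delta[id]; ok {
		return p, true
	}
	p, ok := s.base[id]
	return p, ok
}

// ApplyDelta records one change without touching the base map.
func (s *OverlayStore) ApplyDelta(id int64, p *ProductData) {
	s.mu.Lock()
	s.delta[id] = p
	s.mu.Unlock()
}

// Merge folds all pending deltas into the base map and resets the delta map.
// Readers are blocked while this runs.
func (s *OverlayStore) Merge() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for id, p := range s.delta {
		s.base[id] = p
	}
	s.delta = make(map[int64]*ProductData)
}

func main() {
	s := NewOverlayStore(map[int64]*ProductData{1: {Name: "old"}})
	s.ApplyDelta(1, &ProductData{Name: "new"})
	p, _ := s.Get(1)
	fmt.Println(p.Name) // prints "new"
}
```

Every read now pays for up to two lookups, which is part of why I am unsure this approach is optimal.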
Questions:
Is the considered solution optimal, or is there a better approach to handle high throughput reads and periodic writes without causing contention?
How can I optimize the merge operation to be as fast as possible and ensure that the most recent data is accessed first during reads?
Would using more granular locks or sharding during the merge operation reduce contention during reads and writes?
Any insights would be greatly appreciated, especially specific approaches to try or measurements to make to optimize this use case.
Update
Thanks for your insight; it's been invaluable. I've implemented the approach you recommended and will be running tests soon. One point I'm pondering is whether to opt for a deep copy or if a shallow copy would suffice in my scenario.
To provide a bit more context:
- Upon startup, I populate the products and productCatalog maps.
- These maps then undergo periodic updates every 30-40 minutes.
- The majority of the operations are read-heavy, with getters that retrieve data from these maps. For instance, with a specific product ID, the getter fetches the ProductData struct from the products map. Similarly, for the other map, there's a getter that returns a list of product ids given a catalogId.
It's crucial to note that I never modify these values after retrieving them. They are treated as read-only: I only use the values returned by the getter methods.
Given this workflow, I'm trying to gauge if a deep copy is truly necessary, or if a shallow copy would cater to my needs without any unforeseen repercussions. Your thoughts?
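For reference, this is roughly what I mean by a shallow copy, sketched under the assumption that the update builds a new map and atomically swaps it in. The names Snapshot and ApplyDeltas and the ProductData fields are hypothetical, and the sketch assumes a single updater goroutine. The new map copies only the pointers, so unchanged entries share the same ProductData values between the old and new maps. As far as I can tell this is only safe if the updater replaces a pointer with a freshly allocated struct and never mutates a struct that readers may already hold.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ProductData is a hypothetical stand-in for definitions.ProductData.
type ProductData struct {
	Name string
}

// Snapshot (hypothetical) holds the current read-only map behind an
// atomic pointer (atomic.Pointer requires Go 1.19 or later).
type Snapshot struct {
	current atomic.Pointer[map[int64]*ProductData]
}

func NewSnapshot(initial map[int64]*ProductData) *Snapshot {
	s := &Snapshot{}
	s.current.Store(&initial)
	return s
}

// Get reads from whichever map is current; it takes no lock.
func (s *Snapshot) Get(id int64) (*ProductData, bool) {
	m := *s.current.Load()
	p, ok := m[id]
	return p, ok
}

// ApplyDeltas builds a new map from the old one plus the deltas and swaps
// it in. Assumes a single writer. The copy is shallow: only pointers are
// copied, so unchanged entries are shared with the previous map.
func (s *Snapshot) ApplyDeltas(deltas map[int64]*ProductData) {
	old := *s.current.Load()
	next := make(map[int64]*ProductData, len(old)+len(deltas))
	for id, p := range old {
		next[id] = p
	}
	for id, p := range deltas {
		next[id] = p // a new pointer replaces the old one; the old struct is untouched
	}
	s.current.Store(&next)
}

func main() {
	s := NewSnapshot(map[int64]*ProductData{1: {Name: "a"}, 2: {Name: "b"}})
	before, _ := s.Get(1)
	s.ApplyDeltas(map[int64]*ProductData{2: {Name: "b2"}})
	after, _ := s.Get(1)
	fmt.Println(before == after) // prints "true": the unchanged entry is shared
}
```

A reader that fetched a value before the swap keeps a valid, unchanged struct, which is the behavior I am hoping to rely on.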