> how to avoid making multiple requests to db and multiple writes to cache of the same data when all nodes concurrently encounter a cache miss

The "all nodes" aspect sounds like trouble.

The usual approach is to have some load-balancing layer
hash on a relevant identifier and direct all requests
for identifier X to the _same_ server Y, using modulo arithmetic.
So if N servers are up at the moment,
we peel off some prefix bits of SHA3(X)
and compute Y = hash % N.
As servers come and go the details will change
and a request for X will end up going to another server,
but hey, reboots are rare and after all that's
what cache TTLs are for, right?
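
For concreteness, here's a minimal sketch of such a routing rule in Python, assuming SHA3-256 and a static server list (the server names and the 64-bit prefix width are just illustrative choices):

```python
import hashlib

def route(identifier: str, servers: list[str]) -> str:
    """Pick the single server responsible for caching this identifier."""
    digest = hashlib.sha3_256(identifier.encode()).digest()
    prefix = int.from_bytes(digest[:8], "big")   # peel off a 64-bit prefix
    return servers[prefix % len(servers)]        # Y = hash % N

# Every node applies the same rule, so requests for "user:42" always land
# on the same server for as long as the server list is unchanged.
servers = ["cache-a", "cache-b", "cache-c"]
print(route("user:42", servers))
```

Plain modulo means most keys get reassigned when N changes; that's the "details will change" caveat above, smoothed over by TTLs.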

Suppose each server allocates 1 GiB of RAM for caching results.
Then N servers offer an aggregate of N gigabytes,
and, critically, we expect to find a given record X
resident on exactly **one** server, as determined by hash mod N.
This is a big win, and it works remarkably well.

There's little value-add from distributed locks in such a scheme,
assuming "last write wins" or similar conflict resolution approaches
are applicable.

> would potentially request the data 3 times (one for each node)

That simply doesn't happen if the load balancer
is hashing on the identifier X, reducing the result modulo N (here, N = 3 nodes),
and directing _all_ requests about X to a single node.
That node is certainly free to acquire / release
a local lock for X if we're concerned that multiple X requests
may arrive while a backend database query about X is pending.
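
A minimal sketch of that per-key guard, for a single process, with a hypothetical `query_db()` standing in for the backend call:

```python
import threading

_cache: dict[str, object] = {}
_locks: dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()

def query_db(key: str):
    # Stand-in for the real backend query (hypothetical).
    return f"row for {key}"

def get(key: str):
    """Serve from cache; on a miss, only one thread queries the DB for this key."""
    if key in _cache:
        return _cache[key]
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        if key not in _cache:        # re-check: another thread may have filled it
            _cache[key] = query_db(key)
    return _cache[key]
```

Concurrent misses on the same key block behind one DB query instead of issuing several; misses on different keys proceed independently.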

----

We assume the DB is "large" and RAM-resident cache is "small".

Per unit time the update process is injecting a "small"
number of updates, so it is in a good position to
issue cache invalidations.
This might take the form of reducing default 3600-second TTLs
to just a few seconds.
In some caching systems the entry will simply expire soon after.
In other systems an entry, or at least a **popular** entry,
that is about to expire may trigger a pre-emptive DB query
to refresh it and extend its TTL.
This offers a very natural way to discover that the record's version has incremented.
Such a background query allows the timely update of even a "hot" record
with zero latency impact seen by front-end clients,
at the cost of only serving stale data
for a controlled and limited interval.
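
A rough sketch of that refresh-ahead behavior, assuming a cache client exposing `get()`, `ttl()`, and `set()` (those names are assumptions, not a specific library's API) and the same hypothetical `query_db()`:

```python
import threading

REFRESH_WINDOW = 5        # seconds before expiry at which we refresh (assumed)
DEFAULT_TTL = 3600        # the default TTL mentioned above

def serve(cache, key):
    """Return the cached value; if it is close to expiring, refresh it in the background."""
    value = cache.get(key)
    remaining = cache.ttl(key)               # hypothetical: seconds until expiry
    if value is not None and 0 < remaining < REFRESH_WINDOW:
        def refresh():
            fresh = query_db(key)            # hypothetical backend query
            cache.set(key, fresh, ttl=DEFAULT_TTL)
        # The requesting client still gets the (possibly slightly stale) cached
        # value immediately; the refresh happens off the request path.
        threading.Thread(target=refresh, daemon=True).start()
    return value
```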

Let's assume the DB update process
- cannot access the cache, and
- always writes an `updated` timestamp when INSERTing a row.

That still gives us the flexibility to create a new
cache update process responsible for timely cache invalidations.

while (True) do:
- Perform an indexed query on "recent" updated timestamps, to retrieve rows updated since the previous loop iteration.
- Issue cache invalidations for each row.
- Optionally update the cache with the fresh row data, perhaps preserving the old TTL so unpopular items consume less cache space.
- Briefly sleep().
- Lather, rinse, repeat the loop.
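
In Python, that loop might look roughly like this, assuming a `db` handle with an index on `updated`, a cache client with `ttl()`/`delete()`/`set()`, and a table named `items` (all of which are assumptions):

```python
import time

POLL_INTERVAL = 2.0        # the brief sleep between scans (assumed)

def invalidation_loop(db, cache):
    """Poll for recently updated rows and push invalidations into the cache."""
    last_seen = db.current_timestamp()       # hypothetical: the DB's notion of "now"
    while True:
        # Indexed query on "recent" updated timestamps since the previous iteration.
        rows = db.query(
            "SELECT * FROM items WHERE updated > %s ORDER BY updated",
            (last_seen,),
        )
        for row in rows:
            remaining = cache.ttl(row["id"])      # seconds left, 0 if absent (assumed)
            cache.delete(row["id"])               # invalidate the stale entry
            if remaining > 0:
                # Optional: re-populate with fresh data, preserving the old TTL so
                # unpopular items keep consuming little cache space.
                cache.set(row["id"], row, ttl=remaining)
            last_seen = max(last_seen, row["updated"])
        time.sleep(POLL_INTERVAL)
```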