
I'm looking for a C# pattern for coding a synchronized operation including writes to two different databases for a particular entity, such that I can avoid race conditions for simultaneous operations on the same entity.

E.g. Thread 1 and thread 2 are processing an operation on entity X at the same time. The operation writes information for X to database A (in my case, an upsert to MongoDB) and database B (an insert to SqlServer). Thread 3 is processing the same operation on entity Y. The desired behavior is:

  • Thread 1 blocks thread 2 while processing writes to A and B for entity X.
  • Thread 2 waits until thread 1 completes writes to A and B and then makes writes to A and B for entity X.
  • Thread 3 is not blocked and processes writes to A and B for entity Y while thread 1 is processing

The behavior I'm trying to avoid is:

  • Thread 1 writes to A for entity X.
  • Thread 2 writes to A for entity X.
  • Thread 2 writes to B for entity X.
  • Thread 1 writes to B for entity X.

I could use a mutex across all threads, but I don't really want to block the operation for a different entity.

  • What databases? Why do writes of different entities need to be synchronized across threads? How are reads considered? Is this purely a problem within a single process? Commented May 6, 2020 at 18:14
  • I updated the question to include the database details. Writes for different entities do not need to be synchronized. This is a process running on multiple servers and each process creates multiple threads via Hangfire jobs. Commented May 6, 2020 at 20:07

2 Answers


I would suggest using a simple lock (if it all happens in one area of the code). Since the threads work on different objects (meaning different .NET object instances) that represent the same value (the same entity), locking on the instances themselves is not enough; I would rather lock on some form of code for the entity. If the entity has such a code, I would use it, for example:

lock (String.Intern(myEntity.Code))
{
    SaveToDatabaseA(myEntity);
    SaveToDatabaseB(myEntity);
}

But of course, you have to watch out for deadlocks. And String.Intern is tricky, as it interns the string for as long as the application runs.

But it looks like you want to have some kind of replication mechanism. Then I would rather do it at the database level (not at the code level).

[UPDATE]

You updated the question with the information that this is being done on multiple servers, and that information is kind of crucial here :) A normal lock won't work.

Of course, you can play with synchronizing the locks across different servers, but it is like distributed transactions: theoretically you can do it, but most people avoid it as long as they can, and instead adjust the architecture of the solution to simplify the process.

[UPDATE 2]

You may also find this interesting: Distributed locking in .NET

:)
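As a hedged sketch of what the linked approach could look like: the open-source DistributedLock library (Medallion.Threading packages) builds a cross-server lock on top of SQL Server's sp_getapplock. The lock name, connection string, and SaveToDatabaseA/B methods here are illustrative assumptions taken from the discussion, not a definitive implementation:

```csharp
using System;
using Medallion.Threading.SqlServer;

// One lock name per entity code; any server acquiring the same name blocks
// until the holder releases it, while different entity codes don't contend.
var entityLock = new SqlDistributedLock("entity:" + myEntity.Code, connectionString);

using (entityLock.Acquire(TimeSpan.FromSeconds(30)))
{
    SaveToDatabaseA(myEntity); // MongoDB upsert
    SaveToDatabaseB(myEntity); // SQL Server insert
}
// Disposing the handle releases the lock for the next waiter.
```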


8 Comments

So this basically becomes a named lock that will only block other threads that are processing the same entity. Can you explain how this would be subject to deadlocks? The two writes aren't really about replication as one database is MongoDB and the other is SqlServer (see my edits to the question to include these details)
Avoid using string.Intern here: instances are never released from the String Pool. Using/abusing string.Intern like this is a way to make a long-running process over an unbound set of data become slower over time (string.Intern is much slower to update than a Dictionary as the pool size increases) and eventually crash when it runs out of memory.
It could, if you used String.Intern elsewhere in the code with the same string. That is the risk. So maybe even better would be to add some "place in code" identifier, so that it would secure you from this risk. Then it would be like: String.Intern($"WritingEntities{myEntity.Code}")
@user2864740 I've stated that in the answer: the String.Intern string stays in memory. If your entity set is around 1,000 or 10,000 - don't expect any issues. But if you would use this code on 100,000 or 1,000,000 - it can become a problem.
@JimSweeney - the fact that the app is running on multiple servers is the decisive factor here. Of course, in such a scenario you cannot rely on lock :D That is why you have to describe your issue IN DETAIL; otherwise, someone will suggest a solution that doesn't fit your case ;)

Using the lock statement is insufficient for multiple processes1. Even named/system semaphores are limited to a single machine and are thus insufficient for multiple servers.

If duplicate processing is OK and a "winner" can be selected, it may be sufficient just to write/update-over or use a flavor of optimistic concurrency. If stronger process-once-at-a-time guarantees need to be maintained, a global locking mechanism needs to be employed - SQL Server supports such a mechanism via sp_getapplock.
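A minimal sketch of sp_getapplock used as a cross-server entity lock, assuming Microsoft.Data.SqlClient and hypothetical SaveToMongo/SaveToSqlServer helpers standing in for the two writes from the question:

```csharp
using System;
using System.Data;
using Microsoft.Data.SqlClient;

static void SaveWithAppLock(string entityKey, string connString)
{
    using var conn = new SqlConnection(connString);
    conn.Open();
    using var tx = conn.BeginTransaction();

    using (var cmd = new SqlCommand("sp_getapplock", conn, tx))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@Resource", "entity:" + entityKey);
        cmd.Parameters.AddWithValue("@LockMode", "Exclusive");
        cmd.Parameters.AddWithValue("@LockOwner", "Transaction");
        cmd.Parameters.AddWithValue("@LockTimeout", 30000); // ms
        var result = cmd.Parameters.Add("@Result", SqlDbType.Int);
        result.Direction = ParameterDirection.ReturnValue;
        cmd.ExecuteNonQuery();

        // Negative return values mean the lock was not granted (timeout, deadlock, ...).
        if ((int)result.Value < 0)
            throw new TimeoutException("Could not acquire app lock for " + entityKey);
    }

    // Both writes happen while the app lock is held; any other server calling
    // this method for the same entityKey blocks until the transaction ends.
    SaveToMongo(entityKey);    // database A: MongoDB upsert
    SaveToSqlServer(conn, tx); // database B: SQL Server insert, same transaction

    tx.Commit(); // commit releases the transaction-owned app lock
}
```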

Likewise, the model can be updated so that each agent 'requests' the next unit of work such that dispatch can be centrally controlled and that an entity, based on ID etc., is only given to a single agent at a time for processing. Another option might be to use a Messaging system like RabbitMQ (or Kafka etc., fsvo); for RabbitMQ, one might even use Consistent Hashing to ensure (for the most part) that different consumers receive non-overlapping messages. The details differ based on implementation used.

Due to the different nature of a SQL RDBMS and MongoDB (especially if used as "a cache"), it may be sufficient to loosen the restriction and/or design the problem using MongoDB as a read-through cache (which is a good way to use caches). This can mitigate the paired-write issue, although it does not prevent global concurrent processing of the same items.
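A hedged sketch of the read-through shape, where SQL Server is the single system of record and MongoDB only caches. The Doc type and the LoadFromSqlServer helper are illustrative assumptions; the collection calls use the official MongoDB .NET driver:

```csharp
using MongoDB.Driver;

public sealed class Doc
{
    public string Id { get; set; }
    // ... entity payload fields ...
}

static Doc GetEntity(IMongoCollection<Doc> cache, string id)
{
    // Try the cache first.
    var cached = cache.Find(d => d.Id == id).FirstOrDefault();
    if (cached != null)
        return cached;

    // Cache miss: read from the system of record (SQL Server)
    // and populate MongoDB on the way out.
    var fromSql = LoadFromSqlServer(id);
    cache.ReplaceOne(d => d.Id == id, fromSql,
                     new ReplaceOptions { IsUpsert = true });
    return fromSql;
}
```

With this shape there is no paired write to keep consistent: only SQL Server is authoritative, and a stale or lost cache entry is repaired on the next read.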

1Even though a lock statement is globally insufficient, it can still be employed locally between threads in a single process to reduce local contention and/or minimize global locking.


The answer below was for the original question, assuming a single process.

The "standard" method of avoiding working on the same object concurrently via multiple threads would be with a lock statement on the specific object. The lock is acquired on the object itself, such that lock(X) and lock(Y) are independent when !ReferenceEquals(X,Y).

The lock statement acquires the mutual-exclusion lock for a given object, executes a statement block, and then releases the lock. While a lock is held, the thread that holds the lock can again acquire and release the lock. Any other thread is blocked from acquiring the lock and waits until the lock is released.

lock (objectBeingSaved)
{
    // This code execution is mutually-exclusive over a specific object..
    // ..and independent (non-blocking) over different objects.
    Process(objectBeingSaved);
}

A local process lock does not necessarily translate into sufficient guarantees for database access, or when the access spills across processes. The scope of the lock should also be considered: eg. should it cover all processing, only saving, or some other work unit?

To control which objects are being locked and reduce the chance of undesired/accidental lock interactions, it's sometimes recommended to add a field with the most restrictive visibility possible to the objects, explicitly (and only) for the purpose of establishing a lock. This can also be used to group objects which should lock on each other, if such is a consideration.
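A minimal sketch of such a dedicated lock field (the Entity/SyncRoot names are illustrative, not from the question):

```csharp
public sealed class Entity
{
    // Dedicated lock object: nothing else in the program can accidentally
    // lock on it, unlike locking on the entity instance itself.
    private readonly object _syncRoot = new object();

    public object SyncRoot => _syncRoot;
    public string Code { get; set; }
}

// usage: callers serialize on the entity's own lock object
var entity = new Entity { Code = "X" };
lock (entity.SyncRoot)
{
    // mutually exclusive per entity instance
    entity.Code = "X-processed";
}
```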

It's also possible to use a locking pool, although such tends to be a more 'advanced' use-case with only specific applicability. Using pools also allows using semaphores (in even more specific use-cases) as well as a simple lock.

If there needs to be a lock per external ID, one approach is to integrate the entities being worked on with a pool, establishing locks across entities:

// Some lock pool. Variations of the strategy:
// - Weak-value hash table
// - Explicit acquire/release lock
// - Explicit acquire/release from ctor and finalizer (or Dispose)
var locks = CreateLockPool();

// When the object is created, assign a lock object.
var entity = CreateEntity();

// Returns the same lock object (instance) for a given ID, and a different
// lock object (instance) for a different ID.
entity.Lock = GetLock(locks, entity.ID);

lock (entity.Lock)
{
    // Mutually exclusive per whatever rules are used to select the lock.
    Process(entity);
}

Another variation is a localized pool, instead of carrying around a lock object per entity itself. It is conceptually the same model as above, just flipped outside-in. Here is a gist. YMMV.

private sealed class Locker { public int Count; }

IDictionary<int, Locker> _locks = new Dictionary<int, Locker>();

void WithLockOnId(int id, Action action)
{
    Locker locker;
    lock (_locks)
    {
        // The _locks might have lots of contention; the work
        // done inside is expected to be FAST in comparison to action().
        if (!_locks.TryGetValue(id, out locker))
            locker = _locks[id] = new Locker();
        ++locker.Count;
    }
    lock (locker)
    {
        // Runs mutually-exclusive by ID, as established per creation of
        // distinct lock objects.
        action();
    }
    lock (_locks)
    {
        // Don't forget to take out the garbage..
        // This would be better with try/finally, which is left as an exercise
        // to the reader, along with fixing any other minor errors.
        if (--_locks[id].Count == 0)
            _locks.Remove(id);
    }
}

// And then..
WithLockOnId(x.ID, () => Process(x));

Taking a sideways step, another approach is to 'shard' entities across thread/processing units. Thus each thread is guaranteed to never be processing the same entity as another thread: X,Y,Z always go to #1 and P,D,Q always to #2. (It's a little bit more complicated to optimize throughput..)

var threadIndex = entity.ID % NumThreads;
QueueWorkOnThread(threadIndex, entity); // eg. add to a per-thread ConcurrentQueue

4 Comments

Thanks for all the detail! A couple of things I should have mentioned - the two databases mentioned in my question are different engines, MongoDB and SqlServer. Also, the lock should not be on a particular object, but on a single entity represented by a different object in each thread. I updated my question with the database details.
If the process is running on multiple servers, you can't "just" use lock (or even a semaphore; named semaphores are still per-system), although it can reduce local contention. However, the outside construct needs to be accounted for: ie. what happens if multiple servers write to the DB at the same time? Who wins? Does it matter / what happens on a conflict? Both of those databases do support concurrent connections (as in, the connection itself won't corrupt) as long as a different connection is used per thread. This still doesn't guarantee larger atomicity or mutually-exclusive processing.
Can you elaborate on "using MongoDB as a read through"? I have to write some data to MongoDB and some data to SqlServer, and it's ok for the last thread to "win" such that it writes last to both databases, but I need to avoid having a different winner for each database. Are you saying I can eliminate that possibility even with threads writing from two different servers?
With a read-through, whatever process reads from MongoDB is also responsible for providing a value if one does not exist: the core data is read up (if needed) and cached in a different abstraction. Then the system of record becomes entirely the RDBMS, with MongoDB providing a cache layer. (Such an approach might not always be possible, although I strongly recommend a single source of truth that can propagate outwards.)
