
I have a Web API in C# that receives data from various clients, which needs to be saved to the database. Each client sends data approximately every 10 seconds. To save this data, I need to base it on some properties of the last record saved in the database. This means that each time I save a new record, I need to perform a query to fetch the last saved record.

The problem occurs when I receive multiple data points from the same client simultaneously. For example, sometimes I might receive 50 data points from the same client within the same second. When I perform the query to fetch the last record, that record might no longer be the last one since multiple data points are being inserted at the same time (the data is saved and retrieved asynchronously). When an invalid data point is saved, it compromises all subsequent records that are based on that invalid data.

My question is: what practice can I adopt to solve this problem? I thought about saving and retrieving all data synchronously, but I believe this would harm the performance of my application.

  • Is this a distributed application? That is, do you have multiple host instances talking to the same DB? Commented Jul 31, 2024 at 18:21
  • No, it is a "single" application; I am not using anything like microservices. Commented Jul 31, 2024 at 18:23

5 Answers


Eliminate the last-record query entirely and move away from database-first thinking.

Add a queue to your solution. Incoming data is added to the queue asynchronously from multiple sources. A second process removes items from the queue in FIFO order and writes them to the database.

This second process always knows what the last inserted record was, because it inserted it, and it can do the calculation in memory without having to query the database to see where it got to.

It doesn't have the problem of multiple chunks of data arriving at the same time, because it's reading from a queue, so the ordering has already been done.

Your database operations are now pure inserts and your performance will improve dramatically.
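A minimal sketch of this design in C# using System.Threading.Channels. `DataPoint`, the `Derive` calculation, and the in-memory `Saved` list (standing in for the database table) are all illustrative assumptions, not from the question:

```csharp
// Sketch only: many producers write to the channel, exactly ONE consumer
// loop reads from it and keeps the "last record" in memory.
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public record DataPoint(string ClientId, double Value);

public class IngestionPipeline
{
    private readonly Channel<DataPoint> _queue = Channel.CreateUnbounded<DataPoint>();
    private DataPoint? _last;                      // the "last record", kept in memory
    public List<DataPoint> Saved { get; } = new(); // stands in for the database table

    // Called from the API handlers: cheap, non-blocking, safe from many threads.
    public void Enqueue(DataPoint point) => _queue.Writer.TryWrite(point);

    // Signals that no more data will arrive (lets the consumer loop finish).
    public void Complete() => _queue.Writer.Complete();

    // Run exactly one instance of this loop (e.g. as a BackgroundService).
    public async Task ConsumeAsync()
    {
        await foreach (var point in _queue.Reader.ReadAllAsync())
        {
            var derived = Derive(point, _last); // calculation uses the in-memory last record
            Saved.Add(derived);                 // a pure insert: no prior query needed
            _last = derived;
        }
    }

    // Illustrative derivation: a running total based on the previous record.
    private static DataPoint Derive(DataPoint incoming, DataPoint? last) =>
        incoming with { Value = incoming.Value + (last?.Value ?? 0) };
}
```

Because only the consumer loop touches `_last`, no locking or re-querying is needed there; the API handlers only ever enqueue.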

Just a note on the whole distributed thing:

The key thing to realise is that you have a requirement for a single-process operation: the ordering requires it.

You can do it on the database with locks and transactions, or in a message-queue system by letting it decide which event to write first, or you could defer the ordering and do the calculation on read, or in some eventual-consistency process.

Maybe, under certain conditions, you can use some clever maths to create a natural order. But that's not the general case, and here I think the need for the actual last record precludes such solutions.

By explicitly doing this single-process bit in your own code, you can optimise it for your particular case: here, by keeping the last item in memory rather than having to fetch it.

But just because you need a single thread at one point in your solution doesn't mean the whole thing has to be single threaded.

  • This assumes a single process reading from the queue at a time. There are lots of reasons that can be problematic. Commented Jul 31, 2024 at 18:13
  • Given the clarification, this is a good solution. Consider using a queue per client or something like message attributes to allow for concurrent processing of multiple client requests. Commented Jul 31, 2024 at 18:29
  • It's not so much assuming as specifying. If you split the application up, you can control where you need to be synchronous and where asynchronous. Commented Jul 31, 2024 at 18:56
  • If you have multiple processors on a queue, FIFO isn't guaranteed (or even expected), at least with the various messaging platforms I've worked with. That requires some sort of distributed semaphore or other enforcement of order. Generally, not worth the problems that come with that, in my experience. Easier to deal with at the DB. Commented Jul 31, 2024 at 19:11
  • Sure, there's no free lunch; I've updated my answer to go into it a bit. Commented Jul 31, 2024 at 19:15

Last according to what?

I like Ewan's suggestion of a local queue. It will work as long as you're reading and writing from the same thread.

Things get weird if you’re not. Because queue.last() can change when you’re not looking. Or even halfway through reading it.

The simple fix here is to read from the queue atomically. That can be as easy as making sure you save the reference/pointer/address to the "last" item only once. You can use that reference multiple times, but don't look it up again halfway through working with it, and don't expect it to be the same next time.
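The read-once rule can be sketched in C# like this; `Record`, the running-total derivation, and the field names are illustrative stand-ins:

```csharp
// Sketch: take one snapshot of the shared "last" reference and work only
// with that local copy, never re-reading the shared field mid-computation.
public record Record(int Value);

public class Consumer
{
    private volatile Record? _last; // may be replaced by another thread

    public Record BuildNext(Record incoming)
    {
        var snapshot = _last; // read the shared reference exactly once
        // From here on, use only `snapshot`. Re-reading _last could yield a
        // different record halfway through the calculation.
        var next = new Record(incoming.Value + (snapshot?.Value ?? 0));
        _last = next;
        return next;
    }
}
```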

  • Mm, that's not what I meant; queue.last has the same problems as querying the DB. I mean you just keep an object reference to the last item you added to the DB, so you can do the calculation on the next object you pull from the queue. Commented Jul 31, 2024 at 21:43

A database-only approach would be to insert the incoming data into a staging table. That way, many clients can insert at the same time.

Then you have a single process that merges the data, based on the last record in the target table, processing the staging table in order. One could use a sequence to order the staging records. Once individual data records are merged, they can be removed from the staging table. Since only one process is making updates, it always has the last/latest change, because it exclusively makes the updates.

Many processes inserting, one process merging. The merge process shouldn't have to keep querying the target table, as it can keep the changes in memory, so the target table should just be an update with no prior query.
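The pattern can be modelled in memory in C#, with a `ConcurrentDictionary` standing in for the staging table keyed by a database sequence, and a list standing in for the target table. All names here are illustrative, not a real DB implementation:

```csharp
// In-memory model of the staging-table pattern: many concurrent inserters,
// one merge process that reads in sequence order, merges, and deletes.
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public class StagingModel
{
    private long _sequence;                                            // models a DB sequence
    private readonly ConcurrentDictionary<long, int> _staging = new(); // "staging table"
    private readonly List<int> _target = new();                        // "target table"
    private int _runningTotal;                                         // last-record state, kept in memory

    // Many clients can call this concurrently; the sequence value orders them.
    public void Insert(int value) =>
        _staging[Interlocked.Increment(ref _sequence)] = value;

    // Exactly one merge process: process staging rows in sequence order,
    // merge based on the in-memory "last record", then remove them.
    public void MergeAll()
    {
        foreach (var key in _staging.Keys.OrderBy(k => k))
        {
            if (_staging.TryRemove(key, out var value))
            {
                _runningTotal += value;     // illustrative calculation based on last record
                _target.Add(_runningTotal); // write to target with no prior query
            }
        }
    }

    public IReadOnlyList<int> Target => _target;
}
```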


Since you need to keep the order of records for each client, you probably need to handle all requests for a given client in one API server instance (otherwise the notion of "last record" does not make sense).

How you achieve this is a matter of your sharding/load balancer/whatever.

Once that is out of the way, you simply cache the last inserted record for each client and use that. No need to look it up in the database, unless you restarted the server and the cache is empty.
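Assuming a single host, a per-client cache could be sketched like this in C#; `Record`, the derivation, and the class names are illustrative assumptions:

```csharp
// Sketch: cache the last record per client, with one lock object per client
// so different clients proceed concurrently while same-client requests are
// serialized (keeping "last" consistent).
using System.Collections.Concurrent;

public record Record(double Value);

public class LastRecordCache
{
    private readonly ConcurrentDictionary<string, Record> _last = new();
    private readonly ConcurrentDictionary<string, object> _locks = new();

    public Record Append(string clientId, Record incoming)
    {
        lock (_locks.GetOrAdd(clientId, _ => new object()))
        {
            _last.TryGetValue(clientId, out var previous);
            // On a cache miss after a restart, this is where a one-time
            // database lookup would go.
            var next = new Record(incoming.Value + (previous?.Value ?? 0));
            _last[clientId] = next;
            return next;
        }
    }
}
```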

  • This assumes a single API host instance, right? Or are you talking about a distributed cache? Commented Jul 31, 2024 at 14:55
  • Given the clarification, this is workable with proper thread synchronization of the cache state. Commented Jul 31, 2024 at 18:32

Note: this answer was written assuming a distributed system with multiple hosts interacting with the same DB. While that is overkill for the OP's problem, I will leave it for others who may need to solve that scenario. That said, the uniqueness constraint on the DB explained here is a good way to prevent corruption in case the ordering solution were to have a bug in it.

This is a good case for an optimistic locking scheme. Here's one approach that might work for you:

Keep a version number on each record, incremented by one for each new record, and create a unique index on it. When a new record comes in, look up the last record, add one to its version number, and use that as the version number of the new record.

If another update has come in between retrieving the last record and adding the new record, both competing updates will retrieve the same last record and attempt to insert a new record with the same incremented version number. One of these inserts will fail due to the uniqueness constraint.

You need to figure out what should happen next. One option is to retry the failed insert: trap that specific error and loop back to the point of retrieving the last record. Note that this approach is not strictly guaranteed ever to complete; in practice, as long as there are gaps between bursts of updates long enough to add all the new records one by one, it will. Another option is to fail the update back to the client. It all depends on the overall requirements.
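The retry loop can be sketched in C#. The "database" below is modelled by an in-memory map guarded by a lock; in a real system the insert would throw a duplicate-key error from the unique index instead of the exception thrown here. All names are illustrative:

```csharp
// Sketch of optimistic locking via a versioned insert with retry.
using System;
using System.Collections.Generic;
using System.Linq;

public class VersionedStore
{
    private readonly object _dbLock = new();
    private readonly SortedDictionary<long, int> _rows = new(); // version -> value

    public int Count { get { lock (_dbLock) return _rows.Count; } }

    private long LastVersion()
    {
        lock (_dbLock) return _rows.Count == 0 ? 0 : _rows.Keys.Max();
    }

    // Models the unique index: throws if the version already exists.
    private void Insert(long version, int value)
    {
        lock (_dbLock)
        {
            if (_rows.ContainsKey(version))
                throw new InvalidOperationException("duplicate version");
            _rows.Add(version, value);
        }
    }

    public long Append(int value)
    {
        while (true) // not strictly guaranteed to finish under constant contention
        {
            var version = LastVersion() + 1; // read last record, increment
            try { Insert(version, value); return version; }
            catch (InvalidOperationException) { /* lost the race; retry */ }
        }
    }
}
```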
