How to improve a monthly usage quota system for an API with growing number of users?

Question

Couple of years ago, I designed an API service which has a limit on the number of requests per month per user. To track the monthly usage of each user, I used a table called monthly_usage. The table structure is as follows:

Column name	Type	Description
id	integer	A unique identifier for the user
current_period_start	timestamp	The start time of the current monthly usage period
current_period_end	timestamp	The end time of the current monthly usage period
total_requests	integer	The total number of requests made by the user in the period

Whenever a request is served, an event is fired and the user details are used to increment a counter in Redis called current_count. Every 10 minutes, a schedule called usage_collection_job is executed, which performs the following steps:

Collect the current_count value for each user from Redis.
Add the current_count value to the corresponding total_requests value in the monthly_usage table.
Reset current_count in Redis to zero.
Store the updated total_requests value in Redis under the key current_total.

Whenever a new request is received, a UsageFilter queries the current_total from Redis and verifies that the user hasn't exceeded their quota. If the user has exceeded their quota, the filter rejects the request. At the end of each month (or when the subscription renews), the monthly_usage for each user is reset to zero.

This system has been working well, but as the number of users is growing (over 10000 in last few months alone), I'm facing the following issues:

The usage_collection_job is taking more and more time to complete execution, which is affecting the API performance.
Since this job runs for all users at the same time, it causes high CPU usage and spikes in Redis usage.
Since the current_total is updated only every 10 minutes, a user can exceed their quota and still send thousands of requests before the usage limit is enforced.

I'm looking for suggestions to improve the current system and overcome these limitations. Specifically, I'm interested in:

Ways to optimize the usage_collection_job to improve performance.
Alternatives to the current system that can handle a growing number of users and requests more efficiently.
Ways to enforce the usage limit more frequently or in real-time to prevent users from exceeding their quota.

Any advice or suggestions would be greatly appreciated. Thank you!

Ewan · Accepted Answer · 2023-03-23 14:55:20Z

Work out the count of requests from a time indexed log of requests instead of storing a number in reddis. (elastic etc)

Now you are just writing to the log for each request.
Move your usage collection job to work off the log instead of the live value.

This removes any impact on the API from running the job.
Only check users that are near their limit

Have the collection job mark users that are X away from hitting their limit, or be clever and predict for their usage when they are likely to run out. Mark these users to check and have a more frequent collection job that updates their status more frequently.
Dont query the current total in the API.

Instead, prevent the user logging in/refreshing their token when they have exceeded their limit. This removes the performance impact of checking the user data on each call.

If its critical to lock users out immediately you could have a 'hot list' of active users that have just been locked out in a hashmap or similar and check that. Hopefully the number will be small as they will have to have gone over their limit less than token expiration time ago.

Stack Exchange Network

How to improve a monthly usage quota system for an API with growing number of users?

1 Answer 1

Hot Network Questions

How to improve a monthly usage quota system for an API with growing number of users?

1 Answer 1

Related

Hot Network Questions