
A couple of years ago, I designed an API service that limits the number of requests per user per month. To track each user's monthly usage, I use a table called monthly_usage with the following structure:

| Column name | Type | Description |
| --- | --- | --- |
| id | integer | A unique identifier for the user |
| current_period_start | timestamp | The start time of the current monthly usage period |
| current_period_end | timestamp | The end time of the current monthly usage period |
| total_requests | integer | The total number of requests made by the user in the period |

Whenever a request is served, an event is fired and the user details are used to increment a per-user counter in Redis called current_count. Every 10 minutes, a scheduled job called usage_collection_job runs and performs the following steps:

  1. Collect the current_count value for each user from Redis.
  2. Add the current_count value to the corresponding total_requests value in the monthly_usage table.
  3. Reset current_count in Redis to zero.
  4. Store the updated total_requests value in Redis under the key current_total.
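The four steps above can be sketched as follows. Plain dicts stand in for Redis and the monthly_usage table here; in a real deployment, steps 1 and 3 would ideally be a single atomic Redis operation (e.g. `GETSET key 0`) so no increments are lost between the read and the reset.

```python
def run_usage_collection_job(redis_counts, monthly_usage, redis_totals):
    for user_id in list(redis_counts):
        current_count = redis_counts[user_id]           # 1. collect current_count
        monthly_usage[user_id] = (                      # 2. add to total_requests
            monthly_usage.get(user_id, 0) + current_count
        )
        redis_counts[user_id] = 0                       # 3. reset current_count
        redis_totals[user_id] = monthly_usage[user_id]  # 4. cache current_total

counts = {"alice": 120, "bob": 3}
totals_db = {"alice": 880}
cache = {}
run_usage_collection_job(counts, totals_db, cache)
print(totals_db)  # {'alice': 1000, 'bob': 3}
```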

Whenever a new request is received, a UsageFilter queries the current_total from Redis and verifies that the user hasn't exceeded their quota. If the user has exceeded their quota, the filter rejects the request. At the end of each month (or when the subscription renews), the monthly_usage for each user is reset to zero.
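The filter itself boils down to a single cached lookup, along these lines (a dict stands in for Redis; the function name and quota plumbing are illustrative):

```python
def usage_filter(user_id, redis_totals, quota):
    """Allow the request only while the cached current_total is under quota."""
    return redis_totals.get(user_id, 0) < quota

cache = {"alice": 1000, "bob": 40}
print(usage_filter("alice", cache, 1000))  # False -> reject, quota exhausted
print(usage_filter("bob", cache, 1000))    # True  -> allow
```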

This system has been working well, but as the number of users grows (over 10,000 added in the last few months alone), I'm facing the following issues:

  • The usage_collection_job is taking longer and longer to complete, which affects API performance.
  • Since the job runs for all users at the same time, it causes high CPU usage and spikes in Redis load.
  • Since current_total is updated only every 10 minutes, a user can exceed their quota and still send thousands of requests before the limit is enforced.

I'm looking for suggestions to improve the current system and overcome these limitations. Specifically, I'm interested in:

  • Ways to optimize the usage_collection_job to improve performance.
  • Alternatives to the current system that can handle a growing number of users and requests more efficiently.
  • Ways to enforce the usage limit more frequently or in real-time to prevent users from exceeding their quota.

Any advice or suggestions would be greatly appreciated. Thank you!

1 Answer

  1. Work out the request count from a time-indexed log of requests instead of storing a number in Redis (Elasticsearch, etc.).

    Now you are just writing to the log for each request.
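A rough sketch of that idea, with a plain list standing in for the log store (in practice you would append to an Elasticsearch index or similar and count via a date-range query):

```python
from datetime import datetime

request_log = []  # one (timestamp, user_id) entry per served request

def record_request(user_id, ts):
    # The API hot path only appends; no counter read-modify-write.
    request_log.append((ts, user_id))

def count_requests(user_id, period_start, period_end):
    # The collection job derives the count from the log after the fact.
    return sum(1 for ts, uid in request_log
               if uid == user_id and period_start <= ts < period_end)

record_request("alice", datetime(2024, 5, 1, 9, 0))
record_request("alice", datetime(2024, 5, 2, 9, 0))
record_request("bob", datetime(2024, 5, 1, 9, 5))
print(count_requests("alice",
                     datetime(2024, 5, 1), datetime(2024, 6, 1)))  # 2
```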

  2. Move your usage collection job to work off the log instead of the live value.

    This removes any impact on the API from running the job.

  3. Only check users that are near their limit.

    Have the collection job mark users that are X away from hitting their limit, or be clever and predict from their usage rate when they are likely to run out. Flag these users and have a separate, more frequent collection job update just their status.
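For example (the 90% threshold and all names here are illustrative; a fancier version would extrapolate from each user's recent request rate):

```python
def users_near_limit(monthly_usage, quotas, threshold=0.9):
    # Flag users whose recorded usage has crossed the threshold fraction
    # of their quota; only these need the more frequent follow-up checks.
    return {uid for uid, used in monthly_usage.items()
            if used >= threshold * quotas.get(uid, float("inf"))}

usage = {"alice": 950, "bob": 100}
quotas = {"alice": 1000, "bob": 1000}
print(users_near_limit(usage, quotas))  # {'alice'}
```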

  4. Don't query the current total in the API.

    Instead, prevent the user from logging in or refreshing their token once they have exceeded their limit. This removes the performance impact of checking the user's usage data on every call.

    If it's critical to lock users out immediately, you could keep a 'hot list' of active users that have just been locked out in a hashmap or similar and check that. Hopefully that set will be small, since those users must have gone over their limit less than one token-expiration interval ago.
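A sketch of that combination: the quota is enforced once at token-issue time, and the per-request check shrinks to an in-memory set lookup against the hot list (all names here are illustrative):

```python
hot_list = set()  # users locked out since their current token was issued

def issue_token(user_id, monthly_usage, quota):
    # Enforce the quota once, when the token is minted or refreshed...
    if monthly_usage.get(user_id, 0) >= quota:
        return None  # refuse to log the user in / refresh the token
    return f"token-{user_id}"  # placeholder token, not a real credential

def allow_request(user_id):
    # ...so the per-request check is a set lookup, not a Redis round trip.
    return user_id not in hot_list

usage = {"alice": 1000, "bob": 10}
print(issue_token("alice", usage, 1000))  # None -> over quota, no token
print(issue_token("bob", usage, 1000))    # token-bob
hot_list.add("bob")                       # bob exceeded quota mid-token
print(allow_request("bob"))               # False
```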
