How to Avoid API Overload: A Few Effective Throttling Strategies

10 Sep 2023

Most applications we develop today depend on APIs, either internal or external ones. Internal APIs are often liberal with request limits, but external APIs tend to be very strict about them.

In this article, we will look at several different strategies to throttle outgoing requests and make sure your application stays within limits.

Why Throttle?

There are a few strong reasons why throttling should be considered during your development. Here are the most common cases:

  • If you make parallel requests and block threads while waiting for external resources, you introduce thread starvation in your application.
    • Symptom: CPU is available but the application just gets stuck in a loading state (sounds familiar?😀). You can see this reported as "Slow Execution" in the Azure portal.
  • You make parallel requests and don't block the threads. Sounds fine? Nah! If you keep sending outgoing requests, you'll end up exhausting all the outgoing TCP sockets on your machine/instance.
    • This happens a lot in Azure, where the App Service sandbox enforces strict limits for lower tiers.
  • You make parallel requests at a rate the target resource can't handle. This chokes the target resource, which in turn drags your application down as well.
    • This situation usually happens with friendlier internal APIs that don't do any rate limiting.
  • You make parallel requests at a rate the external resource can't keep up with or hasn't agreed to. Then you are heavily rate-limited or even blocked for a few minutes, which can cripple your application's functionality at random.

So, uncontrolled spawning of tasks simply leads to problems, either for you or for others. Let us delve into different strategies to control this situation.

1. Simple throttle

This strategy is simple and the most commonly suggested one throughout the web for throttling parallel tasks. It only considers the current execution scope and makes sure the parallel tasks started from it stay within a specified concurrency limit.

1.1 Throttling using Task.WhenAny()

// Suggested by Stephen Toub in Task-based Asynchronous Pattern
const int CONCURRENCY_LEVEL = 15;
Uri[] urls = …;
int nextIndex = 0;
var imageTasks = new List<Task<Bitmap>>();
while(nextIndex < CONCURRENCY_LEVEL && nextIndex < urls.Length)
{
    imageTasks.Add(GetBitmapAsync(urls[nextIndex]));
    nextIndex++;
}

while(imageTasks.Count > 0)
{
    try
    {
        Task<Bitmap> imageTask = await Task.WhenAny(imageTasks);
        imageTasks.Remove(imageTask);

        Bitmap image = await imageTask;
        panel.AddImage(image);
    }
    catch(Exception exc) { Log(exc); }

    if (nextIndex < urls.Length)
    {
        imageTasks.Add(GetBitmapAsync(urls[nextIndex]));
        nextIndex++;
    }
}

1.2 Throttling using SemaphoreSlim

static async Task DoSomething(int n) { /* your async work here */ }

static async Task RunConcurrently(int total, int throttle)
{
  var mutex = new SemaphoreSlim(throttle);
  var tasks = Enumerable.Range(0, total).Select(async item =>
  {
    // Wait for a free slot before starting the work
    await mutex.WaitAsync();
    try { await DoSomething(item); }
    finally { mutex.Release(); }
  });
  // Await instead of blocking with .Wait() to avoid sync-over-async deadlocks
  await Task.WhenAll(tasks);
}

Note: You can move the mutex object out of the method as a static field and perform throttling at the application level rather than at the individual request level, as sketched below.
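For illustration, here is a minimal sketch of that idea, assuming a process-wide limit of 10 and a hypothetical sendRequest delegate:

// A process-wide throttle: every caller shares the same static semaphore
static readonly SemaphoreSlim AppThrottle = new SemaphoreSlim(10); // assumed limit of 10

static async Task CallExternalApiAsync(Func<Task> sendRequest)
{
    await AppThrottle.WaitAsync();      // wait for a free slot
    try { await sendRequest(); }        // perform the outgoing call
    finally { AppThrottle.Release(); }  // free the slot for the next caller
}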

1.3 Throttling using TPL Dataflow

// Requires: using System.Threading.Tasks.Dataflow;
async Task RunAsync(int totalItems, int throttle)
{
    var block = new ActionBlock<int>(
        DoSomething,
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = throttle });

    for (var n = 0; n < totalItems; n++)
    {
        block.Post(n);
    }

    // Signal that no more items are coming, then wait for the block to drain
    block.Complete();
    await block.Completion;
}

There are other ways to achieve similar concurrency control as well; one more is sketched below. Whatever the mechanism, this approach makes sure the number of parallel tasks executing at any moment is controlled, so the host machine's resources are not exhausted.
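For example, on .NET 6 and later you can get a similar effect with Parallel.ForEachAsync; a minimal sketch, assuming a urls collection and a hypothetical FetchAsync helper:

var options = new ParallelOptions { MaxDegreeOfParallelism = throttle };
await Parallel.ForEachAsync(urls, options, async (url, ct) =>
{
    await FetchAsync(url, ct); // hypothetical outgoing call
});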

Cons

  • This strategy doesn't work well for web farms or environments where your application runs on multiple machines. The combined number of parallel requests remains high and can still bring down the target machine or land you in rate-limiting hell.
  • If we throttle or limit concurrency too aggressively, we may end up underutilizing the target resource.

2. Throttling max requests sent without waiting for completion

Most external APIs limit you based on requests/minute or requests/second, e.g., an Azure storage account allows 2000 requests/second.

If you cap your in-flight requests at this limit using the simple throttling strategy above, you are underutilizing the capacity, because in the next second, whether or not the first set of actions has completed, you can still send 2000 more without being rate-limited.

The following code allows you to follow the rate-limiting policy of the external system and keep your requests in sync with that target.

using System;
using System.Threading;
using System.Threading.Tasks;

namespace MyNamespace
{
    /// <summary>
    ///     ThrottleService allows you to control the maximum actions allowed within a given period.
    ///     After the lock-in period, the throttle is released even if the previous actions are still executing, and the next set
    ///     of actions can be performed.
    /// </summary>
    // IThrottleService (not shown here) simply declares the Queue methods implemented below
    public class ThrottleService : IThrottleService
    {
        private readonly TimeSpan _maxPeriod;
        private readonly SemaphoreSlim _throttleActions, _throttlePeriods;

        public int ActionThrottleCount => _throttleActions.CurrentCount;
        public int PeriodThrottleCount => _throttlePeriods.CurrentCount;

        public ThrottleService(int maxActions, TimeSpan maxPeriod)
        {
            _throttleActions = new SemaphoreSlim(maxActions, maxActions);
            _throttlePeriods = new SemaphoreSlim(maxActions, maxActions);
            _maxPeriod = maxPeriod;
        }

        // ThrottleSettings (not shown here) is a simple settings object carrying the two values used below
        protected ThrottleService(ThrottleSettings settings) : this(settings.MaxParallelRequests, settings.ThrottleDuration)
        {
        }

        /// <summary>
        ///     Queues the specified action.
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <param name="action">The action.</param>
        /// <returns></returns>
        public async Task<T> Queue<T>(Func<T> action)
        {
            return await Queue(action, CancellationToken.None);
        }

        /// <summary>
        ///     Queues the specified action.
        /// </summary>
        /// <typeparam name="T"></typeparam>
        /// <param name="action">The action.</param>
        /// <param name="cancel">The cancel.</param>
        /// <returns></returns>
        public async Task<T> Queue<T>(Func<T> action, CancellationToken cancel)
        {
            await _throttleActions.WaitAsync(cancel);

            try
            {
                await _throttlePeriods.WaitAsync(cancel);

                // Release after period [Note: Intentionally not awaited]
                // - Allow bursts up to maxActions requests at once
                // - Do not allow more than maxActions requests per period
                _ = Task.Delay(_maxPeriod).ContinueWith(tt => { _throttlePeriods.Release(1); }, cancel);

                return action();
            }
            finally
            {
                _throttleActions.Release();
            }
        }
    }
}

// sample usage
private static readonly ThrottleService Throttle =
    new ThrottleService(StorageMaxParallelRequests, TimeSpan.FromMilliseconds(StorageThrottleDurationMs));

// Queue returns the inner task: the first await waits for the throttle gate,
// the second await observes the completion of the copy itself
var task = await Throttle.Queue(async () =>
{
    await sourceService.CopyDirectoryItems(source, target, destinationService).ConfigureAwait(false);
}).ConfigureAwait(false);
await task.ConfigureAwait(false);

You can set maxActions to 10 and maxPeriod to 100 milliseconds. This would allow 10 requests to be sent to the target every 100 milliseconds, i.e., at most roughly 100 requests/second.

Note: You can make this ThrottleService instance a singleton and use it as an application-level throttle for specific outgoing calls, e.g., calls to storage or external services like PDF generation, video processing, or Google APIs.
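If you use ASP.NET Core, one way to do that is to register a single instance in the dependency injection container; a minimal sketch, with the limit values being assumptions:

// In Program/Startup: one shared throttle for all storage calls
services.AddSingleton(new ThrottleService(maxActions: 10, maxPeriod: TimeSpan.FromMilliseconds(100)));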

Since the combined requests from multiple instances/machines can still push the target resource above its limit, it is better to implement:

  1. Exponential backoff retries using Polly when you hit those "too many requests" errors, to avoid a request stampede.
  2. Sniffing the Retry-After header from the target's response and honoring it, to improve successful request completion (see the sketch after this list).
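Here is a minimal sketch of both ideas using Polly, assuming an existing httpClient and requestUri; the retry count and backoff base are arbitrary:

using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

var retryPolicy = Policy
    .HandleResult<HttpResponseMessage>(r => (int)r.StatusCode == 429) // too many requests
    .WaitAndRetryAsync(
        retryCount: 5,
        sleepDurationProvider: (attempt, outcome, context) =>
            // Honor Retry-After when the server provides it; otherwise back off exponentially
            outcome.Result?.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(Math.Pow(2, attempt)),
        onRetryAsync: (outcome, delay, attempt, context) => Task.CompletedTask);

var response = await retryPolicy.ExecuteAsync(() => httpClient.GetAsync(requestUri));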

3. Distributed lock-based throttling

In most cases, you will be fine adopting either of the above two strategies.

If a system strictly penalizes the API/resource consumer by blocking them for a considerable amount of time, then retrying and creating more requests is only going to extend your lockout. So in this case, we need to make sure that the overall request count per minute, across all instances, doesn't cross the strict limit set by the resource.

For instance, if an API allows 500 requests per minute per API key, and abusing the system by going above that limit blocks the consumer for 1 hour, the penalty for crossing the threshold is severe.

In this case, we can use a distributed locking system by maintaining the current count in a common store or in-memory system like Redis.

using System;
using StackExchange.Redis;

// Connection multiplexer to Redis (create once and reuse; it is designed to be shared)
ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost:6379");
// Redis database
IDatabase db = redis.GetDatabase();

public void ProcessInBatch()
{
    // Define the maximum allowed requests per window
    int maxRequests = 100;
    // Define the counter key
    string counterKey = "request_counter";

    // Increment the shared counter; Redis INCR is atomic across all instances
    long currentCount = db.StringIncrement(counterKey);
    if (currentCount == 1)
    {
        // First request in this window: start a fresh one-minute window
        db.KeyExpire(counterKey, TimeSpan.FromMinutes(1));
    }

    if (currentCount > maxRequests)
    {
        Console.WriteLine("Request limit exceeded. Please try again later.");
        RetryAfterSomeTime();
    }
    else
    {
        Console.WriteLine("Request accepted. Processing...");
        // Process the request
    }
}

Since we have a distributed counter that keeps track of the current usage across all instances, we can retry with confidence after a wait time when a request is rejected.

Note: Please make sure to use a very fast store that doesn't lock under parallel requests, to avoid slowness from locking and unlocking overhead. E.g., using a relational database as the lock store will eventually make your application queue up parallel requests just to update its counter.

4. Queue-based load levelling pattern

If you are working on a system that can perform these massive API requests on a background worker and update the results back to the clients, then it is best to use the queue-based load leveling pattern.

The parallel requests that need to go to external resources are first added to a queue. A job then reads through this queue, processes items at a comfortable rate, and sends requests to the external resource/API at a steady pace, as sketched below.
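A minimal sketch of that idea using System.Threading.Channels and a steady drain rate (the ~10 requests/second rate and the SendToExternalApiAsync helper are assumptions; PeriodicTimer requires .NET 6+):

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

var queue = Channel.CreateUnbounded<Uri>();

// Producers enqueue outgoing work as it arrives:
// await queue.Writer.WriteAsync(uri);

// A single consumer drains the queue at a steady pace (~10 requests/second here)
var timer = new PeriodicTimer(TimeSpan.FromMilliseconds(100));
while (await timer.WaitForNextTickAsync())
{
    if (queue.Reader.TryRead(out var uri))
    {
        await SendToExternalApiAsync(uri); // hypothetical outgoing call
    }
}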

Here is a very good article from Microsoft about this pattern: Queue-Based Load Leveling pattern - Azure Architecture Center | Microsoft Learn

Throttling your API requests is a good practice to avoid performance issues and ensure the quality of your application. Depending on your scenario, you can choose from different strategies to control the number and rate of your requests, such as simple throttle, throttling max requests without waiting, distributed lock-based throttling, or queue-based load levelling pattern. I hope this article has helped you find the best solution for your project. If you have any questions or feedback, please leave a comment below.

Happy hacking!