Core API Concepts
SmagLink differentiates two types of usage restrictions:
A monthly fair-use cap: the client application may exceed this limit, and the APIs will keep answering further requests;
A rate limit (defined in call count and/or bandwidth) over a much shorter time span (one minute or less), which triggers a throttling mechanism when exceeded. This rate may be adjusted depending on the time of day, use cases, health, and so on, to maintain a high quality of service for client applications, even under the stress of a spike in consumer calls or a malicious attack.
This documentation describes SMAG’s technical recommendations for implementing systematic and resilient interactions with our API gateway, specifically regarding the second restriction: throttling.
What happens when my application reaches the maximum rate per minute and the throttling mechanism fires?
The gateway returns an error status code in the response. Its standardized meaning, as defined for HTTP in RFC 6585, is the following:
429 Too Many Requests
This code indicates that the user has sent too many requests in a given amount of time. The body contains details in natural language for the developer; more importantly, the response carries the “Retry-After” property, expressed in milliseconds.
The response contains no information about the user, their identity, or the rules that triggered the restriction; RFC 6585 deliberately leaves user identification and request counting undefined.
If you use a caching mechanism with your requests, make sure 429 responses are filtered out: RFC 6585 states that they must not be stored by a cache.
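As an illustration of the points above, here is a minimal sketch of how a client could recognize a throttled response, read the requested delay, and keep such responses out of a cache. The class and method names (ThrottlingResponse, GetRetryDelay, IsCacheable) are hypothetical. The sketch assumes that the “Retry-After” value is exposed as a response header containing a whole number of milliseconds, as described above; adapt it if your responses differ.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;

public static class ThrottlingResponse
{
    // Returns the delay requested by the gateway, or null when the response
    // is not a 429 or carries no usable Retry-After value.
    // Assumption: the value is a whole number of milliseconds exposed
    // as a "Retry-After" response header.
    public static TimeSpan? GetRetryDelay(HttpResponseMessage response)
    {
        if (response.StatusCode != HttpStatusCode.TooManyRequests)
            return null;

        if (response.Headers.TryGetValues("Retry-After", out var values)
            && long.TryParse(values.FirstOrDefault(), out var milliseconds)
            && milliseconds >= 0)
        {
            return TimeSpan.FromMilliseconds(milliseconds);
        }

        return null;
    }

    // A deliberately simple caching rule: only successful responses are
    // cacheable, which excludes 429 and every other error response.
    public static bool IsCacheable(HttpResponseMessage response) =>
        response.IsSuccessStatusCode;
}
```

The raw header is read on purpose: the typed Headers.RetryAfter.Delta property of .NET interprets a numeric value as seconds, which would not match a value expressed in milliseconds.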
The throttling is causing exceptions in my application. How can I handle it?
The purpose of this throttling is not to prevent you from calling the APIs, but to let you dynamically delay those calls when “bursting”. There is no need to spend time writing your own solution for it, as many resilience and transient-fault-handling libraries exist for most, if not all, web-related languages and technologies.
Many of those support the 429 code out of the box or with very little configuration. At the end of this documentation, we share a very simple implementation with Polly, in .NET Core 3.
Java: Retry4j, Resilience4j
Keep in mind that it is always a good idea to use such a library when your application relies heavily on a cloud-based solution, and not just for this specific function.
We recommend adding “jitter” to your retry-after timer. This prevents your application from sending a high burst of simultaneous requests when the timer expires. Such bursts not only degrade response speed, they also make you reach the threshold faster. Jitter, preferably correlated with the other requests waiting to be retried, is a powerful and easy-to-implement way to smooth out your call rate.
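To make the idea concrete, here is a minimal sketch of a jittered delay. It is one possible approach, not a prescribed algorithm: the random extra delay is drawn from a window that grows with the number of requests currently waiting to be retried, so a larger backlog is spread over a longer period. The names RetryJitter and WithJitter, and the 50 ms and 2 s tuning values, are assumptions chosen for illustration.

```csharp
using System;

public static class RetryJitter
{
    private static readonly Random Rng = new Random();
    private static readonly object Gate = new object();

    // Adds a random extra delay to the gateway's retry-after value.
    // The jitter window is (pending retries x 50 ms), capped at 2 seconds.
    // Both values are hypothetical and should be tuned for your workload.
    public static TimeSpan WithJitter(TimeSpan retryAfter, int pendingRetries)
    {
        var perRequestSpread = TimeSpan.FromMilliseconds(50);
        var maxSpread = TimeSpan.FromSeconds(2);

        var spreadMs = Math.Min(
            Math.Max(pendingRetries, 1) * perRequestSpread.TotalMilliseconds,
            maxSpread.TotalMilliseconds);

        double jitterMs;
        lock (Gate) // System.Random is not thread-safe
        {
            jitterMs = Rng.NextDouble() * spreadMs;
        }

        return retryAfter + TimeSpan.FromMilliseconds(jitterMs);
    }
}
```

With ten requests waiting and a retry-after of one second, each request would wait between 1 and 1.5 seconds instead of all firing at exactly the same moment.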
I reach the call rate limit too quickly/too often and it is impacting my users’ experience. What can I do?
If the above answer did not help, we are always available to discuss your needs. Please contact us at firstname.lastname@example.org.
.NET Core 3 Web API & Polly – Implementing 429 error resilience
Polly is a .NET library that provides resilience and transient-fault handling capabilities. You can implement those capabilities by applying Polly policies such as Retry, Circuit Breaker, Bulkhead Isolation, Timeout, and Fallback. Polly targets .NET Framework 4.x and .NET Standard 1.0, 1.1, and 2.0 (which supports .NET Core).
A .NET Core 3 Web API project
The Polly policy (wait and retry) is defined as follows:
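internal static AsyncRetryPolicy<HttpResponseMessage> _handleTooManyRequestAsyncPolicy = Policy
    .HandleResult<HttpResponseMessage>(e => e.StatusCode == HttpStatusCode.TooManyRequests) // code 429
    .WaitAndRetryAsync(
        retryCount: 3, // number of times to retry
        sleepDurationProvider: (i, e, ctx) => // time span to wait before the next retry
        {
            // extract the retry-after property from the response header
            // (assumption: a whole number of milliseconds; falls back to 1 second if absent)
            var raw = e.Result.Headers.TryGetValues("Retry-After", out var v) ? v.FirstOrDefault() : null;
            return long.TryParse(raw, out var ms) ? TimeSpan.FromMilliseconds(ms) : TimeSpan.FromSeconds(1);
        },
        onRetryAsync: (e, ts, i, ctx) => Task.CompletedTask);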
A named HTTP client is configured in the services. We use the Polly extensions to add the policy.
// Configure a named client for FARMS-AND-FIELDS
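As a sketch of what this registration can look like, the fragment below attaches a policy to a named client using AddHttpClient and the AddPolicyHandler extension from the Microsoft.Extensions.Http.Polly package. The client name "farms-and-fields", the placeholder base address, and the AddFarmsAndFieldsClient helper are hypothetical; use the values that apply to your own project.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;

public static class FarmsAndFieldsClientRegistration
{
    // Hypothetical client name, used later to resolve the client
    // through IHttpClientFactory.CreateClient(ClientName).
    public const string ClientName = "farms-and-fields";

    public static IServiceCollection AddFarmsAndFieldsClient(
        this IServiceCollection services,
        IAsyncPolicy<HttpResponseMessage> tooManyRequestsPolicy)
    {
        services
            .AddHttpClient(ClientName, client =>
            {
                // Placeholder address: replace with the real API base URL.
                client.BaseAddress = new Uri("https://api.example.invalid/");
            })
            // Every request sent through this named client goes through the policy.
            .AddPolicyHandler(tooManyRequestsPolicy);

        return services;
    }
}
```

In a Web API project, this helper would typically be called from ConfigureServices, passing the wait-and-retry policy defined earlier.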
Now all we need to do is inject the HTTP client where we need it and start making requests.
In this simple example, we fetch the farms once and output the result as a string. At this point, you can increase the number of parallel calls until you reach the throttle limit and observe how your calls are spaced out dynamically.