Rate Limiting — Understanding Kong headers

Sep 30, 2022

As API developers we need to ensure our services are reliable. A popular strategy is rate limiting, which constrains the number of requests that can be made to your service within a given time window. Social media applications, such as Instagram, restrict the number of activities a user can perform in a day. Activities could include sending messages, liking photos or commenting on a post. For example, if a user comments on an excessive number of posts, the rate limit kicks in and blocks the user from commenting on any more posts for a certain period of time. Without rate limiting the consequences can be catastrophic, ranging from reputational damage to financial loss. When creating a new application, it is important to implement rate limiting to protect your API, services and infrastructure. Rate limiting is not something you want to think about in the core application logic, where the focus should be on business logic. Kong Gateway abstracts this away, so you do not have to build the features that protect your services yourself and have more time to focus on building the application.

You can protect your APIs, services and infrastructure with the Kong API Gateway Rate Limiting plugin. This blog post explains the headers that Kong API Gateway sends to the client.

Understanding Kong API Gateway Headers

The Kong API Gateway Rate Limiting Advanced plugin can be used to limit the number of HTTP requests that can be made within a certain time window, whether that is seconds, minutes, hours, days or even years. HTTP headers allow the client and the server to pass additional information as part of an HTTP request or response. When the Kong plugin is enabled, Kong injects headers into the response. These headers show the allowed rate limit, the number of available requests, and the time remaining (in seconds) until the rate limit quota is reset. The client can use these headers to determine what to display to the customer if the rate limit has been exceeded. Let’s take a look at the example below:

Example Rate Limiting Configuration

{
  "name": "voucherapi",
  . . .
  "routes": [
    {
      "protocols": [ "http", "https" ],
      "methods": [ "POST" ],
      "paths": [ "/voucher-redemption" ]
    }
  ],
  . . .
  {
    "name": "rate-limiting-advanced",
    "config": {
      "strategy": "cluster",
      "limit": [ 3 ],
      "window_size": [ 300 ],
      "sync_rate": -1,
      "window_type": "fixed",
      "identifier": "ip"
    }
  }
  ...
}

In the abstract example above, which uses the vouchers API to redeem a voucher code, the rate limit is set to 3 requests per fixed 5-minute window (300 seconds), per IP address. What do the headers sent to the client look like after the first request is made at the start of the window?

RateLimit-Limit: 3 → The number of requests allowed in the window

RateLimit-Remaining: 2 → The number of requests still available in the current window

RateLimit-Reset: 299 → The number of seconds until the quota is reset
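A client can read these three headers into a small structure before deciding what to show the customer. The following is a minimal sketch, assuming the headers arrive as a plain mapping of strings. The names RateLimitInfo and parse_rate_limit_headers are hypothetical and not part of Kong or any HTTP library.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    """Hypothetical container for the three headers described above."""
    limit: int          # RateLimit-Limit: requests allowed in the window
    remaining: int      # RateLimit-Remaining: requests left in the window
    reset_seconds: int  # RateLimit-Reset: seconds until the quota resets


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Return the rate limit state, or None if a header is missing or malformed."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return RateLimitInfo(
            limit=int(lowered["ratelimit-limit"]),
            remaining=int(lowered["ratelimit-remaining"]),
            reset_seconds=int(lowered["ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None


# The header values from the first request in the example window.
info = parse_rate_limit_headers(
    {"RateLimit-Limit": "3", "RateLimit-Remaining": "2", "RateLimit-Reset": "299"}
)
print(info)
```

Returning None instead of raising lets the caller treat a response without these headers as "no rate limit information available".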

As you make further requests in the current window, the number of remaining requests decreases. After the rate limit window has elapsed, the remaining count is reset back to 3. In addition to this, the plugin sends headers that show the limit for a time frame and the number of requests still available in it. For a limit of 3 requests per minute, after two requests have been made in the current minute, these would be:

X-RateLimit-Limit-Minute: 3 → The limit in the time frame

X-RateLimit-Remaining-Minute: 1 → The number of requests remaining in the time frame

By using these headers, we can configure the HTTP client to retry once the time window has elapsed. Without the reset time, we would not know when to retry the rejected request. Handling retries in the background can create a seamless user experience.
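The retry behaviour described here can be sketched as a small loop that waits for the number of seconds given in RateLimit-Reset before trying again. This is an illustration under stated assumptions, not a definitive client: the send callable, the (status, headers) response shape and the function names are all hypothetical, and a real client would plug in its own HTTP library.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status_code, headers).
Response = Tuple[int, Mapping[str, str]]


def seconds_until_retry(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read RateLimit-Reset, falling back to a default if it is absent or invalid."""
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0.0, float(lowered["ratelimit-reset"]))
    except (KeyError, ValueError):
        return default


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on a 429, wait until the quota resets and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        # Return anything that is not a 429, or the last 429 once attempts run out.
        if status != 429 or attempt == max_attempts:
            return status, headers
        sleep(seconds_until_retry(headers))
    raise AssertionError("unreachable")


# Simulated transport: the first call is rejected, the second succeeds.
responses = iter([
    (429, {"RateLimit-Limit": "3", "RateLimit-Remaining": "0", "RateLimit-Reset": "120"}),
    (200, {"RateLimit-Limit": "3", "RateLimit-Remaining": "2", "RateLimit-Reset": "299"}),
])
waits = []  # record the waits instead of actually sleeping
status, _ = send_with_retry(lambda: next(responses), sleep=waits.append)
print(status, waits)  # 200 [120.0]
```

Passing sleep as a parameter keeps the sketch testable without real delays. In production code it is usually worth capping the wait and the number of attempts so a long window does not block the user indefinitely.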

What happens when the rate limit is reached?

When the rate limit is exceeded, the Kong plugin returns a ‘Too Many Requests’ (429) status code, for example:

HTTP/1.1 429 Too Many Requests

{ "message": "API rate limit exceeded" }

Why use API Rate Limiting?

Generally, rate limiting is used as a defensive measure for your services. Within a microservice architecture, it is common for applications to depend on APIs from another team, or even a third party. Although companies have great control over APIs in a microservice architecture, hosting APIs and databases can put strain on resources. Shared services need to protect themselves from excessive use, whether intended or unintended, in order to maintain availability. Rate limiting can also help organisations to secure, monetise and govern services.

Rate limiting ensures fair use by preventing some users from draining shared resources, such as APIs and databases. Using a food delivery application as an example, the number of customers spikes quickly during peak periods, so it is essential that all services stay available and are not overloaded, and that orders can still be processed.

Rate limiting can mitigate malicious overuse. Brute force attacks can result in data being stolen, which can lead to loss of trust and reputational damage. Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks can have serious consequences. If there is a server outage, it can be stressful to try to bring these resources back online. Genuine customers may not be able to use your application as it may be slow or even unresponsive. If customers cannot carry out time-critical tasks, this could drive them to competitors. Overall, all these consequences will result in financial loss.

Advanced Rate Limiting Strategies

There are various strategies that can be applied using the Kong Rate Limiting Advanced plugin.

The Fixed Window strategy keeps a counter for each time window (typically measured in seconds) and increments it for every incoming request. This is the most memory-efficient strategy. The main drawback is that it allows bursts at the window boundaries. For example, if a rate limit is set to 100 requests per 60 seconds, a burst of 100 requests could be made in the last few seconds of one window, and then another 100 at the start of the next window.
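To make the boundary burst concrete, here is a minimal in-memory sketch of a fixed window counter. It illustrates the general technique only and is not how Kong implements it. The class name FixedWindowLimiter and the sample IP address are hypothetical.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative fixed window counter, keyed by identifier and window index."""

    def __init__(self, limit: int, window_size: int) -> None:
        self.limit = limit
        self.window_size = window_size  # window length in seconds
        # (identifier, window_index) -> requests counted in that window.
        # Old windows are never evicted here; a real limiter would expire them.
        self._counts = defaultdict(int)

    def allow(self, identifier: str, now: float) -> bool:
        """Count the request and return True if it fits in the current window."""
        window_index = int(now // self.window_size)
        key = (identifier, window_index)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_size=60)
# 100 requests in the last second of one window...
end_of_window = sum(limiter.allow("203.0.113.7", now=59.0) for _ in range(100))
# ...and 100 more in the first second of the next window.
start_of_next = sum(limiter.allow("203.0.113.7", now=60.0) for _ in range(100))
print(end_of_window + start_of_next)  # 200
```

All 200 requests are accepted within roughly one second, even though the configured limit is 100 per 60 seconds, because each window starts with a fresh counter.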

The Sliding Window strategy takes into account a weighted value of the previous window when calculating the current rate limit. As mentioned above, with the Fixed Window, bursts can happen at the window's edge. The Sliding Window resolves this by taking the previous window's counter into account. Kong recommends the sliding window approach, as it gives flexibility to scale rate limiting with good performance. Therefore, it is the default value of window_type.
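One common way to weight the previous window is to scale its count by the fraction of it that still overlaps the sliding window. The sketch below uses that approximation as an assumption. Kong's exact calculation may differ, and the function name sliding_window_estimate is hypothetical.

```python
def sliding_window_estimate(
    previous_count: int,
    current_count: int,
    elapsed_in_window: float,
    window_size: float,
) -> float:
    """Estimate the request rate over the last window_size seconds.

    The previous window's count is weighted by how much of that window
    still falls inside the sliding window ending now.
    """
    if not 0 <= elapsed_in_window < window_size:
        raise ValueError("elapsed_in_window must be in [0, window_size)")
    previous_weight = (window_size - elapsed_in_window) / window_size
    return previous_count * previous_weight + current_count


# Limit of 100 requests per 60 seconds. The previous window was fully used,
# and we are 1 second into the current window with no requests yet.
estimate = sliding_window_estimate(
    previous_count=100, current_count=0, elapsed_in_window=1.0, window_size=60.0
)
print(round(estimate, 2))  # 98.33
```

With an estimate of about 98.33 against a limit of 100, only one more request fits, so a second burst of 100 right after the window boundary would be rejected. Halfway through the window the previous count is weighted at 0.5, and the available capacity grows gradually rather than resetting all at once.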

Summary

Making use of rate limiting is essential to deliver a reliable and seamless experience to customers. When a client exceeds the rate limit, Kong returns a 429 status code and does not send the request to your service, protecting your API, services and infrastructure. A 429 status code tells the client to back off, allowing your server time to recover. Join other companies such as Just Eat Takeaway.com, Papa John’s, Samsung and Mastercard in taking advantage of Kong API Gateway. You can read about interesting case studies here.

Written by Danielle