Why is distributed rate limiting unbalanced?

This article was last updated on February 7, 2024.

Overview

In today's world of microservices, APIs, and cloud-native systems, service governance is indispensable, and rate limiting is one of its essential tools. Microservice adoption usually goes hand in hand with a distributed architecture, so limiting traffic on a single machine is not enough; traffic must also be limited across the whole cluster.
This raises a question: distributed rate limiting often exhibits "unbalanced limiting" or "limiting error". Why is that?

Rate limiting

During the National Day holiday, the phrase "rate limiting" comes up frequently in the news in the form of "scenic area visitor caps". Here are two attractions in Wuxi as examples:

📌Example:

  • Wuxi Liyuan Garden: maximum daily carrying capacity adjusted to 20,000 people; maximum instantaneous capacity adjusted to 4,000 people;
  • Wuxi Donglin Academy: maximum daily reception capacity reduced to 1,500 people; instantaneous capacity reduced to 300 people.

In computer networking, rate limiting is used to control the rate at which a network interface controller sends or receives requests [1]. By extension, it means limiting the number of concurrent requests reaching a system in order to keep the system stable (especially in microservice, API, and cloud-native systems).

Common rate-limiting algorithms

  1. Fixed window counter
  2. Sliding window counter
  3. Leaky bucket
  4. Token bucket
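To make the simplest of these concrete, here is a minimal single-machine fixed window counter. This is an illustrative sketch written for this article, not code from any particular library: `maxRequests` requests are allowed per window of `windowMillis` milliseconds, and the counter resets when a new window begins.

```java
// Minimal fixed window counter (illustrative sketch, single machine only).
public class FixedWindowLimiter {

    private final int maxRequests;   // requests allowed per window
    private final long windowMillis; // window length in milliseconds
    private long windowStart = System.currentTimeMillis();
    private int count = 0;

    public FixedWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            // A new window begins; reset the counter.
            windowStart = now;
            count = 0;
        }
        return ++count <= maxRequests;
    }
}
```

The well-known weakness of this scheme is the window boundary: up to twice `maxRequests` can pass within an interval that straddles two windows, which is what the sliding window variant addresses.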

Single-machine and distributed rate limiting

In essence, the difference between single-machine and distributed rate limiting lies in where the "capacity" counter is stored.

Single-machine rate limiting is implemented directly on one server. In microservice, API, and cloud-native systems, however, applications and services are deployed as clusters, so multiple instances must cooperate to enforce a cluster-wide limit; this is distributed rate limiting.

🤔 Why is distributed rate limiting unbalanced?

For example, the sliding window algorithm mentioned above can store its counters in a KV store such as Redis.
A sliding window can be implemented with a Redis zset: each request's timestamp is stored as a member, ZREMRANGEBYSCORE deletes entries that fall outside the time window, and ZCARD counts the remaining entries.

The sample code [2] is as follows:

```java
package com.lizba.redis.limit;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;

/**
 * <p>
 * Sliding-window rate limiting via a Redis zset
 * </p>
 *
 * @Author: Liziba
 * @Date: 2021/9/6 18:11
 */
public class SimpleSlidingWindowByZSet {

    private Jedis jedis;

    public SimpleSlidingWindowByZSet(Jedis jedis) {
        this.jedis = jedis;
    }

    /**
     * Judge whether an action is allowed
     *
     * @param userId    user id
     * @param actionKey behavior key
     * @param period    rate-limiting period, in seconds
     * @param maxCount  maximum number of requests (sliding window size)
     */
    public boolean isActionAllowed(String userId, String actionKey, int period, int maxCount) {
        String key = this.key(userId, actionKey);
        long ts = System.currentTimeMillis();
        Pipeline pipe = jedis.pipelined();
        pipe.multi();
        pipe.zadd(key, ts, String.valueOf(ts));
        // Remove entries that fall outside the sliding window
        pipe.zremrangeByScore(key, 0, ts - period * 1000L);
        Response<Long> count = pipe.zcard(key);
        // Expire the key so that cold keys are deleted and memory is reclaimed
        pipe.expire(key, period);
        pipe.exec();
        pipe.close();
        return count.get() <= maxCount;
    }

    /**
     * Rate-limiting key
     */
    public String key(String userId, String actionKey) {
        return String.format("limit:%s:%s", userId, actionKey);
    }

}
```

Similarly, for a token bucket, the token count can be kept in Redis.
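As a sketch of that idea, here is a minimal in-process token bucket. This is illustrative code written for this article; in the distributed variant, the token count and refill timestamp would live in Redis and be updated atomically (for example with a Lua script) instead of in instance memory.

```java
// Minimal in-process token bucket (illustrative sketch).
// Tokens refill continuously at `refillPerSecond`; each request consumes one.
public class TokenBucket {

    private final double capacity;       // maximum tokens the bucket can hold
    private final double refillPerMilli; // tokens added per millisecond
    private double tokens;
    private long lastRefill;

    public TokenBucket(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity;          // start full
        this.lastRefill = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Top up the bucket based on elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMilli);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```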

🧠 Answer 1: Error introduced by batching

However, with the approach above, every request must make a round trip to Redis to decide whether it can pass, which costs performance. For high-concurrency systems, a common optimization is batching: instead of taking one token at a time, each instance takes a batch, and fetches another batch when it runs out. This reduces the number of requests to Redis.
But batch acquisition introduces a bounded rate-limiting error. For example, if instance A takes 100 tokens at this moment and only uses them in the next second, the cluster-wide load in that next second may exceed the threshold.
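The effect can be sketched as follows (hypothetical code, not from the article): each instance takes a whole batch from the shared counter and then serves requests from its local cache, so tokens leave the shared store before they are actually consumed.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of batched token acquisition (illustrative). `shared` stands in for a
// Redis counter; each instance grabs `batchSize` tokens in one "remote" call.
public class BatchedLimiter {

    private final AtomicLong shared; // tokens remaining in the shared store
    private final long batchSize;
    private long local = 0;          // tokens taken but not yet used

    public BatchedLimiter(AtomicLong shared, long batchSize) {
        this.shared = shared;
        this.batchSize = batchSize;
    }

    public synchronized boolean tryAcquire() {
        if (local == 0) {
            long before = shared.getAndAdd(-batchSize);
            if (before < batchSize) {
                // Not enough left for a full batch; put it back (simplified:
                // a real implementation might take a partial batch).
                shared.addAndGet(batchSize);
                return false;
            }
            local = batchSize;
        }
        local--;
        return true;
    }
}
```

Note that after a single request, the shared store already shows the whole batch as consumed even though 99 tokens are still sitting unused in the instance: this gap between "taken" and "used" is exactly the error the article describes.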

This is one reason.

🧠 Answer 2: Uneven load balancing

Another approach to distributed rate limiting is equal sharing: if the single-machine limit used to be 100 and the cluster now runs 5 instances, keep a limit of 100 on each instance. This amounts to a total limit of 500 at the entry point, with each instance enforcing its own share locally.
In this case, suppose 500 requests arrive at the entry point. They pass through a load-balancing algorithm (such as round-robin, least connections, or least connection time) and session-persistence policies (such as source-address affinity, cookie affinity, or hashing on specific parameters), so the number of requests assigned to each instance may be unbalanced: say 70 to instance A and 130 to instance B. All 70 of instance A's requests pass, while only 100 of instance B's 130 do. This is where "unbalanced limiting" or "limiting deviation" appears.
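The arithmetic of this example can be checked in a few lines (illustrative code): each instance serves at most its local limit, so an uneven split rejects requests even when the cluster total never exceeds the global limit.

```java
// Illustrates the article's example: five instances each enforce a local
// limit of 100 (500 cluster-wide), but the load balancer delivers an uneven
// split, so some requests are rejected despite the total being within budget.
public class UnevenSplit {

    // Total requests served given per-instance arrivals and a local limit.
    public static int served(int[] arrivals, int perInstanceLimit) {
        int total = 0;
        for (int a : arrivals) {
            total += Math.min(a, perInstanceLimit);
        }
        return total;
    }
}
```

With arrivals of {70, 130, 100, 100, 100} and a local limit of 100, only 470 of the 500 requests pass; the 30 rejected at instance B cannot use the 30 tokens left idle at instance A.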

This is the second reason.

Summary

Limited by my own experience, this article lists only the two answers I can think of so far, for your reference; corrections and additions are welcome.
Real business scenarios are far more complex, and rate limiting must weigh many conditions and resources. All we can do is approximate the ideal through estimation, stress testing, trial runs, adjustment, deployment to production, validation, and further adjustment.

Author: east4ming
Posted on: October 1, 2021
Original: https://e-whisper.com/posts/29056/