Load Balancers: Distributing Traffic Across Servers
How load balancers distribute incoming requests across multiple servers using strategies like round robin, least connections, and consistent hashing, and why the choice between L4 and L7 matters.
Terminology
- Load balancer: a device or software component that distributes incoming network requests across a pool of backend servers to prevent any single server from becoming overwhelmed
- Backend server (upstream): one of the servers in the pool that actually processes client requests; also called an origin or upstream server
- Health check: a periodic probe (HTTP request, TCP connection, or ping) sent by the load balancer to each backend to verify it is alive and able to serve traffic
- Round robin: a scheduling algorithm that cycles through servers in order, sending each new request to the next server in the list
- Weighted round robin: a variant of round robin where servers with higher capacity receive a proportionally larger share of requests
- Least connections: a scheduling algorithm that routes each new request to the server currently handling the fewest active connections
- Consistent hashing: a hashing technique that maps both servers and requests onto a ring, minimizing remapping when servers are added or removed
- Hash ring: a circular number space (typically $0$ to $2^{32}-1$) used in consistent hashing where both server identifiers and request keys are placed
- Virtual node: a replica of a physical server placed at a different position on the hash ring to improve load distribution uniformity
- L4 load balancer: a load balancer operating at the transport layer (TCP/UDP); it routes based on IP addresses and port numbers without inspecting application data
- L7 load balancer: a load balancer operating at the application layer (HTTP/HTTPS); it can inspect headers, URLs, cookies, and message bodies to make routing decisions
- Sticky session (session affinity): a configuration where the load balancer routes all requests from the same client to the same backend server, typically using cookies or IP hashing
- TLS termination: the process of decrypting HTTPS traffic at the load balancer so that backend servers receive plain HTTP, offloading cryptographic work
- Failover: the automatic redirection of traffic away from a failed server to healthy servers in the pool
- Throughput: the number of requests a system can handle per unit of time, often measured in requests per second (RPS)
What & Why
Every popular web service eventually outgrows a single server. When one machine cannot handle all incoming requests, you add more servers. But clients only know one address. Something needs to sit between clients and servers, accept every incoming request, and decide which backend should handle it. That something is a load balancer.
A load balancer solves three problems at once. First, it increases capacity by spreading requests across multiple servers so no single machine is a bottleneck. Second, it improves availability: if one server crashes, the load balancer detects the failure through health checks and stops sending traffic to it, so new requests never reach the dead machine (requests already in flight to it may still fail). Third, it enables maintenance without downtime. You can take a server offline, update it, and bring it back while the others continue serving traffic.
The core question in load balancing is the routing strategy: how does the balancer choose which server gets the next request? Simple strategies like round robin work well when all servers are identical and requests are roughly equal in cost. More sophisticated strategies like least connections or consistent hashing handle uneven workloads and stateful services. The choice depends on your traffic patterns, whether requests carry session state, and how your backend is architected.
Load balancers also differ in the network layer at which they operate. An L4 balancer works at the TCP/UDP level and is extremely fast but blind to application content. An L7 balancer understands HTTP and can route based on URL paths, headers, or cookies, but it does more work per request. Many large deployments use both: an L4 balancer at the edge for raw throughput, and L7 balancers behind it for intelligent routing.
How It Works
Round Robin
The simplest strategy. The load balancer maintains a list of healthy servers and a counter. Each new request goes to the next server in the list, wrapping around to the beginning after the last server.
Weighted round robin extends this by assigning each server a weight proportional to its capacity. A server with weight 3 receives three times as many requests as a server with weight 1.
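Both variants can be expressed in a few lines of Python. This is an illustrative sketch, not tied to any particular balancer; the server names and weights are made up:

```python
from itertools import cycle

# Plain round robin: cycle through the server list in order.
servers = ["s1", "s2", "s3"]
rr = cycle(servers)
picks = [next(rr) for _ in range(6)]
# picks cycles s1, s2, s3, s1, s2, s3

# Weighted round robin via flat expansion: a server with weight w
# appears w times in the rotation.
weights = {"s1": 3, "s2": 1}
expanded = [s for s, w in weights.items() for _ in range(w)]
wrr = cycle(expanded)
window = [next(wrr) for _ in range(8)]
# over each full cycle, s1 receives 3x as many requests as s2
```

Note that flat expansion sends a server its whole share in a burst; the smooth weighted variant shown in the Implementation section's pseudocode interleaves picks more evenly.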
Least Connections
The load balancer tracks how many active connections each server currently holds. Each new request goes to the server with the fewest active connections. This naturally adapts to servers with different processing speeds: a fast server finishes requests quickly, its connection count drops, and it receives more work.
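A minimal Python sketch of the selection step follows; the connection counts are invented for illustration, and a real balancer would also decrement a server's count when a request completes:

```python
# Least-connections selection: route each new request to the server
# with the fewest in-flight requests.
active = {"s1": 4, "s2": 1, "s3": 2}

def pick_least_connections(active):
    server = min(active, key=active.get)  # fewest active connections
    active[server] += 1                   # the new request is now in flight
    return server

first = pick_least_connections(active)   # "s2": it had only 1 connection
```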
Consistent Hashing
Both servers and request keys are hashed onto a circular space (the hash ring) of size $2^{32}$. To route a request, hash its key and walk clockwise around the ring until you hit a server. When a server is added or removed, only the requests that map to the arc between the new/removed server and its predecessor are affected. With $n$ servers, adding or removing one remaps only $\approx 1/n$ of the keys.
Virtual nodes improve uniformity. Each physical server is placed at $v$ positions on the ring. With $v = 150$ virtual nodes per server, the load variance drops to roughly 5-10% even with a small number of physical servers.
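Putting the ring, virtual nodes, and clockwise lookup together, here is a sketch in Python. The `HashRing` class and the MD5-based 32-bit hash are illustrative choices, not a prescribed design:

```python
import bisect
import hashlib

def _hash32(key: str) -> int:
    # Stable 32-bit hash; Python's built-in hash() is salted per process,
    # so it cannot be used for a ring shared across restarts or machines.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, servers, vnodes=150):
        self._ring = []  # sorted list of (position, server)
        for s in servers:
            self.add(s, vnodes)

    def add(self, server, vnodes=150):
        # Place vnodes replicas of the server on the ring.
        for i in range(vnodes):
            pos = _hash32(f"{server}:{i}")
            bisect.insort(self._ring, (pos, server))

    def remove(self, server):
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def lookup(self, key):
        # Walk clockwise: first ring position >= the key's hash.
        pos = _hash32(key)
        i = bisect.bisect_left(self._ring, (pos, ""))
        if i == len(self._ring):  # past the largest position: wrap to first
            i = 0
        return self._ring[i][1]
```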
L4 vs L7 Load Balancing
An L4 load balancer sees only the TCP or UDP packet headers: source/destination IP and port. It makes a routing decision based on these four values, then forwards the raw packets (or rewrites addresses via NAT). Because it never parses the application payload, it is extremely fast and can handle millions of connections per second. The trade-off is that it cannot distinguish between an API call and a static file request.
An L7 load balancer terminates the TCP connection, parses the HTTP request, and can inspect the URL path, Host header, cookies, query parameters, and even the request body. This enables content-based routing: send /api/* requests to the API server pool, /static/* to a CDN origin, and WebSocket upgrades to a dedicated pool. L7 balancers can also perform TLS termination, request rewriting, rate limiting, and authentication. The cost is higher latency per request and lower maximum throughput compared to L4.
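To make these capabilities concrete, here is an illustrative NGINX configuration sketch combining content-based routing, least-connections balancing, weights, and TLS termination. Pool names, addresses, and certificate paths are hypothetical:

```nginx
# Illustrative L7 config; upstream names and hosts are hypothetical.
upstream api_pool {
    least_conn;                        # least-connections balancing
    server 10.0.0.11:8080 weight=3;    # larger instance gets 3x share
    server 10.0.0.12:8080;
}

upstream static_pool {
    server 10.0.1.21:8080;
    server 10.0.1.22:8080;
}

server {
    listen 443 ssl;                    # TLS terminates at the balancer
    ssl_certificate     /etc/ssl/example.crt;
    ssl_certificate_key /etc/ssl/example.key;

    location /api/ {
        proxy_pass http://api_pool;    # backends receive plain HTTP
    }
    location /static/ {
        proxy_pass http://static_pool;
    }
}
```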
Complexity Analysis
Let $n$ be the number of backend servers and $v$ be the number of virtual nodes per server in consistent hashing.
| Algorithm | Selection Time | Add/Remove Server | Space |
|---|---|---|---|
| Round Robin | $O(1)$ | $O(1)$ | $O(n)$ |
| Weighted Round Robin | $O(1)$ | $O(n)$ rebuild | $O(n)$ |
| Least Connections | $O(n)$ or $O(\log n)$ with heap | $O(\log n)$ | $O(n)$ |
| Consistent Hashing | $O(\log(n \cdot v))$ | $O(v \cdot \log(n \cdot v))$ | $O(n \cdot v)$ |
| IP Hash | $O(1)$ | $O(1)$ but remaps all keys | $O(n)$ |
When a server is added or removed with simple modular hashing ($\text{server} = \text{hash}(\text{key}) \bmod n$), the fraction of keys that must be remapped is:

$$\text{fraction remapped} \approx \frac{n-1}{n}$$

With consistent hashing, only the keys in the affected arc move:

$$\text{fraction remapped} \approx \frac{1}{n}$$
This is the key advantage of consistent hashing for stateful services like caches. Adding a server to a 10-node cluster remaps ~10% of keys instead of ~90%.
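The difference can be checked empirically. This sketch (hash choice, virtual-node count, and key count are arbitrary) measures the fraction of keys that change owner under each scheme when an 11th server joins a 10-server pool:

```python
import bisect
import hashlib

def h(key: str) -> int:
    # Stable 32-bit hash for both keys and ring positions.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def build_ring(servers, vnodes=100):
    # Each server appears at vnodes positions on the ring.
    return sorted((h(f"{s}:{i}"), s) for s in servers for i in range(vnodes))

def lookup(ring, key):
    i = bisect.bisect_left(ring, (h(key), ""))
    return ring[i % len(ring)][1]  # i == len(ring) wraps to the first node

keys = [f"key:{i}" for i in range(10_000)]
servers = [f"srv{i}" for i in range(10)]

# Modular hashing, 10 -> 11 servers: almost every key changes owner.
mod_moved = sum(h(k) % 10 != h(k) % 11 for k in keys)

# Consistent hashing: only keys in the new server's arcs move.
ring10 = build_ring(servers)
ring11 = build_ring(servers + ["srv10"])
ch_moved = sum(lookup(ring10, k) != lookup(ring11, k) for k in keys)

# mod_moved / len(keys) comes out near 0.9; ch_moved / len(keys) near 0.09
```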
Implementation
ALGORITHM RoundRobin(servers)
INPUT: servers: list of healthy backend servers
STATE: counter: integer, initialized to 0
OUTPUT: selected server
BEGIN
selected <- servers[counter MOD LENGTH(servers)]
counter <- counter + 1
RETURN selected
END
ALGORITHM WeightedRoundRobin(servers, weights)
INPUT: servers: list of servers, weights: list of integer weights
STATE: currentWeights: list of integers, initialized to copy of weights
OUTPUT: selected server
BEGIN
totalWeight <- SUM(weights)
bestIndex <- -1
bestWeight <- -INFINITY
FOR i <- 0 TO LENGTH(servers) - 1 DO
currentWeights[i] <- currentWeights[i] + weights[i]
IF currentWeights[i] > bestWeight THEN
bestWeight <- currentWeights[i]
bestIndex <- i
END IF
END FOR
currentWeights[bestIndex] <- currentWeights[bestIndex] - totalWeight
RETURN servers[bestIndex]
END
ALGORITHM LeastConnections(servers, activeConnections)
INPUT: servers: list of servers, activeConnections: map of server to connection count
OUTPUT: selected server
BEGIN
minConnections <- INFINITY
selected <- NIL
FOR EACH server IN servers DO
IF activeConnections[server] < minConnections THEN
minConnections <- activeConnections[server]
selected <- server
END IF
END FOR
activeConnections[selected] <- activeConnections[selected] + 1
RETURN selected
END
ALGORITHM ConsistentHashLookup(ring, key)
INPUT: ring: sorted list of (hashValue, serverID) pairs, key: request key
OUTPUT: selected server
BEGIN
keyHash <- HASH(key) MOD 2^32
// Binary search for the first ring position >= keyHash
low <- 0
high <- LENGTH(ring) - 1
result <- 0
WHILE low <= high DO
    mid <- FLOOR((low + high) / 2)
IF ring[mid].hashValue >= keyHash THEN
result <- mid
high <- mid - 1
ELSE
low <- mid + 1
END IF
END WHILE
// If keyHash is larger than all ring positions, wrap to first
IF low > LENGTH(ring) - 1 THEN
result <- 0
END IF
RETURN ring[result].serverID
END
ALGORITHM AddServerToRing(ring, serverID, virtualNodes)
INPUT: ring: sorted list of (hashValue, serverID), serverID: new server, virtualNodes: count
BEGIN
FOR i <- 0 TO virtualNodes - 1 DO
virtualKey <- serverID + ":" + TO_STRING(i)
hashValue <- HASH(virtualKey) MOD 2^32
INSERT (hashValue, serverID) INTO ring maintaining sorted order
END FOR
END
ALGORITHM HealthCheck(servers, healthySet, interval)
INPUT: servers: all known servers, healthySet: set of currently healthy servers, interval: check period
BEGIN
LOOP EVERY interval seconds DO
FOR EACH server IN servers DO
response <- SEND_PROBE(server)
IF response is successful THEN
ADD server TO healthySet
ELSE
REMOVE server FROM healthySet
END IF
END FOR
END LOOP
END
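Production balancers usually require several consecutive failures before evicting a server (HAProxy exposes this as its fall and rise counts), so a single dropped probe does not cause flapping. A sketch of that refinement in Python; the thresholds and function names are illustrative:

```python
# Require FALL consecutive failures before eviction and RISE consecutive
# successes before restoring a server to the healthy set.
FALL = 3
RISE = 2

def update_health(server, probe_ok, state, healthy):
    """Update one server's failure/success streaks and the healthy set.

    state maps server -> (consecutive_failures, consecutive_successes);
    healthy is the set of servers currently receiving traffic.
    """
    fails, oks = state.get(server, (0, 0))
    if probe_ok:
        fails, oks = 0, oks + 1
        if oks >= RISE:
            healthy.add(server)
    else:
        fails, oks = fails + 1, 0
        if fails >= FALL:
            healthy.discard(server)
    state[server] = (fails, oks)
```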
Real-World Applications
- Web applications: nearly every production web service places a load balancer (such as NGINX, HAProxy, or a cloud-managed balancer) in front of its application servers to distribute HTTP traffic and provide failover
- Distributed caches: systems like Memcached and Redis clusters use consistent hashing to route cache keys to specific nodes, ensuring that adding or removing a node only invalidates a small fraction of cached data
- Microservices: service meshes (Envoy, Linkerd) embed L7 load balancing into every service-to-service call, using least connections or weighted strategies to handle heterogeneous instance sizes
- DNS-based balancing: DNS servers return different IP addresses for the same domain name in round-robin order, providing a coarse form of global load distribution across data centers
- Database read replicas: applications route read queries across multiple database replicas using round robin or least connections, while directing all writes to the primary node
- CDN edge routing: content delivery networks use geographic and latency-based load balancing to route users to the nearest edge server, combining L4 and L7 techniques
Key Takeaways
- Load balancers distribute traffic across servers to increase capacity, improve availability, and enable zero-downtime maintenance
- Round robin is simple and effective when servers are identical and requests are uniform; weighted round robin handles heterogeneous server capacities
- Least connections adapts naturally to varying request costs and server speeds by always choosing the least-loaded server
- Consistent hashing minimizes key remapping when servers join or leave ($\approx 1/n$ keys affected vs. $\approx (n-1)/n$ with modular hashing), making it ideal for caches and stateful services
- L4 balancers are fast but blind to application content; L7 balancers enable content-based routing, TLS termination, and request manipulation at the cost of higher per-request overhead