Why Load Balancers Are Essential
A single server has finite capacity — a limited number of CPU cores, a fixed amount of RAM, and a network interface with a maximum throughput. As traffic grows, a single server becomes a bottleneck, then a single point of failure. Load balancers solve both problems simultaneously.
A load balancer sits in front of a pool of backend servers and distributes incoming requests across them. Clients connect to a single virtual IP address (the load balancer's VIP); the load balancer forwards each connection to one of the backend servers based on a distribution algorithm. From the client's perspective, they are talking to a single service — the distribution is invisible.
This architecture enables two critical properties:
- Horizontal scaling: Add more servers to the pool to handle more traffic. Unlike vertical scaling (getting a bigger server), horizontal scaling has no practical upper limit. Modern cloud deployments auto-scale — automatically adding servers during traffic spikes and removing them when load drops.
- High availability: If one server crashes, the load balancer detects the failure via health checks and stops sending traffic to it. Remaining servers absorb the load. No single server failure takes down the service.
Every major website and API you use runs behind load balancers. Use our IP lookup tool on any major website's domain — you will often see CDN or load balancer IPs rather than a single application server.
Layer 4 vs Layer 7 Load Balancing
Load balancers operate at different layers of the network stack, with fundamentally different capabilities:
Layer 4 (Transport) Load Balancers operate at the TCP/UDP level. They route connections based on IP addresses and port numbers without inspecting the content of packets. Key characteristics:
- Extremely fast — minimal packet processing overhead
- Transparent to the application — the TCP connection is passed through or proxied at the connection level
- Cannot make routing decisions based on URLs, headers, or cookies
- Examples: AWS Network Load Balancer (NLB), HAProxy in TCP mode, hardware appliances like F5
Layer 7 (Application) Load Balancers terminate and re-establish connections, inspecting the full HTTP request. This enables content-aware routing:
- Route /api/ paths to API servers and /static/ paths to CDN storage
- Route based on the Host header (virtual hosting — one IP serves many domains)
- Sticky sessions via cookie injection
- SSL/TLS termination
- Request/response modification, header injection
- WAF and rate limiting integration
- Examples: AWS Application Load Balancer (ALB), Nginx, HAProxy in HTTP mode, Envoy
In practice, most production architectures use Layer 7 load balancers because the content-aware routing and application-level features are essential. Layer 4 load balancers are used for non-HTTP protocols (databases, message queues) and when absolute minimum latency is required.
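The content-aware routing described above boils down to a decision function over the request's host and path. A minimal sketch (the pool names and admin hostname are hypothetical; real load balancers express these rules as configuration):

```python
def route(host: str, path: str) -> str:
    """Pick a backend pool from the HTTP Host header and request path."""
    if path.startswith("/api/"):
        return "api-pool"
    if path.startswith("/static/"):
        return "static-pool"
    if host == "admin.example.com":  # Host-header (virtual host) routing
        return "admin-pool"
    return "web-pool"
```

A Layer 4 balancer cannot implement any of this, because it never decrypts or parses the HTTP request.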
Load Balancing Algorithms
The choice of algorithm affects how evenly load is distributed and whether session affinity is maintained:
- Round Robin: Requests are distributed sequentially across servers — server 1, server 2, server 3, server 1, ... Simple and fair when all requests have similar cost. Poor when requests vary significantly in processing time — one server may pile up with expensive requests while others are idle.
- Weighted Round Robin: Servers are assigned weights proportional to their capacity. A server with twice the CPU gets twice the requests. Allows heterogeneous server fleets to share load proportionally.
- Least Connections: New requests go to the server with the fewest active connections. Better than round robin when requests have varying durations. Naturally sends more traffic to faster servers. Used by Nginx's least_conn directive.
- Least Response Time: Combines connection count with response time measurement — sends requests to the server that is both least loaded and fastest responding. Requires active latency measurement.
- IP Hash: The client's IP address is hashed to deterministically select a server. The same client always reaches the same server — providing sticky sessions without cookies. But server pool changes (adding/removing servers) change hash results, reassigning clients.
- Random with Two Choices (Power of Two): Pick two servers at random, send to whichever is less loaded. Achieves near-optimal distribution with low coordination overhead — highly effective for large server pools.
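Two of these algorithms fit in a few lines of Python. A sketch under illustrative assumptions (the server names and weights are made up):

```python
import itertools
import random

# Weighted round robin: expand each server by its weight, then cycle.
servers = {"app-1": 3, "app-2": 1}  # app-1 has 3x the capacity
weighted_pool = [name for name, weight in servers.items() for _ in range(weight)]
wrr = itertools.cycle(weighted_pool)
first_eight = [next(wrr) for _ in range(8)]  # app-1 gets 3 of every 4 requests

# Random with two choices: sample two servers, send to the less loaded one.
loads = {"app-1": 0, "app-2": 0, "app-3": 0, "app-4": 0}

def pick_two_choices(loads: dict) -> str:
    a, b = random.sample(list(loads), 2)
    return a if loads[a] <= loads[b] else b

random.seed(42)  # deterministic for the demo
for _ in range(10_000):
    loads[pick_two_choices(loads)] += 1
# loads end up within a handful of requests of each other
```

The two-choices trick is striking in practice: one random choice leaves servers noticeably imbalanced, while comparing just two candidates keeps the gap between the most and least loaded server tiny.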
Health Checks and Session Persistence
Two operational features are critical for production load balancers:
Health checks continuously monitor backend servers. Active health checks send periodic probe requests (HTTP GET to /health, TCP connect, or UDP packet) and mark servers as unhealthy if they fail or respond too slowly. Unhealthy servers are removed from the rotation until they recover and pass health checks again. Passive health checks monitor actual request failures — if a server returns 5xx errors or connection timeouts, it is marked unhealthy based on real traffic.
A good health check endpoint should verify that the application is actually functional — checking database connectivity, dependency health, and memory usage — not just that the HTTP server is accepting connections. A server that can accept HTTP but cannot reach its database is unhealthy from the application's perspective.
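A deep health check of that kind might be structured like this sketch, where `check_db` and `check_cache` are hypothetical stand-ins for real connectivity probes:

```python
def health(check_db, check_cache):
    """Return (status_code, detail) for a /health endpoint.

    check_db / check_cache are callables returning True when the
    dependency is reachable -- placeholders for real probes.
    """
    checks = {"database": check_db(), "cache": check_cache()}
    healthy = all(checks.values())
    # A load balancer treats any non-2xx response as unhealthy.
    return (200 if healthy else 503), checks

# The HTTP server is up, but the cache is unreachable -> 503.
status, detail = health(lambda: True, lambda: False)
```

Returning the per-dependency detail in the body is useful for operators, while the status code alone drives the load balancer's routing decision.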
Session persistence (sticky sessions) is required for applications that store session state on the server rather than in a shared store. If a user's session is stored in memory on server 1, they must always be routed to server 1. The load balancer achieves this via:
- Cookie-based: The load balancer injects a cookie identifying which server to route to. Reliable and transparent to the application.
- IP hash: Same IP always routes to the same server. Unreliable when clients share IPs (corporate NAT) or change IPs (mobile users).
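The IP-hash reassignment problem mentioned earlier is easy to demonstrate (server names are illustrative; the client IP is from a documentation range):

```python
import hashlib

def ip_hash(client_ip: str, servers: list) -> str:
    # Hash the client IP to deterministically pick one server.
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

pool = ["app-1", "app-2", "app-3"]
before = ip_hash("203.0.113.7", pool)             # always the same server
after = ip_hash("203.0.113.7", pool + ["app-4"])  # pool change may remap the client
```

Because the modulus changes with the pool size, adding or removing one server can remap most clients at once; consistent hashing mitigates this by remapping only a small fraction.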
Best practice: design applications to store session state in a shared external store (Redis, database) rather than in-memory, eliminating the need for sticky sessions and enabling true stateless load balancing.
Global Load Balancing and Anycast
For globally distributed applications, load balancing extends beyond a single data center through two mechanisms:
Global Server Load Balancing (GSLB): DNS-based geographic routing. The DNS server returns different IP addresses based on where the client is located. A user in Europe resolves api.example.com to a European data center IP; a US user gets a US data center IP. DNS TTLs are kept short (30-60 seconds) to allow rapid failover if a data center becomes unavailable.
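The GSLB decision can be sketched as a lookup keyed on client region with failover (the regions and VIPs below are illustrative, using documentation address ranges):

```python
# Hypothetical per-region virtual IPs.
REGIONAL_VIPS = {"EU": "198.51.100.10", "US": "192.0.2.10"}

def resolve(client_region: str, healthy=("EU", "US")) -> str:
    """Answer a DNS query with the nearest healthy data center's VIP."""
    if client_region in healthy:
        return REGIONAL_VIPS[client_region]
    # Client's home region is down: fail over to any healthy region.
    return REGIONAL_VIPS[next(iter(healthy))]
```

The short DNS TTL is what makes this failover effective: clients re-resolve within a minute and pick up the surviving region's address.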
Anycast: The same IP address is announced from multiple geographic locations via BGP. Routers automatically forward packets to the nearest announcement. Cloudflare uses anycast for its entire CDN and DDoS mitigation network — the IP 1.1.1.1 is served from hundreds of data centers worldwide, and your DNS query automatically reaches the closest one.
Anycast provides both performance (geographic proximity) and resilience (if one site fails, BGP routes around it). It is the infrastructure behind Cloudflare, Google's public DNS (8.8.8.8), and root DNS servers.
Use our ping test to measure latency to specific IPs and see the geographic benefit of anycast routing in action — pinging 1.1.1.1 from different locations typically yields low latencies (often in the single-digit to low-double-digit millisecond range) because the nearest anycast site answers, wherever you are.

Frequently Asked Questions
What is the difference between a load balancer and a reverse proxy?
Every load balancer is a reverse proxy, but not every reverse proxy is a load balancer. A reverse proxy can serve a single backend (for TLS termination, caching, or WAF). A load balancer specifically distributes traffic across multiple backends. Nginx serves as both — a reverse proxy for a single origin becomes a load balancer when you configure an upstream block with multiple servers. See our <a href="/what-is-a-reverse-proxy">reverse proxy guide</a> for more.
How does a load balancer handle SSL/TLS?
Layer 7 load balancers terminate TLS — they decrypt the incoming HTTPS connection, inspect the HTTP content to make routing decisions, then re-encrypt when forwarding to backends (or use plain HTTP on a trusted internal network). TLS passthrough is an alternative at Layer 4 where the encrypted connection is forwarded without decryption — the backend server handles TLS but the load balancer cannot inspect content for routing.
What is a virtual IP address (VIP)?
A VIP is the IP address that clients connect to — it belongs to the load balancer rather than to any individual server. The load balancer owns the VIP and distributes connections to the backend pool. When you do an IP lookup on a high-traffic website, the resulting IP is typically a VIP — multiple physical or virtual machines answer on that IP in a high-availability pair.
Can I see if a website uses a load balancer?
Sometimes. Check the response headers with our <a href="/headers">HTTP headers tool</a> — headers like <code>X-Served-By</code>, <code>X-Backend-Server</code>, or varying <code>Set-Cookie</code> headers across requests can indicate load balancer infrastructure. Running multiple lookups on the same domain may also reveal different IPs if the DNS tier distributes across data centers.
