A load balancer is a piece of infrastructure that spreads incoming traffic across several servers so no single one gets overwhelmed. It sits in front of your servers, receives every request, and decides which machine should handle it. The payoff is twofold: your service stays responsive under heavy traffic, and it keeps working even when an individual server fails. It is one of the core building blocks of any system that needs to scale beyond one machine.
How a load balancer works
When users hit your site, they actually hit the load balancer first. It holds a list of backend servers and forwards each request to one of them according to a chosen rule. To the user it looks like a single fast service; behind the scenes the work is shared across many machines.
The balancer also runs health checks, probing each server on a schedule, for example by opening a TCP connection or requesting a known status URL. If a server stops responding correctly, the balancer stops sending it traffic until it recovers. That is how a service can lose a server and keep serving requests with little or no visible downtime. The servers behind it are frequently packaged as containers, which makes adding or replacing one in the pool quick.
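The health-check behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular balancer implements it: the `Backend` record, the `probe` callback (standing in for a real TCP or HTTP check), and the `max_failures` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Backend:
    """Hypothetical record for one server in the pool."""
    name: str
    healthy: bool = True
    failures: int = 0  # consecutive failed probes


def run_health_checks(pool: List[Backend],
                      probe: Callable[[Backend], bool],
                      max_failures: int = 2) -> None:
    """Probe every backend once and update its healthy flag.

    A backend is marked unhealthy after `max_failures` consecutive failed
    probes and is restored as soon as one probe succeeds.
    """
    for backend in pool:
        if probe(backend):
            backend.failures = 0
            backend.healthy = True
        else:
            backend.failures += 1
            if backend.failures >= max_failures:
                backend.healthy = False


def eligible(pool: List[Backend]) -> List[Backend]:
    """Return only the backends that should receive traffic."""
    return [b for b in pool if b.healthy]
```

In a real deployment `run_health_checks` would run on a timer, and the routing code would choose only from `eligible(pool)`. Requiring more than one failed probe before removing a server is a common way to avoid reacting to a single dropped packet.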
Layer 4 versus Layer 7
The two main types differ in how much of the request they understand.
| Type | Works at | Decides based on | Trade-off |
| --- | --- | --- | --- |
| Layer 4 | Transport (TCP/UDP) | IP address and port | Very fast, less aware |
| Layer 7 | Application (HTTP) | URL, headers, cookies | Smarter routing, more work |
A Layer 4 balancer forwards connections based on address and port alone, without inspecting the application data inside them. A Layer 7 balancer understands HTTP, so it can route /api to one pool of servers and /images to another, or keep a user pinned to the same server using a cookie. Most web applications use Layer 7 for its flexibility.
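To make the Layer 7 idea concrete, here is a small Python sketch of path-based routing. The server names, ports, and the `ROUTES` table are hypothetical, and the longest-prefix rule is one reasonable choice rather than the behavior of a specific product.

```python
from typing import Dict, List

# Hypothetical pools keyed by URL path prefix; "/" is the catch-all
# and is assumed to always be present.
ROUTES: Dict[str, List[str]] = {
    "/api": ["api-1:8080", "api-2:8080"],
    "/images": ["img-1:8080"],
    "/": ["web-1:8080", "web-2:8080"],
}


def pick_pool(path: str, routes: Dict[str, List[str]] = ROUTES) -> List[str]:
    """Return the pool whose prefix is the longest match for the path.

    A prefix matches only on whole path segments, so "/api" matches
    "/api" and "/api/users" but not "/apiary".
    """
    matches = [
        prefix for prefix in routes
        if prefix == "/" or path == prefix or path.startswith(prefix + "/")
    ]
    return routes[max(matches, key=len)]
```

A Layer 4 balancer could not make this decision, because the URL path lives inside the HTTP request that it never parses.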
Common balancing methods
- Round robin. Hand requests to each server in turn. Simple and even when servers are similar.
- Least connections. Send the next request to the server currently handling the fewest active connections. Good when request duration varies.
- Weighted. Give beefier servers a larger share of traffic.
- IP hash or sticky sessions. Keep a given user on the same server, useful when a server holds session state in memory.
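The four methods above are simple enough to sketch in Python. These are illustrative versions only: the function names are made up for this example, server names are plain strings, and the weighted variant is the naive form that repeats each server in proportion to its weight.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in order, wrapping around forever."""
    return itertools.cycle(servers)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections.

    Ties go to whichever server appears first in the mapping.
    """
    return min(active, key=active.get)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield each server `weight` times per cycle (naive and bursty)."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server using a stable hash.

    hashlib is used instead of the built-in hash() because hash() is
    randomized per process for strings.
    """
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

For example, `least_connections({"a": 5, "b": 2})` returns `"b"`. One caveat with the modulo-based `ip_hash` shown here: adding or removing a server changes the mapping for most clients, so it only keeps users pinned while the pool stays the same.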
Why it matters
- Reliability. A single server is a single point of failure. With a balancer and several servers, one can die and most users never notice.
- Scalability. When traffic grows, you add servers behind the balancer instead of buying one giant machine.
- Maintenance without downtime. You can drain traffic from one server, update it, and return it to the pool while the rest keep serving.
- Smoother performance. Spreading load avoids the slowdowns that hit when one machine is doing everything.
What to skip
- Building your own load balancer. Managed cloud balancers and mature proxies like Nginx, HAProxy, Envoy, and Caddy handle the hard cases; rolling your own is rarely worth it.
- Sticky sessions when you can avoid them. Storing session state in a shared store instead of server memory lets any server handle any request, which is simpler and more resilient.
- A balancer before you need one. A small app on a single server, often on a managed platform, does not need a separate balancer yet; add one when traffic or reliability demands it.
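The shared-store alternative to sticky sessions mentioned above can be sketched as follows. This is a toy model under stated assumptions: the `AppServer` class is hypothetical, and a plain dictionary stands in for an external store such as a cache or database that every server can reach.

```python
from typing import Dict, Optional


class AppServer:
    """Hypothetical app server that keeps sessions in a shared store."""

    def __init__(self, name: str, store: Dict[str, dict]) -> None:
        self.name = name
        self.store = store  # the same object is shared by every server

    def login(self, session_id: str, user: str) -> None:
        """Record the session in the shared store, not in local memory."""
        self.store[session_id] = {"user": user}

    def whoami(self, session_id: str) -> Optional[str]:
        """Look the session up in the shared store; None if unknown."""
        session = self.store.get(session_id)
        return session["user"] if session else None


# A dict stands in for the external store in this sketch.
shared_store: Dict[str, dict] = {}
server_a = AppServer("a", shared_store)
server_b = AppServer("b", shared_store)
```

After `server_a.login("s1", "ada")`, a later request that lands on `server_b` can still resolve the session with `server_b.whoami("s1")`, so the balancer is free to use any method without pinning users.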
Common misconceptions
- "A load balancer and a reverse proxy are the same thing." They overlap. A reverse proxy forwards requests to backends; a load balancer adds the job of distributing across many. Many tools, like Nginx, do both.
- "It makes a single request faster." It does not speed up one request; it keeps the whole system responsive by spreading many requests.
- "More servers always means more speed." Only if the balancer and the backends are configured well. A database bottleneck behind the servers will still limit you.
FAQ
What is the difference between a load balancer and a reverse proxy?
A reverse proxy sits in front of servers and forwards requests to them. In most web setups, a load balancer is a reverse proxy whose specific job is distributing traffic across multiple servers. Many products fill both roles.
Do I need a load balancer for a small website?
Usually not at first. A single server, often on a managed platform, is fine. You add a load balancer when you need to handle more traffic or survive a server failure.
What is a health check?
A periodic test the balancer runs against each server. If a server fails the check, the balancer stops routing traffic to it until it responds correctly again.
What is the difference between Layer 4 and Layer 7?
Layer 4 routes based on network information like IP and port and is very fast. Layer 7 understands the application request, so it can route by URL, headers, or cookies at the cost of a little more processing.
Where to go next
See what a CDN is and how it speeds up sites, what a REST API is, and Docker versus Kubernetes explained.