Understanding Load Balancers
Load balancers distribute incoming traffic across multiple servers to ensure reliability, availability, and optimal resource utilization. Understanding load balancer capacity is crucial for proper scaling and cost management.
Load Balancer Types
Application Load Balancer (Layer 7)
Operates at the HTTP/HTTPS level with advanced routing:
- Content-based routing: Route by URL, headers, query strings
- Host-based routing: Route by hostname
- WebSocket support: Persistent connections
- HTTP/2 support: Multiplexed connections
- Best for: Web applications, microservices, containers
Network Load Balancer (Layer 4)
Operates at the TCP/UDP level with ultra-low latency:
- High performance: Millions of requests per second
- Static IP: Fixed IP addresses per AZ
- Low latency: Microsecond latencies
- Protocol support: TCP, UDP, TLS
- Best for: Gaming, IoT, high-performance applications
Classic Load Balancer (Legacy)
Previous generation; supports both Layer 4 and Layer 7:
- EC2-Classic: Supports legacy EC2 platform
- Limited features: Basic load balancing
- Recommendation: Migrate to ALB or NLB
AWS ALB Pricing - Load Balancer Capacity Units (LCU)
What is an LCU?
An LCU measures load balancer resource utilization across four dimensions:
1. New Connections (per second)
- 1 LCU = 25 new connections/second
- Example: 100 new connections/sec = 4 LCUs
2. Active Connections (per minute)
- 1 LCU = 3,000 active connections
- Example: 9,000 concurrent connections = 3 LCUs
3. Processed Bytes
- 1 LCU = 1 GB per hour (for EC2, IP targets)
- 1 LCU = 0.4 GB per hour (for Lambda targets)
- Example: 2 GB/hour = 2 LCUs
4. Rule Evaluations
- 1 LCU = 1,000 rule evaluations/second
- First 10 rules: Free
- Example: 2,000 eval/sec = 2 LCUs
LCU Billing
You're charged for the highest of the four dimensions, not their sum:
- If new connections = 4 LCUs, active = 3 LCUs, bandwidth = 2 LCUs
- You pay for 4 LCUs
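The max-dimension billing rule can be sketched as a small calculation. The conversion factors are the per-LCU allowances listed above (EC2 targets for processed bytes); the function name and inputs are illustrative:

```python
def alb_lcus(new_conn_per_sec, active_conn, gb_per_hour, rule_evals_per_sec):
    """Estimate hourly ALB LCU usage from the four dimensions."""
    dims = {
        "new_connections": new_conn_per_sec / 25,       # 25 new conn/sec per LCU
        "active_connections": active_conn / 3000,       # 3,000 active conn per LCU
        "processed_bytes": gb_per_hour / 1.0,           # 1 GB/hour per LCU (EC2 targets)
        "rule_evaluations": rule_evals_per_sec / 1000,  # 1,000 evals/sec per LCU
    }
    # Billing uses only the highest dimension, not the sum.
    top = max(dims, key=dims.get)
    return top, dims[top]

dim, lcus = alb_lcus(100, 9000, 2.0, 500)
print(dim, lcus)  # new_connections 4.0
```

With the example numbers from above (4, 3, 2, and 0.5 LCUs), the billable figure is the 4 LCUs driven by new connections.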
Connection Metrics
Concurrent Connections
The number of connections active at any given moment, given by Little's Law:
Concurrent = Requests/sec × Response_time
Example: 100 req/s × 0.2s response = 20 concurrent connections
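The relationship is easy to check numerically (a one-line sketch of the formula above):

```python
def concurrent_connections(requests_per_sec, response_time_sec):
    """Little's Law: concurrency = arrival rate x time in system."""
    return requests_per_sec * response_time_sec

print(concurrent_connections(100, 0.2))  # 20.0
```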
Connection Duration
How long connections stay open:
- HTTP/1.1: Typically 5-30 seconds (keep-alive)
- HTTP/2: 60+ seconds (multiplexed)
- WebSocket: Minutes to hours (persistent)
Capacity Planning
Estimating Backend Servers
General guidelines for server capacity:
- CPU-bound: 100-500 req/sec per core
- I/O-bound: 1,000-10,000 concurrent connections per server
- Memory-bound: Depends on data size and caching
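For a CPU-bound service, the guidelines above translate into a rough sizing formula. The per-core throughput here (250 req/s, the midpoint of the stated range) is an assumption; replace it with measured numbers:

```python
import math

def servers_needed(peak_rps, rps_per_core, cores_per_server, headroom=1.0):
    """Estimate server count for a CPU-bound service.

    peak_rps: expected peak requests/sec
    rps_per_core: measured (or assumed) per-core throughput
    headroom: safety multiplier (e.g. 2.0 to size for 2x peak)
    """
    capacity_per_server = rps_per_core * cores_per_server
    return math.ceil(peak_rps * headroom / capacity_per_server)

# 5,000 req/s peak, 250 req/s per core (assumed), 8-core servers
print(servers_needed(5000, 250, 8))       # 3
print(servers_needed(5000, 250, 8, 2.0))  # 5
```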
High Availability
For production workloads:
- Deploy in multiple availability zones (minimum 2)
- Size for N+1 capacity (survive one server failure)
- Plan for 2-3x peak traffic
- Enable connection draining (300-3600 seconds)
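The multi-AZ, N+1, and peak-multiplier guidelines above can be combined into one fleet-sizing sketch. The per-server throughput and surge factor are assumptions to tune per workload:

```python
import math

def ha_fleet_size(peak_rps, rps_per_server, azs=2, surge_factor=2.0):
    """N+1 fleet sizing across availability zones (rule-of-thumb sketch).

    Size for surge traffic, add one spare instance so the fleet
    survives a single server failure, then round up to a multiple
    of the AZ count so instances spread evenly.
    """
    base = math.ceil(peak_rps * surge_factor / rps_per_server)
    n_plus_1 = base + 1
    return math.ceil(n_plus_1 / azs) * azs

# 3,000 req/s peak, 1,000 req/s per server (assumed), 2 AZs, 2x surge
print(ha_fleet_size(3000, 1000))  # 8
```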
Auto-Scaling Rules
- CPU threshold: Scale at 70% average CPU
- Active connections: Scale at 1,000 per instance
- Response time: Scale when latency > target
- Request count: Scale at target requests/instance
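The scaling rules above amount to an any-threshold-breached decision. This is a simplified sketch (real autoscalers use target tracking over a sustained window, and the latency target here is an assumed SLO):

```python
def should_scale_out(avg_cpu, active_conns_per_instance, p95_latency_ms,
                     cpu_limit=70.0, conn_limit=1000, latency_target_ms=200):
    """Scale out if any of the rule-of-thumb thresholds is breached.

    Thresholds mirror the guidelines above; latency_target_ms is
    an assumed service-level target, not a universal value.
    """
    return (avg_cpu > cpu_limit
            or active_conns_per_instance > conn_limit
            or p95_latency_ms > latency_target_ms)

print(should_scale_out(75.0, 400, 120))  # True  (CPU over 70%)
print(should_scale_out(50.0, 400, 120))  # False (all within limits)
```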
Performance Optimization
Connection Pooling
Reuse backend connections to reduce overhead:
- Reduces TCP handshake latency
- Saves on connection establishment time
- Improves throughput
Keep-Alive Settings
Optimize connection reuse:
- Client keep-alive: 60-120 seconds
- Backend keep-alive: 60-300 seconds
- Idle timeout: Balance between reuse and resource consumption
Health Checks
- Interval: 10-30 seconds (more frequent = faster failover)
- Timeout: 5-10 seconds
- Threshold: 2-3 consecutive failures
- Endpoint: Lightweight endpoint (e.g., /health)
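These settings bound how long a dead target keeps receiving traffic. A rough approximation (the exact timing depends on the load balancer's implementation) is one interval per required failure, plus the timeout on the final probe:

```python
def worst_case_failover_sec(interval_sec, timeout_sec, unhealthy_threshold):
    """Approximate time to mark a dead target unhealthy.

    The checker must observe `unhealthy_threshold` consecutive
    failures, one per interval, with the last probe also waiting
    out its timeout. This is a rough upper-bound estimate.
    """
    return interval_sec * unhealthy_threshold + timeout_sec

# 10s interval, 5s timeout, 3 consecutive failures
print(worst_case_failover_sec(10, 5, 3))  # 35
```

Tightening the interval speeds up failover but increases probe load on every backend, so keep the health endpoint cheap.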