Understanding Latency Percentiles
Latency percentiles provide a more accurate picture of user experience than simple averages. They help you understand how different segments of your users experience your system's performance.
Why Percentiles Matter
Average (mean) latency can be misleading because:
- Outliers heavily influence averages
- A healthy-looking average can hide a slow tail that affects a meaningful share of requests
- Users care about their individual experience, not the average
- Business impact often comes from tail latency (slowest requests)
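To make the points above concrete, here is a minimal sketch using only Python's standard library. The sample data is invented for illustration: 98 fast requests and 2 very slow ones. The mean lands far from what almost every request experienced, while the median does not move.

```python
import statistics

# Hypothetical latency samples in milliseconds (illustrative data only):
# 98 requests at 20 ms and 2 requests that took 5 seconds.
latencies_ms = [20.0] * 98 + [5000.0] * 2

mean_ms = statistics.mean(latencies_ms)      # pulled upward by the 2 slow requests
median_ms = statistics.median(latencies_ms)  # what the typical request saw

print(f"mean:   {mean_ms:.1f} ms")    # mean:   119.6 ms
print(f"median: {median_ms:.1f} ms")  # median: 20.0 ms
```

No request in this sample actually took about 120 ms, so the mean describes neither the fast majority nor the slow tail.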
Common Percentiles Explained
P50 (Median)
50% of requests are faster, 50% are slower. This is the "typical" user experience.
- Good baseline metric for general performance
- Largely unaffected by outliers
- Represents the experience of the median user
P90 (90th Percentile)
90% of requests are faster; only the slowest 10% are excluded. Covers the typical user experience plus ordinary variation.
- Common target for SLOs
- Balances typical and worst-case performance
- Useful for capacity planning
P95 (95th Percentile)
95% of requests are faster; only 1 request in 20 is slower.
- Often treated as the default percentile for performance SLOs
- Captures most users while excluding extreme outliers
- Typical target: <200ms for web applications
P99 (99th Percentile)
99% of requests are faster. Represents "tail latency": the slowest 1% of requests, which is not the same as the absolute worst case (the maximum).
- Important for premium users or critical operations
- Often 2-10x the P50 value, depending on the workload
- Can indicate system degradation or resource contention
P99.9 (99.9th Percentile)
99.9% of requests are faster. Extreme tail latency.
- Important for high-traffic systems
- Can reveal rare but severe performance issues
- Critical for systems where 0.1% of traffic still means thousands of users (for example, 0.1% of 5 million requests is 5,000)
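The percentiles described above can be computed from raw samples in several ways. The sketch below uses the nearest-rank definition (the smallest sample such that at least p% of samples are less than or equal to it). The `percentile` helper and the sample data are assumptions made for illustration; monitoring tools often interpolate between samples or estimate from histograms, so their numbers may differ slightly.

```python
import math
from fractions import Fraction

def percentile(samples, p):
    """Nearest-rank percentile of `samples` for p in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Fraction(str(p)) keeps values such as 99.9 exact, so the rank is not
    # pushed up by floating-point rounding.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical data: 1000 requests with latencies of 1 ms, 2 ms, ..., 1000 ms.
latencies_ms = list(range(1, 1001))

summary = {f"P{p}": percentile(latencies_ms, p) for p in (50, 90, 95, 99, 99.9)}
print(summary)
# {'P50': 500, 'P90': 900, 'P95': 950, 'P99': 990, 'P99.9': 999}
```

Nearest-rank always returns a latency that was actually observed, which makes the result easy to trace back to a real request.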
Setting Latency Targets
Web Applications
- P50 < 100ms: Excellent
- P95 < 200ms: Good
- P99 < 500ms: Acceptable
API Endpoints
- P50 < 50ms: Fast
- P95 < 100ms: Good
- P99 < 250ms: Acceptable
Microservices (Internal)
- P50 < 10ms: Excellent
- P95 < 50ms: Good
- P99 < 100ms: Acceptable
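One way to put the targets above to work is to encode them as data and compare measured percentiles against them. This is a sketch only: the `TARGETS_MS` table copies the thresholds listed above, while the `check_targets` helper, the profile names, and the measured values are hypothetical.

```python
# Thresholds in milliseconds, taken from the target lists above.
TARGETS_MS = {
    "web":      {"p50": 100, "p95": 200, "p99": 500},
    "api":      {"p50": 50,  "p95": 100, "p99": 250},
    "internal": {"p50": 10,  "p95": 50,  "p99": 100},
}

def check_targets(measured_ms, profile):
    """Return {percentile: (measured, limit)} for every target that is missed.

    A target is met only when the measured value is strictly below the limit,
    matching the "<" in the lists above.
    """
    targets = TARGETS_MS[profile]
    return {
        name: (measured_ms[name], limit)
        for name, limit in targets.items()
        if measured_ms[name] >= limit
    }

# Hypothetical measurements for one service.
measured = {"p50": 42.0, "p95": 130.0, "p99": 240.0}

print(check_targets(measured, "api"))  # {'p95': (130.0, 100)}
print(check_targets(measured, "web"))  # {}
```

The same measurements pass as a web application but miss the P95 target for an API endpoint, which is why the target set should match the kind of system being measured.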
Interpreting Results
Low Variance (P99 / P50 < 2)
Consistent performance across all requests. Indicates:
- Stable system with predictable behavior
- Good resource allocation
- Minimal contention or queuing
Moderate Variance (P99 / P50 = 2-5)
Typical for most systems. May indicate:
- Normal variation in request complexity
- Acceptable garbage collection pauses
- Occasional resource contention
High Variance (P99 / P50 > 5)
Significant tail latency issues. Investigate:
- Resource exhaustion (CPU, memory, disk I/O)
- Database query performance
- Network issues or timeouts
- Inefficient algorithms or hot spots
- External dependency problems
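The three variance bands above reduce to a simple ratio check. The following sketch assumes the boundaries exactly as stated (below 2, 2 to 5 inclusive, above 5); the function name `classify_variance` is made up for this example.

```python
def classify_variance(p50_ms, p99_ms):
    """Classify tail behaviour from the P99 / P50 ratio."""
    if p50_ms <= 0:
        raise ValueError("p50_ms must be positive")
    ratio = p99_ms / p50_ms
    if ratio < 2:
        return "low"       # consistent performance
    if ratio <= 5:
        return "moderate"  # typical for most systems
    return "high"          # significant tail latency, worth investigating

print(classify_variance(40, 60))   # low      (ratio 1.5)
print(classify_variance(40, 120))  # moderate (ratio 3.0)
print(classify_variance(40, 400))  # high     (ratio 10.0)
```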
Best Practices
- Monitor percentiles, not just averages
- Set SLOs on P95 or P99, not on the mean
- Track percentiles over time to spot trends
- Alert on percentile violations rather than on spikes in the maximum, which a single slow request can cause
- Investigate when P99/P50 ratio is high
- Use percentiles for capacity planning
- Consider different percentiles for different operations
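As an illustration of tracking percentiles over time and alerting on percentile violations rather than on the maximum, the sketch below groups requests into fixed time windows and computes a nearest-rank P95 for each. The 60-second window, the 200 ms target, the helper names, and the event data are all assumptions chosen for the example.

```python
from collections import defaultdict

def p95(samples):
    """Nearest-rank P95 of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]

def p95_per_window(events, window_s=60):
    """Map each window start time to the P95 of the latencies in that window.

    `events` is an iterable of (timestamp_seconds, latency_ms) pairs.
    """
    buckets = defaultdict(list)
    for ts, latency_ms in events:
        buckets[int(ts // window_s) * window_s].append(latency_ms)
    return {start: p95(values) for start, values in sorted(buckets.items())}

def violating_windows(events, target_ms, window_s=60):
    """Return the start times of windows whose P95 is not below the target."""
    per_window = p95_per_window(events, window_s)
    return [start for start, value in per_window.items() if value >= target_ms]

# Hypothetical traffic. Window 0 has one 900 ms spike among 20 requests;
# window 60 has 3 slow requests (400 ms) among 20.
events = [(i, 50.0) for i in range(19)] + [(19, 900.0)]
events += [(60 + i, 50.0) for i in range(17)] + [(77 + i, 400.0) for i in range(3)]

print(p95_per_window(events))          # {0: 50.0, 60: 400.0}
print(violating_windows(events, 200))  # [60]
```

The single 900 ms spike in the first window would trip an alert on the maximum but leaves P95 unchanged, whereas the second window, where 15% of requests are slow, is flagged.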