SLI
Service Level Indicator. A metric that measures a user-facing aspect of service health, like success rate or latency.
Definition
An SLI (Service Level Indicator) is the metric you use to measure user-visible service health.
Think of it as the question: “From the user’s point of view, is this working?”
Examples (real-world shapes):
- Successful request rate (2xx, or “no errors”) for a specific endpoint.
- p99 latency for a critical journey like login or checkout.
- Freshness for a feed, like the percentage of users who see updates within 60 seconds (measured against a target such as 99%).
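The three example shapes above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Request` record, the function names, and the nearest-rank percentile method are all assumptions chosen for the example, and a real system would normally compute these values from its metrics or logging pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one request as seen at the service edge."""
    status: int        # HTTP status code returned to the user
    latency_ms: float  # time the user waited for the response


def success_rate(requests):
    """Fraction of requests answered with a 2xx status; None if no traffic."""
    if not requests:
        return None
    good = sum(1 for r in requests if 200 <= r.status < 300)
    return good / len(requests)


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    if not values:
        return None
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def p99_latency_ms(requests):
    """p99 latency over the given requests."""
    return percentile([r.latency_ms for r in requests], 99)


def freshness_ratio(delays_s, threshold_s=60.0):
    """Fraction of updates that became visible within threshold_s seconds."""
    if not delays_s:
        return None
    fresh = sum(1 for d in delays_s if d <= threshold_s)
    return fresh / len(delays_s)
```

Each function returns `None` when there is no data, so that "no traffic" is not silently reported as either perfect or broken.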
How to pick a good SLI
- Pick something user-visible.
- Measure it at the boundary where users interact with the system (usually the API or UI edge).
- Keep it simple enough that you trust it during incidents.
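One way to follow these guidelines is to count good and total requests right where the edge handler runs. The sketch below assumes a hypothetical `SliCounter` class and `measured_at_edge` decorator (neither comes from a real library) and treats any request that raises an exception as not good.

```python
from dataclasses import dataclass
from functools import wraps


@dataclass
class SliCounter:
    """Hypothetical in-memory counter: good events over total events."""
    good: int = 0
    total: int = 0

    def ratio(self):
        """Current SLI value, or None if nothing has been measured yet."""
        return self.good / self.total if self.total else None


def measured_at_edge(counter):
    """Wrap an edge handler so every call is counted toward the SLI."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            counter.total += 1
            result = handler(*args, **kwargs)  # an exception skips the next line
            counter.good += 1
            return result
        return wrapper
    return decorator
```

Keeping the logic to two counters and one division is deliberate: the fewer moving parts, the easier it is to trust the number during an incident.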
Why it matters
SLIs stop reliability conversations from being abstract.
If you cannot name an SLI, you have no way to measure whether the service is “good” for users.