Reliability
How consistently a system delivers correct results over time, including how well it handles failures and meets its service-level targets.
Definition
Reliability is how consistently the system does the right thing over time.
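In practice, you make reliability concrete with service level indicators (SLIs), which measure how the system behaves, and service level objectives (SLOs), which set targets for those measurements.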
Why it matters
Reliable systems fail in smaller, more predictable ways.
Unreliable systems fail in ways that take you by surprise.
How to make it concrete
- Pick an SLI that matches user experience.
- Set an SLO that matches the business reality.
- Use the error budget, the amount of unreliability the SLO allows, to decide when to ship faster and when to slow down.
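
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: it assumes a request-based availability SLI (successful requests divided by total requests), and the names `SLO`, `availability_sli`, `error_budget_remaining`, and `should_slow_down` are hypothetical, as is the choice to slow down once the budget is fully spent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A hypothetical availability objective for one measurement window."""
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def availability_sli(good: int, total: int) -> float:
    """SLI: the fraction of requests that succeeded."""
    if total == 0:
        return 1.0  # assumption: a window with no traffic meets the objective
    return good / total


def error_budget_remaining(slo: SLO, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo.target) * total
    actual_failures = total - good
    if allowed_failures == 0:
        # A 100% target (or an empty window) leaves no room for failures.
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


def should_slow_down(slo: SLO, good: int, total: int) -> bool:
    """Assumed policy: pause risky releases once the budget is used up."""
    return error_budget_remaining(slo, good, total) <= 0.0


if __name__ == "__main__":
    slo = SLO(target=0.999)
    good, total = 999_500, 1_000_000  # made-up counts for illustration
    print(f"SLI: {availability_sli(good, total):.4%}")
    print(f"Budget remaining: {error_budget_remaining(slo, good, total):.1%}")
    print(f"Slow down: {should_slow_down(slo, good, total)}")
```

With a 99.9% target and one million requests, the budget is roughly 1,000 failed requests. 500 failures leave about half of it unspent, while 2,000 failures overspend it and would, under this assumed policy, argue for slowing down.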