T72 Dec 24, 2025

Error budget

The amount of unreliability allowed by an SLO. It is the gap between 100% and the SLO target over a time window.

Definition

An error budget is the amount of unreliability your SLO allows.

Example: a 99.9% SLO means you can be down or failing 0.1% of the time in the window.

Concrete numbers (so it is not abstract)

If the window is 30 days:

  • Total time is 30 × 24 × 60 = 43,200 minutes.
  • 0.1% of that is 43.2 minutes.

So over a 30-day window, a 99.9% SLO lets you “spend” about 43 minutes on user-visible failure.
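Here is a minimal sketch of the arithmetic above. It assumes a time-based SLI where each minute of the window is either good or bad. The helper name `allowed_downtime_minutes` is hypothetical.

```python
def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of user-visible failure a time-based SLO tolerates in the window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1], e.g. 0.999")
    total_minutes = window_days * 24 * 60  # 43,200 for a 30-day window
    return (1.0 - slo) * total_minutes     # the error budget, in minutes


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} over 30 days -> {allowed_downtime_minutes(target):.1f} min")
```

The same formula shows how quickly the budget shrinks. Each extra nine cuts the allowed downtime by a factor of ten.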

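The exact math depends on how you define the SLI (errors, latency, or both). With a request-based SLI, for example, the budget is 0.1% of requests rather than 0.1% of minutes: 1,000 failed or too-slow requests out of 1,000,000. The intuition is the same.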
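The sketch below shows a request-based SLI that counts both errors and latency. It assumes a request is “good” only if it succeeded and finished within a latency threshold. The 300 ms threshold, the `Request` type, and the helper names are all hypothetical, and the sample traffic is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ok: bool           # True if the request returned a successful response
    latency_ms: float  # observed end-to-end latency


def is_good(req: Request, latency_threshold_ms: float) -> bool:
    # Assumption: "good" means the request succeeded AND met the latency threshold.
    return req.ok and req.latency_ms <= latency_threshold_ms


def request_budget(requests: list[Request], slo: float, latency_threshold_ms: float = 300.0):
    """Return (bad_requests, allowed_bad_requests) for a request-based SLI."""
    total = len(requests)
    bad = sum(1 for r in requests if not is_good(r, latency_threshold_ms))
    allowed_bad = (1.0 - slo) * total  # the budget, counted in requests
    return bad, allowed_bad


# Hypothetical window: 10,000 requests, 5 errors and 3 that were too slow.
window = (
    [Request(ok=True, latency_ms=120.0)] * 9_992
    + [Request(ok=False, latency_ms=50.0)] * 5
    + [Request(ok=True, latency_ms=900.0)] * 3
)
bad, allowed = request_budget(window, slo=0.999)
print(f"{bad} bad requests against a budget of {allowed:.0f}")
```

Changing the SLI definition (for example, dropping the latency condition) changes which requests count as bad. The budget itself is still `(1 - slo) × total`.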

Why it matters

Error budgets turn reliability into a tradeoff you can reason about.

If you are within budget, you can ship faster.

If you are out of budget, you should slow down and fix reliability.
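To decide whether you are “within budget”, compare the bad events so far against the allowed number of bad events. This is a minimal sketch under that assumption. `budget_status` is a hypothetical helper that works for any event-based SLI (good/bad minutes, requests, or probe checks), and the traffic figures in the usage lines are invented.

```python
def budget_status(slo: float, good_events: int, total_events: int) -> dict:
    """Summarize error-budget consumption for one window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1, e.g. 0.999")
    if not 0 <= good_events <= total_events:
        raise ValueError("need 0 <= good_events <= total_events")
    allowed_bad = (1.0 - slo) * total_events
    bad = total_events - good_events
    # consumed is 0.0 for an untouched budget and 1.0 when it is exactly spent.
    consumed = bad / allowed_bad if allowed_bad > 0 else 0.0
    return {
        "allowed_bad": allowed_bad,
        "bad": bad,
        "consumed": consumed,
        "within_budget": bad <= allowed_bad,
    }


# Hypothetical windows, each with 1,000,000 events against a 99.9% SLO.
healthy = budget_status(0.999, good_events=999_400, total_events=1_000_000)
burned = budget_status(0.999, good_events=998_500, total_events=1_000_000)
print(f"healthy: {healthy['consumed']:.0%} of budget used")
print(f"burned: {burned['consumed']:.0%} of budget used")
```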

How teams use it in real life

  • If the budget is healthy, teams take more release risk (ship features).
  • If the budget is burned, teams reduce risk (stabilize, fix incidents, pay down reliability debt).
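One way to encode the two bullets above is as a release gate. This is a hypothetical sketch: the `ChangeKind` categories and the rule “fixes always ship, features only while budget remains” are illustrative assumptions. Real error-budget policies are set by each team and are usually more nuanced.

```python
from enum import Enum


class ChangeKind(Enum):
    FEATURE = "feature"
    RELIABILITY_FIX = "reliability fix"


def allow_release(kind: ChangeKind, budget_consumed: float) -> bool:
    """Hypothetical release gate driven by error-budget consumption.

    budget_consumed is the fraction of the window's budget already spent
    (0.0 = untouched, 1.0 = fully burned).
    """
    if kind is ChangeKind.RELIABILITY_FIX:
        # Stabilizing work is always allowed; it reduces further budget burn.
        return True
    # Feature releases carry risk, so they only go out while budget remains.
    return budget_consumed < 1.0


print(allow_release(ChangeKind.FEATURE, budget_consumed=0.4))
print(allow_release(ChangeKind.FEATURE, budget_consumed=1.2))
print(allow_release(ChangeKind.RELIABILITY_FIX, budget_consumed=1.2))
```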