Service level objectives (SLOs)
A service level objective (SLO) is a reliability target for a system: “this system should be available at least 99.9% of the time over the last 30 days.” Checkstack tracks each objective continuously, tells you how much of your error budget is left, and warns you before you breach. SLOs turn the raw healthy/unhealthy readings from your health checks into a single number you can hold a service to.
The basics
An objective is defined by:
- A system it measures.
- A target percentage (for example 99.9).
- A window in days (a rolling window, for example the last 30 days). The window always ends “now”, so the objective is always evaluated against the most recent N days.
- Optionally, a single health check to measure. Leave it unset to measure the system’s overall health across all its checks.
Availability is computed from health state over the window: a system counts as “good” while it is healthy, and as “down” while it is degraded or unhealthy. Each outage is recorded as a downtime event that opens when the system stops being healthy and closes when it recovers.
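As a rough illustration of that calculation, the sketch below derives availability from a list of downtime events clipped to a rolling window. The `DowntimeEvent` shape and the `availability_percent` function are hypothetical names chosen for this example, not Checkstack's internal model, and the sketch assumes the events for one objective do not overlap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class DowntimeEvent:
    # Hypothetical shape: opens when the system stops being healthy,
    # closes on recovery. closed_at=None means the outage is still ongoing.
    opened_at: datetime
    closed_at: Optional[datetime] = None


def availability_percent(
    events: Iterable[DowntimeEvent], window_days: int, now: datetime
) -> float:
    """Percentage of the rolling window during which the system was healthy."""
    window_start = now - timedelta(days=window_days)
    window_seconds = (now - window_start).total_seconds()
    down_seconds = 0.0
    for event in events:
        # Clip each event to the window; an open event runs until "now".
        start = max(event.opened_at, window_start)
        end = min(event.closed_at or now, now)
        if end > start:
            down_seconds += (end - start).total_seconds()
    return 100.0 * (1.0 - down_seconds / window_seconds)


now = datetime(2024, 1, 31)
events = [DowntimeEvent(datetime(2024, 1, 10, 12, 0), datetime(2024, 1, 10, 12, 30))]
print(round(availability_percent(events, 30, now), 3))  # about 99.93
```

Because the window always ends at `now`, an event that started before the window is only counted from the window's start, and an event that is still open is counted up to the present moment.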
Error budget
The flip side of a target is the downtime it allows. A 99.9% target over 30 days permits roughly 43 minutes of downtime; that allowance is your error budget.
- Budget remaining is how much of that allowance is still unspent. The UI shows it as a bar that runs green, then amber, then red as it depletes.
- Burn rate is how fast you are spending the budget relative to the window. A burn rate above 1 means you are consuming budget faster than the window can sustain. You set warning and critical burn-rate thresholds per objective (defaults: 50% and 80%).
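The arithmetic behind these numbers can be sketched as follows. The function names are illustrative, and the burn-rate formula is the common SRE definition (observed downtime rate divided by the rate the target allows); the source does not spell out Checkstack's exact formula, so treat it as an assumption.

```python
def error_budget_minutes(target_percent: float, window_days: int) -> float:
    """Downtime the target allows over the whole window, in minutes."""
    if not 0 < target_percent < 100:
        raise ValueError("target_percent must be between 0 and 100 (exclusive)")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - target_percent / 100.0)


def budget_remaining_fraction(
    downtime_minutes: float, target_percent: float, window_days: int
) -> float:
    """Share of the error budget still unspent, from 1.0 (untouched) to 0.0 (exhausted)."""
    budget = error_budget_minutes(target_percent, window_days)
    return max(0.0, 1.0 - downtime_minutes / budget)


def burn_rate(
    downtime_minutes: float, elapsed_minutes: float, target_percent: float
) -> float:
    """Assumed definition: observed downtime rate over a span, divided by the allowed rate.

    A value of 1.0 spends the budget exactly as fast as the window can sustain.
    """
    if not 0 < target_percent < 100:
        raise ValueError("target_percent must be between 0 and 100 (exclusive)")
    allowed_rate = 1.0 - target_percent / 100.0
    return (downtime_minutes / elapsed_minutes) / allowed_rate


print(round(error_budget_minutes(99.9, 30), 1))  # 43.2
```

For example, with a 99.9% target over 30 days, 10.8 minutes of downtime leaves about 75% of the budget, and 6 minutes of downtime within a single hour corresponds to a burn rate of roughly 100 under this definition.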
Status
At any moment an objective is in one of these states, which surface as a signal on the dashboard:
| Status | Meaning |
|---|---|
| Healthy | On track. Availability is above target and budget consumption is nominal. |
| At risk | Healthy now, but the remaining error budget is low (20% or less). Approaching a breach. |
| Degraded | The system is currently down, and that downtime is counting against this objective. |
| Breaching | Measured availability has fallen below the target. |
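One way to read the table is as a small decision function. The sketch below is an assumption-laden illustration: the table does not say which status wins when several conditions hold at once, so the precedence used here (Breaching, then Degraded, then At risk) is a guess, and `classify` and its parameters are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "Healthy"
    AT_RISK = "At risk"
    DEGRADED = "Degraded"
    BREACHING = "Breaching"


def classify(
    availability_percent: float,
    target_percent: float,
    budget_remaining_fraction: float,
    currently_down: bool,
) -> Status:
    # Assumed precedence; the source table does not define an order.
    if availability_percent < target_percent:
        return Status.BREACHING  # measured availability fell below target
    if currently_down:
        return Status.DEGRADED  # ongoing downtime counts against the objective
    if budget_remaining_fraction <= 0.20:
        return Status.AT_RISK  # healthy now, but 20% or less of the budget is left
    return Status.HEALTHY


print(classify(99.95, 99.9, 0.15, currently_down=False).value)  # At risk
```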
Dependency-aware attribution
A system is often down only because something it depends on is down. Counting that against the system’s own SLO punishes it for a failure it did not cause. Checkstack’s SLOs are dependency-aware: each objective chooses how to attribute upstream-caused downtime.
- Strict (default): count all downtime, whoever caused it. Use this for a user-facing promise where the cause does not matter.
- Self-only: exclude downtime caused by an unhealthy upstream dependency. The outage is still recorded and attributed, but it does not consume this system’s budget.
You can also exclude specific upstream systems explicitly. The objective’s detail page shows the attribution breakdown, so you can see exactly which minutes were charged to the system itself versus an upstream.
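The difference between the two modes, plus explicit exclusions, can be sketched like this. The `Outage` record, the mode strings, and `charged_minutes` are hypothetical names for illustration only; they are not Checkstack's configuration keys or API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Outage:
    minutes: float
    # Name of the upstream system that caused the outage, or None if self-caused.
    caused_by: Optional[str] = None


def charged_minutes(
    outages: Iterable[Outage],
    mode: str = "strict",
    excluded_upstreams: FrozenSet[str] = frozenset(),
) -> float:
    """Minutes of downtime that consume this objective's error budget."""
    if mode not in ("strict", "self-only"):
        raise ValueError(f"unknown attribution mode: {mode!r}")
    total = 0.0
    for outage in outages:
        if outage.caused_by is not None:
            # Self-only drops every upstream-caused outage; strict drops
            # only those caused by an explicitly excluded upstream.
            if mode == "self-only" or outage.caused_by in excluded_upstreams:
                continue
        total += outage.minutes
    return total


outages = [Outage(10), Outage(20, caused_by="db"), Outage(5, caused_by="cache")]
print(charged_minutes(outages))                                      # 35.0
print(charged_minutes(outages, mode="self-only"))                    # 10.0
print(charged_minutes(outages, excluded_upstreams=frozenset({"db"})))  # 15.0
```

In every case all three outages still exist as records; only the amount charged against the budget changes, which mirrors the attribution breakdown described above.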
Notifications and history
- A breaching or recovering objective broadcasts a signal that surfaces on the dashboard and feeds the assistant’s “what is wrong?” view.
- A periodic digest summarises objectives across all systems (how many are breaching, at risk, and healthy, plus the best and worst performers) through your configured notification channels.
- Each objective’s detail page keeps a trend chart from daily snapshots, a downtime timeline, the attribution breakdown, and streaks, so you can see reliability over time, not just right now.
Managing SLOs
| Where to go | What you do there |
|---|---|
| Reliability -> SLO overview | See every objective at a glance with its error-budget bar and burn rate. |
| Reliability -> Manage SLOs | Create, edit, and delete objectives. Requires the SLO manage permission. |
| An objective’s detail page | Drill into one objective: trend, downtime timeline, attribution, streaks. |
SLOs can also be declared in Git as an SLO entity kind, so you can manage them alongside the rest of your platform configuration. See the GitOps entity kinds reference.
Where to go next
- Availability source. Read Health checks to understand what feeds an objective.
- Dependencies. See Systems and groups for the dependency model that powers self-only attribution.
- React to breaches. Use Automations to act when an objective starts burning budget.