Maintenances

A Maintenance is a planned-downtime window attached to one or more systems. Unlike an Incident, which records something that broke, a maintenance records something you are going to do (or are doing right now). It also doubles as a notification-suppression mechanism so on-call channels do not light up during expected disruption.

Each maintenance carries:

  • A title and an optional description of what work is planned.
  • A start time and an end time that define the window.
  • A status that progresses automatically as the window opens and closes.
  • A list of affected systems.
  • A timeline of status updates for human-readable progress notes.
  • Optional hotlinks (change ticket, runbook, chat thread).
  • A suppress notifications toggle.
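The fields listed above could be modelled roughly as follows. This is an illustrative sketch, not the platform's actual schema: the class names, the snake_case field names, and the check that the window ends after it starts are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class MaintenanceStatus(str, Enum):
    SCHEDULED = "scheduled"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


@dataclass
class StatusUpdate:
    """One human-readable progress note on the timeline (hypothetical shape)."""
    posted_at: datetime
    message: str


@dataclass
class Hotlink:
    """A labelled URL such as a change ticket or runbook (hypothetical shape)."""
    label: str
    url: str


@dataclass
class Maintenance:
    title: str
    start_at: datetime
    end_at: datetime
    description: Optional[str] = None
    status: MaintenanceStatus = MaintenanceStatus.SCHEDULED
    affected_system_ids: List[str] = field(default_factory=list)
    updates: List[StatusUpdate] = field(default_factory=list)
    hotlinks: List[Hotlink] = field(default_factory=list)
    suppress_notifications: bool = False

    def __post_init__(self) -> None:
        # Assumption: a window must have positive duration.
        if self.end_at <= self.start_at:
            raise ValueError("end_at must be after start_at")


# Example: a two-hour database upgrade affecting one system.
upgrade = Maintenance(
    title="Database upgrade",
    start_at=datetime(2030, 1, 1, 2, 0, tzinfo=timezone.utc),
    end_at=datetime(2030, 1, 1, 4, 0, tzinfo=timezone.utc),
    affected_system_ids=["orders-db"],
    suppress_notifications=True,
)
```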

Maintenances live under Maintenances in the main nav.

A maintenance moves through four statuses, driven by the configured start and end times:

+-----------+  start_at reached  +-------------+  end_at reached  +-----------+
| scheduled |------------------->| in_progress |----------------->| completed |
+-----------+                    +-------------+                  +-----------+
      |                                 |
      |                                 v
      |         cancelled         +-----------+
      +-------------------------->| cancelled |
                                  +-----------+

  • scheduled: the window is in the future. The maintenance is announced but not yet “live”.
  • in_progress: start_at has passed and end_at has not. The maintenance is the current state of affairs.
  • completed: end_at has passed without cancellation. The work is over.
  • cancelled: an operator cancelled the maintenance before or during the window.

Status transitions are automated. A background job evaluates maintenances once a minute and flips the status as soon as it sees that the corresponding boundary has been crossed, so a transition can lag the configured time by up to roughly a minute. You can also force a transition manually (for example, completing early when work finishes ahead of schedule).
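The time-driven part of the lifecycle can be sketched as a pure function that the periodic job might call for each maintenance. This is a minimal illustration, not the platform's implementation: the function name, the `manual_status` parameter, and the rule that a manual override only ever sets a terminal status are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: manual overrides only move a maintenance to a terminal status.
TERMINAL_STATUSES = frozenset({"completed", "cancelled"})


def derive_status(
    start_at: datetime,
    end_at: datetime,
    now: datetime,
    manual_status: Optional[str] = None,
) -> str:
    """Return the status a maintenance should have at `now`."""
    # An operator's cancellation or early completion wins over the clock.
    if manual_status in TERMINAL_STATUSES:
        return manual_status
    if now < start_at:
        return "scheduled"
    if now < end_at:
        return "in_progress"
    return "completed"


# Example: a two-hour window, checked thirty minutes after it opens.
start = datetime(2030, 1, 1, 2, 0, tzinfo=timezone.utc)
end = start + timedelta(hours=2)
print(derive_status(start, end, now=start + timedelta(minutes=30)))  # in_progress
```

Keeping the function free of side effects means the same logic can serve the background job and any on-demand recalculation, and the boundary behaviour (a window is in_progress from start_at inclusive to end_at exclusive) is easy to test.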

Like incidents, maintenances have a suppress notifications toggle. The behaviour is intentionally similar, with one important difference: the toggle is only active while the maintenance is in_progress. A scheduled maintenance does not pre-arm suppression; the silencing kicks in only when the window opens, and lifts the moment it closes.

When in_progress and suppressNotifications = true:

  • Health-state-change notifications for the maintenance’s affected systems are suppressed.
  • Dependency cascade notifications from those systems are suppressed.

Notifications about the maintenance itself (started, ending soon, completed) are still delivered. Operators want to know that a window opened and closed even if everything else is silenced.

Maintenances do not pause the underlying health checks. The probes still run; the results still flow into history; the platform’s evaluator still sees the failures. What changes is the notification fan-out: the platform consults active maintenances before firing health-state-change notifications.
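The fan-out decision described above can be sketched as a small predicate. This is a hedged illustration only: the notification kind strings, the `Maintenance` shape, and the function names are hypothetical, and the real platform may structure this check differently.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

# Hypothetical kind names for the two notification types that a maintenance can silence.
SUPPRESSIBLE_KINDS = frozenset({"health_state_change", "dependency_cascade"})


@dataclass(frozen=True)
class Maintenance:
    status: str
    suppress_notifications: bool
    affected_system_ids: FrozenSet[str]


def is_suppressed(system_id: str, maintenances: Iterable[Maintenance]) -> bool:
    """True if any in-progress, suppressing maintenance covers the system."""
    return any(
        m.status == "in_progress"
        and m.suppress_notifications
        and system_id in m.affected_system_ids
        for m in maintenances
    )


def should_notify(kind: str, system_id: str, maintenances: Iterable[Maintenance]) -> bool:
    """Decide whether a notification originating from `system_id` is delivered."""
    # Notifications about the maintenance itself are never suppressed.
    if kind not in SUPPRESSIBLE_KINDS:
        return True
    return not is_suppressed(system_id, maintenances)


active = Maintenance("in_progress", True, frozenset({"orders-db"}))
upcoming = Maintenance("scheduled", True, frozenset({"billing-api"}))

print(should_notify("health_state_change", "orders-db", [active, upcoming]))    # False
print(should_notify("health_state_change", "billing-api", [active, upcoming]))  # True
```

Note that the check result itself is untouched: only the delivery decision consults the maintenance list, which matches the point that probes keep running and history keeps filling.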

If you want to stop a check from running during a maintenance (for example, because the check target will be unreachable and the failed probes would pollute history), pause the check assignment directly from the system detail page. That is a separate setting from maintenances and is not driven by maintenance status.

Use a maintenance when:

  • The disruption is planned.
  • You know the start and end times.
  • You want to suppress noise for the duration.

Use an incident when:

  • Something broke unexpectedly.
  • You do not know how long it will last.
  • You need to track investigation and remediation steps.

The two can coexist. If a planned maintenance turns into an actual outage that exceeds the window, open an incident for the unexpected portion; the maintenance still records the original planned work.

Attach the catalog systems the maintenance affects. This drives:

  • Discoverability from the system detail page (anyone looking at a system sees upcoming and active maintenances).
  • The scope of notification suppression while the maintenance is in_progress.

Maintenances use the same hotlink slots as incidents: free-form labelled URLs for change tickets, runbooks, chat threads, or anything else worth attaching.

The lifecycle events (created, status changed, completed, cancelled) flow through the integration system the same way incident events do. You can mirror them to a Slack channel, a status page, or any HTTP webhook. See Integrations.
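As a rough illustration of consuming these events on the receiving end of a webhook, the sketch below turns an event payload into a one-line message. The payload keys (`event`, `title`, `status`) and the event name strings are placeholders invented for this example; the real payload format is whatever the Integrations documentation specifies.

```python
# Hypothetical event names mirroring the lifecycle events listed above.
LIFECYCLE_EVENTS = frozenset({"created", "status_changed", "completed", "cancelled"})


def format_event(payload: dict) -> str:
    """Build a short human-readable message from a (hypothetical) event payload."""
    event = payload["event"]
    if event not in LIFECYCLE_EVENTS:
        raise ValueError(f"unknown maintenance event: {event}")
    title = payload["title"]
    if event == "status_changed":
        return f"Maintenance '{title}' is now {payload['status']}"
    return f"Maintenance '{title}' was {event}"


print(format_event({"event": "status_changed", "title": "Database upgrade", "status": "in_progress"}))
# Maintenance 'Database upgrade' is now in_progress
```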

| Where to go | What you do there |
| --- | --- |
| Maintenances (list) | See scheduled, in-progress, and past maintenances. Filter by status. |
| Schedule Maintenance | Create one. Pick the time window, attach systems, decide on suppression. |
| Maintenance detail | Edit, cancel, post updates, manage hotlinks. |
| System detail | See the upcoming and active maintenances for this system. |