Set up your first health check

This walkthrough takes you from a fresh Checkstack install to a running HTTP health check returning live data. The HTTP strategy is the easiest to demo because it only needs a URL, but every other strategy follows the same flow.

Sign in to Checkstack as a user with the catalog.systems.manage and healthcheck.configuration.manage access rules. Both rules are included in the built-in administrator role.

The first thing you see is the dashboard. Use the main sidebar to navigate; the guide below references its menu entries by name.

A system is the unit of organisation in Checkstack: it groups one or more health checks, tracks an overall health status, and is what notifications and incidents refer to. Read Systems and groups for the full mental model.

  1. Open the Catalog page from the sidebar.
  2. Click Create System.
  3. Fill in the form:
    • Name - for example Payments API.
    • Description - optional, one or two sentences describing what the system does.
  4. Click Create.

The new system appears in the catalog with status unknown. It stays in unknown until the first health check returns a result.

  1. Open the Health Checks page from the sidebar.
  2. Click Create Check in the top-right corner.

You land on the Strategy Picker, a grid of every strategy installed on this instance grouped by category (Network, HTTP, Database, Script, and so on). Hover any card to see its description; use the search box if the list is long.

Click the HTTP Health Check card. The platform navigates to the Health Check editor with the HTTP strategy preselected.

The editor is a split-pane IDE. The tree on the left lists the editable sections (General, Strategy, Collectors). Walk through each one in turn, starting with the General fields:

  • Name - a friendly label, for example Payments API root.
  • Interval (seconds) - how often the check runs. 60 is a sensible starting value; the platform enforces a minimum.

The HTTP strategy itself only requires global request defaults; the actual URL lives on a collector. Leave the defaults unless you have specific timeout requirements.

The HTTP strategy ships with a built-in Request collector. Add it from the Add Collector menu, then configure:

  • URL - the endpoint to call, for example https://api.example.com/healthz.
  • Method - GET for most health checks.
  • Expected status - 200 (or a list, for example 200, 204).
  • Timeout - the HTTP timeout. Defaults work for most cases.
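
To make the four fields concrete, here is a minimal Python sketch of the kind of probe a Request collector performs. It is not Checkstack's implementation: the function names `parse_expected_status` and `run_request_check`, the 5-second default timeout, and the wording of the messages are all assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request


def parse_expected_status(value: str) -> set[int]:
    """Turn "200" or "200, 204" into a set of integer status codes."""
    return {int(part) for part in value.split(",") if part.strip()}


def run_request_check(url: str, method: str = "GET",
                      expected: str = "200", timeout: float = 5.0):
    """Run one probe and return (ok, message). Hypothetical helper."""
    allowed = parse_expected_status(expected)
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx, but the status code is still available.
        status = exc.code
    except urllib.error.URLError as exc:
        # Connection-level failures are wrapped in URLError.reason.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return False, f"timeout after {timeout}s"
        if isinstance(exc.reason, ConnectionRefusedError):
            return False, "connection refused"
        return False, f"request failed: {exc.reason}"
    except (socket.timeout, TimeoutError):
        # A timeout while reading the response is raised unwrapped.
        return False, f"timeout after {timeout}s"

    if status in allowed:
        return True, f"HTTP {status}"
    return False, f"unexpected HTTP {status}, expected one of {sorted(allowed)}"
```

Calling `run_request_check("https://api.example.com/healthz", expected="200, 204")` mirrors the example values above; the returned message is the sort of detail you would expect to see when a run fails.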

Below the configuration tree, the editor shows an Assignments section listing the systems this check applies to.

  1. Click Add assignment.
  2. Select the Payments API system you created earlier.
  3. Optional: override the default state thresholds if this system should change state faster or slower. The defaults are 1 failure to mark degraded and 3 failures to mark unhealthy.
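
The threshold defaults can be read as a small state function. The sketch below assumes the thresholds count consecutive failures ending at the most recent run, which this guide does not confirm; the names `Thresholds` and `status_from_results` are hypothetical and not part of Checkstack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    degraded_after: int = 1   # default quoted in this guide
    unhealthy_after: int = 3  # default quoted in this guide


def status_from_results(results: list[bool],
                        thresholds: Thresholds = Thresholds()) -> str:
    """Map run results (oldest first, True = success) to a status.

    Assumption: only the trailing run of consecutive failures counts.
    """
    if not results:
        return "unknown"  # no run has completed yet

    failures = 0
    for ok in reversed(results):
        if ok:
            break
        failures += 1

    if failures >= thresholds.unhealthy_after:
        return "unhealthy"
    if failures >= thresholds.degraded_after:
        return "degraded"
    return "healthy"
```

Under this reading, raising `degraded_after` makes a system more tolerant of a single flaky run, while lowering `unhealthy_after` makes it flip to unhealthy sooner.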

Click Save in the top-right corner. The editor validates the config, persists it, and schedules the first run. You return to the Health Checks list with a toast confirming success.

The first execution kicks off within a few seconds:

  1. Open the Catalog and click into the Payments API system.
  2. The system detail page shows the new health check with status running briefly, then healthy or unhealthy once the first run completes.
  3. Click the check name to see the run history, a latency chart, and per-collector charts.

If the result is unhealthy, the detail panel surfaces the error message returned by the collector (HTTP status, timeout, connection refused, and so on).

From here you can:

  • Add more collectors to the same check to assert response body content, certificate expiry, or custom headers.
  • Add more checks to the system - for example a Postgres connectivity check from the same plugin family.
  • Run the check from a remote vantage point by attaching a satellite. See Connect a satellite.
  • Suppress notifications for known disruptions. See Silence alerts.

Failing health checks do NOT auto-open incidents. They flip the system status, burn SLO error budget, and notify subscribers, but incidents are opened and updated by hand. See Open and resolve an incident.
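
For a rough sense of what burning error budget means, the arithmetic can be sketched as follows. This is the generic time-based definition of an error budget, not necessarily how Checkstack computes it; the function names and the idea that each failed run counts as one full interval of downtime are assumptions.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Allowed minutes of unavailability for a target over a window."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_burned_fraction(failed_runs: int, interval_seconds: int,
                           slo_target: float, window_days: int) -> float:
    """Fraction of the budget used, assuming each failed run counts
    as one full check interval of downtime (an assumption)."""
    bad_minutes = failed_runs * interval_seconds / 60
    return bad_minutes / error_budget_minutes(slo_target, window_days)
```

With a 99.9% target over 30 days the budget is about 43.2 minutes, so five failed runs at a 60-second interval would use roughly 12% of it under these assumptions.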