Automation primitives reference
This page is the canonical per-primitive reference for an automation definition. Every action, trigger, and condition is listed with its shape and a runnable YAML snippet. The visual editor and the YAML view round-trip losslessly, so anything here is also editable in the builder. For the runtime semantics behind these shapes, see the automation platform overview and the sensing layer.
Definition shape
An automation definition is triggers + optional conditions + ordered actions, plus run-control fields. Actions are discriminated by which key is present (action, choose, parallel, delay, repeat, variables, condition, stop, wait_for_trigger, wait_until, sequence).
```yaml
name: "Page on sustained latency"
mode: single # single | parallel | queued | restart
concurrency_scope: context # automation | context
max_runs: 10 # queue depth for mode: queued
triggers:
  - event: automation.numeric_state
    config: { field: p95LatencyMs, above: 500 }
    for: { minutes: 10 }
conditions:
  - time:
      after: "08:00"
      before: "20:00"
actions:
  - action: notification.send
    config: { title: "p95 latency high", body: "{{ trigger.payload.systemId }}" }
```

Every action also accepts these shared fields:
- id: a stable identifier, used to reference the action’s artifacts (artifacts.<id>.<name>) and to identify it in run logs. Auto-filled if left blank.
- description: an optional operator note.
- enabled: set to false to skip the action without deleting it. Defaults to true.
- continue_on_error: when true, a failure in this action does not halt the run. Defaults to false.
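The discriminator rule and the shared-field defaults can be sketched in Python as follows. This is a minimal sketch under stated assumptions. `classify_action` is a hypothetical helper, not part of the platform. It assumes a definition is invalid when zero or several discriminator keys are present.

```python
from typing import Any

# Discriminator keys listed on this page; one per action mapping (assumed).
ACTION_KINDS = (
    "action", "choose", "parallel", "delay", "repeat", "variables",
    "condition", "stop", "wait_for_trigger", "wait_until", "sequence",
)
# Shared fields with the defaults documented above.
SHARED_DEFAULTS = {"description": None, "enabled": True, "continue_on_error": False}


def classify_action(step: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: return (kind, shared_fields) for one action mapping."""
    kinds = [k for k in ACTION_KINDS if k in step]
    if len(kinds) != 1:
        raise ValueError(f"expected exactly one action key, found {kinds}")
    shared = {**SHARED_DEFAULTS, **{k: step[k] for k in SHARED_DEFAULTS if k in step}}
    shared["id"] = step.get("id")  # None here; the engine auto-fills a blank id
    return kinds[0], shared
```

Sibling keys such as `config` or a choose's `else` are not discriminators, so this sketch ignores them.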
Control-flow actions
action
Calls a registered action by its namespaced id (plugin.action_name) and renders its templated config.
```yaml
actions:
  - action: incident.create
    config:
      title: "{{ trigger.payload.systemName }} is down"
      severity: critical
      systemIds: ["{{ trigger.payload.systemId }}"]
```

choose
if / elif / else branching. Each clause has a when condition and a sequence that runs when it is the first matching clause. The optional else runs when no clause matches.
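Here is a Python sketch of first-match branch selection. `pick_branch` and the `evaluate` callback are hypothetical stand-ins for the engine's expression evaluator. The sketch also assumes evaluation stops at the first true clause.

```python
from typing import Any, Callable, Optional


def pick_branch(
    clauses: list[dict[str, Any]],
    otherwise: Optional[list[Any]],
    evaluate: Callable[[str], bool],
) -> Optional[list[Any]]:
    """Return the sequence of the first clause whose `when` is true, else `else`."""
    for clause in clauses:
        if evaluate(clause["when"]):
            return clause["sequence"]  # this sketch stops at the first match
    return otherwise  # None when no else is given: nothing runs
```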
```yaml
actions:
  - choose:
      - when: "trigger.payload.severity == 'critical'"
        sequence:
          - action: notification.send
            config: { title: "Critical", body: "Paging on-call" }
      - when: "trigger.payload.severity == 'warning'"
        sequence:
          - action: notification.send
            config: { title: "Warning", body: "FYI" }
    else:
      - action: notification.send
        config: { title: "Info", body: "Logged only" }
```

parallel
Fans out actions concurrently and waits for all of them. Each branch is itself an action - wrap multi-step branches in a sequence.
```yaml
actions:
  - parallel:
      - action: notification.send
        config: { title: "Notify ops" }
      - sequence:
          - action: incident.create
            config: { title: "Investigate", severity: warning }
          - action: notification.send
            config: { title: "Ticket opened" }
```

sequence
Wraps an ordered list of actions as a single action. Useful as a multi-action branch inside parallel / choose, or to apply one id / continue_on_error to a group atomically.
```yaml
actions:
  - id: triage
    continue_on_error: true
    sequence:
      - action: incident.create
        config: { title: "Triage", severity: warning }
      - action: notification.send
        config: { title: "Triage started" }
```

delay
Sleeps for a fixed or templated number of seconds (max 86400). The run suspends durably and resumes when the delay elapses.
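The Python sketch below resolves a delay to an absolute wake-up time. `resume_at` and `render` are hypothetical, and `render` stands in for template rendering. Two behaviors are assumptions: out-of-range values are rejected rather than clamped, and the suspension is stored as a deadline rather than a countdown.

```python
import math
from datetime import datetime, timedelta
from typing import Any, Callable

MAX_DELAY_SECONDS = 86_400  # documented ceiling for delay


def resume_at(spec: dict[str, Any], render: Callable[[str], str], now: datetime) -> datetime:
    """Hypothetical helper: absolute time a run suspended on `delay` should wake."""
    if "seconds" in spec:
        seconds = float(spec["seconds"])
    else:
        seconds = float(render(spec["template"]))  # the template renders to a number
    if not math.isfinite(seconds) or not 0 <= seconds <= MAX_DELAY_SECONDS:
        raise ValueError(f"delay must be within 0..{MAX_DELAY_SECONDS}s, got {seconds}")
    # Persisting an absolute wake-up time, not a countdown, is one way a
    # suspended run can survive a restart and still resume on schedule.
    return now + timedelta(seconds=seconds)
```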
```yaml
actions:
  - delay: { seconds: 300 } # five minutes
  - delay: { template: "{{ trigger.payload.cooldownSeconds }}" }
```

repeat
Loops a sequence in one of four modes. repeat.index is exposed in every mode; for_each also exposes repeat.item.
- count: run the sequence a fixed number of times.
- for_each: a template rendering to a JSON array; run once per item.
- while: evaluate a condition before each iteration; stop when false.
- until: evaluate a condition after each iteration; stop when true.
while / until accept an optional max_iterations (defaults to 1000) as a safety net.
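The four modes can be sketched as a Python generator that yields the `repeat` scope per iteration. The caller runs the sequence between yields. `repeat_iterations` and `check` are hypothetical. The following behaviors are assumptions, not stated on this page:

- The index is 0-based.
- for_each has already been rendered to a list.
- Hitting max_iterations simply ends the loop.

```python
from typing import Any, Callable, Iterator

DEFAULT_MAX_ITERATIONS = 1000  # documented safety net for while / until


def repeat_iterations(spec: dict[str, Any], check: Callable[[str], bool]) -> Iterator[dict[str, Any]]:
    """Hypothetical helper: yield {index[, item]} once per iteration."""
    if "count" in spec:
        for i in range(int(spec["count"])):
            yield {"index": i}
    elif "for_each" in spec:
        for i, item in enumerate(spec["for_each"]):  # already-rendered array
            yield {"index": i, "item": item}
    else:
        for i in range(spec.get("max_iterations", DEFAULT_MAX_ITERATIONS)):
            if "while" in spec and not check(spec["while"]):
                return  # while: tested before each iteration
            yield {"index": i}  # the caller runs the sequence here
            if "until" in spec and check(spec["until"]):
                return  # until: tested after each iteration
```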
```yaml
actions:
  - repeat:
      count: 3
      sequence:
        - action: notification.send
          config: { title: "Reminder {{ repeat.index }}" }
  - repeat:
      for_each: "{{ trigger.payload.affectedSystems }}"
      sequence:
        - action: incident.create
          config: { title: "{{ repeat.item }} affected", severity: warning }
  - repeat:
      while: "health.system.status != 'healthy'"
      max_iterations: 20
      sequence:
        - wait_until:
            condition: "health.system.status == 'healthy'"
```

variables
Defines locally scoped values for downstream actions. Values can be literals or templates; templates render at execution time and the rendered value is stored under the variable name.
```yaml
actions:
  - variables:
      threshold: 500
      summary: "{{ trigger.payload.systemName }} at {{ trigger.payload.p95 }}ms"
  - action: notification.send
    config: { title: "{{ var.summary }}" }
```

condition
A mid-run guard. If the condition is false, the run halts (unless continue_on_error: true). Accepts any condition shape - a template string or a structured variant.
```yaml
actions:
  - condition: "health.system.status == 'unhealthy'"
  - action: incident.create
    config: { title: "Confirmed unhealthy", severity: critical }
```

stop
Explicitly halts the run, with an optional reason and an error flag. error: true marks the run as failed.
```yaml
actions:
  - choose:
      - when: "health.system.in_maintenance"
        sequence:
          - stop: { reason: "System in maintenance window" }
  - action: incident.create
    config: { title: "Real outage", severity: critical }
```

wait_for_trigger
Suspends the run until a matching event arrives, with an optional timeout (max 30 days). context_key defaults to the triggering event’s context key, so a wait inside an incident.created run matches the incident.resolved event for the same incident.
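The matching rule can be sketched as a Python predicate. `Event`, `matches_wait`, and the `evaluate` callback are hypothetical. The sketch assumes a waiting run wakes only when three things hold: the event name matches, the context key matches (defaulting to the run's own key), and the optional filter passes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Event:
    name: str
    context_key: str
    payload: dict[str, Any]


def matches_wait(
    wait: dict[str, Any],
    run_context_key: str,
    event: Event,
    evaluate: Callable[[str, Event], bool],
) -> bool:
    """Hypothetical helper: would `event` wake a run suspended on this wait?"""
    if event.name != wait["event"]:
        return False
    # context_key defaults to the key of the event that started the run.
    if event.context_key != wait.get("context_key", run_context_key):
        return False
    flt: Optional[str] = wait.get("filter")
    return True if flt is None else evaluate(flt, event)
```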
```yaml
actions:
  - action: incident.create
    id: incident # referenced below as artifacts.incident.id
    config: { title: "Outage", severity: critical }
  - wait_for_trigger:
      event: incident.resolved
      timeout_seconds: 86400
      filter: "trigger.payload.id == artifacts.incident.id"
  - action: notification.send
    config: { title: "Resolved within SLA" }
```

wait_until
Suspends the run until a condition becomes true, woken reactively by a relevant entity change (no polling). The condition counterpart to wait_for_trigger. If the condition is already true when reached, the run continues without suspending.
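The per-wake decision can be sketched in Python. It runs when the step is first reached and again on each relevant entity change or on timeout. `Outcome` and `on_wait_until` are hypothetical names. The sketch assumes the timeout and continue_on_timeout rules are the only exits besides the condition turning true.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    CONTINUE = "continue"  # proceed to the next action
    SUSPEND = "suspend"    # park the run until a relevant entity change
    FAIL = "fail"          # end the run as failed


def on_wait_until(check: Callable[[], bool], timed_out: bool, continue_on_timeout: bool = True) -> Outcome:
    """Hypothetical helper: what a wait_until step does when reached or re-woken."""
    if check():
        return Outcome.CONTINUE  # already true: no suspension at all
    if timed_out:
        return Outcome.CONTINUE if continue_on_timeout else Outcome.FAIL
    return Outcome.SUSPEND
```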
```yaml
actions:
  - action: incident.create
    id: incident # referenced below as artifacts.incident.id
    config: { title: "{{ trigger.payload.systemName }} down", severity: critical }
  - wait_until:
      condition: "health.system.status == 'healthy'"
      timeout_seconds: 3600 # wait up to 1h
      continue_on_timeout: true # default; false = fail the run on timeout
  - action: incident.resolve
    config: { incidentId: "{{ artifacts.incident.id }}" }
```

Triggers
A trigger is the entry point. Every trigger has an event; built-in triggers also take config. Optional per-trigger fields: an id (a discriminator for trigger.id in choose clauses), a gating filter expression (a bare expression, no {{ }}), a for: dwell, and a window: rate gate.
Event trigger with filter
```yaml
triggers:
  - event: healthcheck.system_degraded
    id: payments_degraded
    filter: "trigger.payload.systemId == 'payments-api'"
```

for: dwell
Fire only if the matched state still holds after a duration. Accepts a single-unit duration ({ seconds }, { minutes }, { hours }) or { template } rendering to seconds. Restart-safe and idempotent - a re-fire while armed preserves the original deadline.
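Below is a Python sketch of duration normalization and the "re-fire keeps the original deadline" rule. `dwell_seconds` and `DwellTimer` are hypothetical. An in-memory dict stands in for the engine's durable, restart-safe store. The sketch also assumes deadlines are armed per context key.

```python
from typing import Any, Callable

UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


def dwell_seconds(spec: dict[str, Any], render: Callable[[str], str]) -> float:
    """Normalise a `for:` value to seconds; exactly one key is allowed."""
    if len(spec) != 1:
        raise ValueError("for: takes a single unit, e.g. { minutes: 30 }")
    (unit, value), = spec.items()
    if unit == "template":
        return float(render(value))  # template renders to seconds
    return float(value) * UNIT_SECONDS[unit]


class DwellTimer:
    """Hypothetical in-memory stand-in for a durable per-context deadline store."""

    def __init__(self) -> None:
        self.deadlines: dict[str, float] = {}

    def arm(self, context_key: str, now: float, seconds: float) -> float:
        # setdefault: a re-fire while armed keeps the original deadline.
        return self.deadlines.setdefault(context_key, now + seconds)

    def disarm(self, context_key: str) -> None:
        self.deadlines.pop(context_key, None)  # the matched state stopped holding

    def due(self, now: float) -> list[str]:
        return [k for k, t in self.deadlines.items() if t <= now]
```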
```yaml
triggers:
  - event: healthcheck.system_degraded
    for: { minutes: 30 }
```

window: rate gate
Fire only after this trigger has fired (post-filter) at least count times within the trailing minutes, scoped per context key (e.g. per system). The engine records each qualifying occurrence in a durable append log and counts rows within the sliding window:
- refire: every (default) fires on every occurrence at/over the threshold - it re-fires while the window stays over threshold, so debounce in the automation (mode: single + for:) if you want “page once”.
- refire: once fires only on the crossing edge (when the count first reaches count) and re-arms naturally as old occurrences age out of the window and the count re-crosses.
The window gate runs AFTER filter (so only qualifying occurrences count) and BEFORE for: (so the two can compose). The count is read from shared Postgres, so it is identical on every pod; the single occurrence INSERT happens on the one pod that claims the emission from the work queue.
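Here is an in-memory Python sketch of the sliding-window count per partition. `WindowGate` is hypothetical, and one deque per partition stands in for the durable Postgres append log. Filtering is assumed to have happened before `record` is called, and for: would run after it returns.

Each occurrence raises the in-window count by exactly one. So on our reading, the refire: once crossing edge is the occurrence that brings the count to exactly count. This is an interpretation of the text above, not the engine's code. Occurrences exactly `minutes` old are treated as outside the window, which is also an assumption.

```python
from collections import defaultdict, deque


class WindowGate:
    """Hypothetical sketch of window: { count, minutes, refire } per partition key."""

    def __init__(self, count: int, minutes: float, refire: str = "every") -> None:
        self.count = count
        self.span = minutes * 60.0
        self.refire = refire
        self.log: dict[str, deque] = defaultdict(deque)  # stand-in for the append log

    def record(self, key: str, now: float) -> bool:
        """Record one post-filter occurrence; return True if the trigger fires."""
        events = self.log[key]
        events.append(now)
        while events and events[0] <= now - self.span:
            events.popleft()  # age out occurrences older than the trailing window
        if self.refire == "once":
            return len(events) == self.count  # only the crossing edge fires
        return len(events) >= self.count  # "every": each occurrence at/over threshold
```

Because "once" depends only on the current in-window count, it re-arms by itself once old occurrences age out. No separate reset step is needed.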
```yaml
triggers:
  - event: healthcheck.check_failed
    window: { count: 5, minutes: 10 } # 5 check failures in 10 min, per system
actions:
  - action: incident.create
    config: { title: "{{ trigger.payload.systemId }} failing repeatedly", severity: warning }
```

A single automation covers ALL systems: the count is bucketed per partition key and fires per key, so one automation pages independently for every flapping system without enumerating them. By default the partition is the trigger’s built-in context key (systemId for health triggers, incidentId for incident triggers, and so on).
partitionBy
partitionBy overrides the dimension the count is bucketed by. It is a bare expression (same flavour as filter - no {{ }}) evaluated against the trigger scope; the result is coerced to a string and used as the partition key.
```yaml
triggers:
  - event: healthcheck.check_failed
    window:
      count: 5
      minutes: 10
      partitionBy: trigger.payload.severity # per-severity rate, across all systems
```

- Omitted (default): the trigger’s built-in context key (e.g. systemId) - one window per system. Existing automations are unaffected.
- An explicit equivalent of the default is partitionBy: trigger.payload.systemId.
- A composite key is just an expression: partitionBy: trigger.payload.systemId + ":" + trigger.payload.checkId (per system-and-check).
- Fallback: if the expression evaluates to null / undefined / empty, OR it fails to evaluate, the gate falls back to the built-in context key rather than counting globally (so a typo never collapses every partition into one bucket). Evaluation errors are logged.
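The partition-key resolution and its fallback rules can be sketched in Python. `partition_key` and `evaluate` are hypothetical, and Python's `None` plays the role of null / undefined. Python's `str()` coercion differs from a JavaScript-style string coercion for values such as booleans. So treat the string form as illustrative only.

```python
import logging
from typing import Any, Callable, Optional

log = logging.getLogger("window.partition")


def partition_key(expr: Optional[str], default_key: str, evaluate: Callable[[str], Any]) -> str:
    """Hypothetical helper: the bucket a window occurrence is counted under."""
    if expr is None:
        return default_key  # partitionBy omitted: built-in context key
    try:
        value = evaluate(expr)
    except Exception:  # any evaluation error falls back; never bucket globally
        log.exception("partitionBy %r failed; using context key", expr)
        return default_key
    if value is None or str(value) == "":
        return default_key  # null / undefined / empty result
    return str(value)  # coerced to a string
```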
numeric_state trigger
Evaluated on each completed health check; fires when a numeric field crosses an above / below threshold. Pair with for: for “above X for Y minutes”. field supports latencyMs, p95LatencyMs, and dotted collector paths like collectors.http.responseTimeMs.
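The following Python sketch resolves a dotted field path against a completed check result. `resolve_field` is hypothetical, and so is the nested-dict shape of the check result. Threshold crossing itself is not shown.

```python
from typing import Any, Optional


def resolve_field(check: dict[str, Any], field: str) -> Optional[float]:
    """Hypothetical helper: walk a dotted path such as 'collectors.http.responseTimeMs'."""
    node: Any = check
    for part in field.split("."):
        if not isinstance(node, dict) or part not in node:
            return None  # missing path: nothing to compare
        node = node[part]
    if isinstance(node, bool) or not isinstance(node, (int, float)):
        return None  # only plain numbers are comparable
    return float(node)
```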
```yaml
triggers:
  - event: automation.numeric_state
    config:
      field: p95LatencyMs
      above: 500
    for: { minutes: 10 }
```

Flapping detection (windowed transition count)
Flapping is just the window: rate gate over the raw healthcheck.system_health_changed change event, filtered to unhealthy transitions. There is no dedicated flapping trigger or hook - healthcheck emits only the raw per-system aggregated-health change, and the engine does the counting. Use refire: once so a flapping system pages on the crossing edge rather than on every subsequent transition.
```yaml
triggers:
  - event: healthcheck.system_health_changed
    id: flapping
    filter: 'trigger.payload.newStatus != "healthy"' # count unhealthy transitions
    window: { count: 3, minutes: 60, refire: once } # 3 in 60 min, per system
actions:
  - action: incident.create
    config: { title: "{{ trigger.payload.systemId }} is flapping", severity: warning }
```

Conditions
Conditions are pre-run gates (top-level conditions) or mid-run guards (the condition action). Beyond raw template strings and the and / or / not combinators, three structured variants are available. The raw template string stays the escape hatch for anything they do not cover.
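The combinators can be sketched as a recursive Python evaluator. `holds`, `passes`, and `evaluate` are hypothetical. The sketch makes several assumptions:

- The top-level conditions list behaves as an implicit and.
- not takes a single condition.
- The structured variants described below are left out.

```python
from typing import Any, Callable


def holds(cond: Any, evaluate: Callable[[str], bool]) -> bool:
    """Hypothetical helper: evaluate a condition tree; raw strings go to `evaluate`."""
    if isinstance(cond, str):
        return evaluate(cond)
    if "and" in cond:
        return all(holds(c, evaluate) for c in cond["and"])
    if "or" in cond:
        return any(holds(c, evaluate) for c in cond["or"])
    if "not" in cond:
        return not holds(cond["not"], evaluate)
    raise ValueError(f"structured variant not handled in this sketch: {list(cond)}")


def passes(conditions: list[Any], evaluate: Callable[[str], bool]) -> bool:
    """Top-level gate, assumed here to require every listed condition."""
    return all(holds(c, evaluate) for c in conditions)
```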
Combinators and expressions
```yaml
conditions:
  - and:
      - "health.system.status == 'unhealthy'"
      - or:
          - "trigger.payload.severity == 'critical'"
          - not: "health.system.in_maintenance"
```

numeric_state condition
Compare a numeric value (a literal, or a template/path resolved against scope) to above / below bounds. With both, the value must fall in the open band between them.
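The open-band comparison can be sketched in Python. `in_band` is hypothetical. It applies strict inequalities to a lone bound as well, which this page does not state explicitly.

```python
from typing import Optional


def in_band(value: float, above: Optional[float] = None, below: Optional[float] = None) -> bool:
    """Hypothetical helper: strictly above `above` and/or strictly below `below`."""
    if above is None and below is None:
        raise ValueError("numeric_state needs above, below, or both")
    if above is not None and not value > above:
        return False
    if below is not None and not value < below:
        return False
    return True
```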
```yaml
conditions:
  - numeric_state:
      value: "health.system.p95_latency_ms"
      above: 500
```

time condition
On-call / quiet-hours gating. after / before are HH:mm (24h) local to timezone (IANA, defaults to UTC); weekday is a list of 0-6 (Sunday = 0). An after greater than before is an overnight window wrapping midnight.
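Here is a Python sketch of the time condition, including the overnight wrap. `in_time_window` is hypothetical, and its `tz_name` parameter maps to the condition's `timezone` field. The sketch assumes several things:

- after is inclusive and before is exclusive.
- weekday is checked against the current local day, even inside an overnight window that began the previous evening.
- `zoneinfo` (Python 3.9+) has IANA data available for non-UTC zones.

```python
from datetime import datetime, time, timezone
from typing import Optional, Sequence
from zoneinfo import ZoneInfo


def in_time_window(
    now: datetime,
    after: Optional[str] = None,
    before: Optional[str] = None,
    weekday: Optional[Sequence[int]] = None,
    tz_name: str = "UTC",  # the condition's `timezone` field
) -> bool:
    """Hypothetical helper: the `time` condition for a timezone-aware `now`."""
    tz = timezone.utc if tz_name == "UTC" else ZoneInfo(tz_name)
    local = now.astimezone(tz)
    # Python counts Monday as 0; the condition counts Sunday as 0.
    if weekday is not None and (local.weekday() + 1) % 7 not in weekday:
        return False
    t = local.time()
    start = time.fromisoformat(after) if after else None
    end = time.fromisoformat(before) if before else None
    if start is not None and end is not None and start > end:
        return t >= start or t < end  # overnight window wrapping midnight
    if start is not None and t < start:
        return False
    if end is not None and t >= end:
        return False
    return True
```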
```yaml
conditions:
  - time:
      after: "22:00"
      before: "06:00"
      weekday: [1, 2, 3, 4, 5]
      timezone: "Europe/Berlin"
```

state condition
True when entity (a catalog system id) is in status, optionally held for at least for. Reads the pre-resolved health.systems[entity].in_status_for_ms - no new timer.
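The state condition's check can be sketched in Python as a plain lookup. Only the in_status_for_ms field comes from this page. `state_holds` is hypothetical, and so is a `status` field stored next to in_status_for_ms. The `for` duration is assumed to be converted to milliseconds beforehand.

```python
from typing import Any, Optional


def state_holds(health: dict[str, Any], entity: str, status: str, for_ms: Optional[int] = None) -> bool:
    """Hypothetical helper: read pre-resolved health data; no timer is started."""
    system = health.get("systems", {}).get(entity)
    if system is None or system.get("status") != status:
        return False
    if for_ms is None:
        return True
    return system.get("in_status_for_ms", 0) >= for_ms
```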
```yaml
conditions:
  - state:
      entity: "payments-api"
      status: unhealthy
      for: { minutes: 30 }
```