Observability: Correlation IDs and Structured Logging

Every request that flows through a Checkstack oRPC router is tagged with a stable correlation ID and a small set of contextual metadata (plugin ID, user ID where applicable) so that the log lines produced by a single request can be reconstructed end-to-end. This is wired up once, in @checkstack/backend-api, and applies uniformly to every plugin router that uses the standard middleware chain.

The HTTP header x-correlation-id is the single source of truth.

  • If the inbound request carries x-correlation-id, the platform uses that value verbatim. Callers (the React frontend, external scripts, peer services) own the trace and can hand the same ID into their own client-side logs to correlate the round trip.
  • If the header is absent or empty, the platform generates a fresh UUID v4 via crypto.randomUUID(). Handlers MUST NOT mint their own IDs — the middleware is the only generation site.
  • The chosen ID is echoed back on the response under the same x-correlation-id header, so the caller can log it after the fact even if they did not supply one upstream.
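The three rules above can be sketched as a pair of small functions. This is only an illustration of the documented behaviour, not the platform's middleware: `resolveCorrelationId` and `echoCorrelationId` are hypothetical names, and the header constant is redeclared locally so the snippet stands alone.

```typescript
import { randomUUID } from "node:crypto";

// Redeclared here for a self-contained sketch; real code would import
// CORRELATION_ID_HEADER from @checkstack/backend-api.
const CORRELATION_ID_HEADER = "x-correlation-id";

// Hypothetical helper: use the inbound value verbatim, or generate a
// UUID v4 when the header is absent or empty.
function resolveCorrelationId(requestHeaders: Headers): string {
  const inbound = requestHeaders.get(CORRELATION_ID_HEADER);
  return inbound !== null && inbound !== "" ? inbound : randomUUID();
}

// Hypothetical helper: echo the chosen ID back under the same header name.
function echoCorrelationId(responseHeaders: Headers, id: string): void {
  responseHeaders.set(CORRELATION_ID_HEADER, id);
}
```

Keeping generation in one function mirrors the rule that handlers never mint their own IDs.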

The header name is exported as a constant from @checkstack/backend-api so dev tools, integration tests, and fetch wrappers do not hard-code the string:

import { CORRELATION_ID_HEADER } from "@checkstack/backend-api";

// frontend example
const res = await fetch("/api/...", {
  headers: { [CORRELATION_ID_HEADER]: crypto.randomUUID() },
});

correlationMiddleware is exported from @checkstack/backend-api and must be applied to every plugin router BEFORE autoAuthMiddleware. The order matters: correlation runs first so that auth failures still log with the correlation ID attached.

import {
  autoAuthMiddleware,
  correlationMiddleware,
  type RpcContext,
} from "@checkstack/backend-api";
import { implement } from "@orpc/server";

const os = implement(myContract)
  .$context<RpcContext>()
  .use(correlationMiddleware)
  .use(autoAuthMiddleware);
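To see why the order matters, consider a toy model of a middleware chain. It is deliberately not oRPC's middleware API: `ToyContext`, `ToyMiddleware`, `toyCorrelation`, `toyAuth`, and `runChain` are all hypothetical names used only to show what an auth failure can log in each ordering.

```typescript
// Toy model only; the real chain is built with oRPC's .use().
type ToyContext = { correlationId?: string; authenticated: boolean };
// A toy middleware returns false to stop the chain.
type ToyMiddleware = (ctx: ToyContext, log: string[]) => boolean;

const toyCorrelation: ToyMiddleware = (ctx) => {
  // Stand-in for "use inbound header or generate a UUID".
  ctx.correlationId = ctx.correlationId ?? "generated-id";
  return true;
};

const toyAuth: ToyMiddleware = (ctx, log) => {
  if (!ctx.authenticated) {
    log.push(`auth failed correlationId=${ctx.correlationId ?? "<missing>"}`);
    return false;
  }
  return true;
};

function runChain(chain: ToyMiddleware[], ctx: ToyContext): string[] {
  const log: string[] = [];
  for (const middleware of chain) {
    if (!middleware(ctx, log)) break;
  }
  return log;
}

// Correlation first: the failure line carries the ID.
const correctOrder = runChain([toyCorrelation, toyAuth], { authenticated: false });
// Auth first: the failure is logged before any ID exists.
const wrongOrder = runChain([toyAuth, toyCorrelation], { authenticated: false });
```

In the toy model, `correctOrder` holds a line tagged with the ID, while `wrongOrder` holds a line that cannot be tied back to the request.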

The scaffolding template at core/scripts/src/templates/backend/ ships this chain by default, so any plugin generated via bun run create already wires correlation in.

When a request enters a handler, ctx.logger is a child logger with the following fields pre-bound to every log entry it produces:

| Field | Source | Always present? |
| --- | --- | --- |
| correlationId | Inbound header or UUID v4 generated by the middleware | Yes |
| pluginId | ctx.pluginMetadata.pluginId | Yes |
| userId | ctx.user.id (real users + applications) | Only when ctx.user has an id |

Service users (type: "service") do not have an id and so do not contribute a userId — those calls are still logged with correlationId + pluginId.
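The presence rules can be sketched as a function that builds the bound fields from a context. The types below are assumptions made for illustration (the real `RpcContext` may be shaped differently, and the `"user"` and `"application"` type tags as well as a `correlationId` property on the context are guesses); only the three field names and the service-user rule come from the description above.

```typescript
// Hypothetical shapes for illustration; not the real RpcContext.
type SketchUser =
  | { type: "user" | "application"; id: string }
  | { type: "service" };

interface SketchContext {
  correlationId: string;
  pluginMetadata: { pluginId: string };
  user?: SketchUser;
}

// Builds the metadata a request-scoped child logger would carry.
function boundFields(ctx: SketchContext): Record<string, string> {
  const fields: Record<string, string> = {
    correlationId: ctx.correlationId,
    pluginId: ctx.pluginMetadata.pluginId,
  };
  // userId is only contributed when the user actually has an id.
  if (ctx.user && "id" in ctx.user) {
    fields.userId = ctx.user.id;
  }
  return fields;
}
```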

Handlers can derive a tighter-scoped logger via .child({ ... }) for sub-operations (jobs, batched work, retries). The child inherits the correlation metadata automatically:

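async function importBatch({ context, input }) {
  // `input.items` is illustrative; use whatever payload your handler receives.
  const { items } = input;
  // Fall back to the base logger when `child` is not implemented (e.g. test mocks).
  const log = context.logger.child?.({ batchId: "abc-123" }) ?? context.logger;
  log.info("starting import", { itemCount: items.length });
  // every line through `log` carries correlationId, pluginId, userId (when present), AND batchId
}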

Logger.child is optional on the interface so minimal test mocks do not have to implement it; production Winston loggers always do. Handlers that depend on it should branch on presence and fall back to the base logger when the method is absent.
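A minimal sketch of that fallback, assuming a `Logger` shape with an optional `child` method as described. The interface here is a reduced stand-in, not the one exported by the platform, and `scoped` and `createMockLogger` are hypothetical helpers.

```typescript
// Reduced stand-in for the Logger interface described above.
interface Logger {
  info(message: string, ...args: unknown[]): void;
  error(message: string, ...args: unknown[]): void;
  child?(meta: Record<string, unknown>): Logger;
}

// Hypothetical helper: scoped child when available, base logger otherwise.
function scoped(logger: Logger, meta: Record<string, unknown>): Logger {
  return logger.child?.(meta) ?? logger;
}

// Minimal test mock that omits `child` on purpose and records messages.
function createMockLogger(): Logger & { lines: string[] } {
  const lines: string[] = [];
  return {
    lines,
    info: (message) => {
      lines.push(`info: ${message}`);
    },
    error: (message) => {
      lines.push(`error: ${message}`);
    },
  };
}
```

With the mock, `scoped(mock, { batchId: "abc-123" })` returns the mock itself, so handler code under test keeps logging without a `child` implementation.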

The Logger methods accept a trailing argument list (...args: unknown[]) so the long-standing varargs callsites — logger.error("…", err) where err is an Error, or logger.info("…", value1, value2) — keep working unchanged. Winston’s splat handling treats:

  • a single trailing Error instance as a special-cased error payload (with stack and message), and
  • a single trailing plain object as structured metadata that gets merged into the log entry.

For NEW code, prefer the structured-metadata shape:

logger.info("imported items", { count, durationMs, source });

Both shapes flow through the same vararg slot in the interface, so no overload churn is needed; the choice is purely stylistic and operational (structured metadata is far easier to grep in a log aggregator).
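The two special cases can be pictured with a toy recorder that accepts the same `(message, ...args)` shape. This imitates the behaviour described above for illustration only; it is not Winston's implementation, and `record`, `Entry`, and `isPlainObject` are hypothetical names.

```typescript
type Entry = {
  message: string;
  meta: Record<string, unknown>;
  error?: Error;
};

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return (
    typeof value === "object" &&
    value !== null &&
    Object.getPrototypeOf(value) === Object.prototype
  );
}

// Toy stand-in for a logger method with a varargs tail.
function record(message: string, ...args: unknown[]): Entry {
  const entry: Entry = { message, meta: {} };
  if (args.length === 1) {
    const [only] = args;
    if (only instanceof Error) {
      entry.error = only; // single trailing Error: error payload
    } else if (isPlainObject(only)) {
      entry.meta = { ...only }; // single trailing object: structured metadata
    }
  }
  return entry;
}

// Structured-metadata shape, preferred for new code.
const structured = record("imported items", { count: 3, source: "csv" });
// Legacy shape with a trailing Error.
const failed = record("import failed", new Error("boom"));
```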

Once the platform logger ingests the metadata, every entry produced while handling the request carries { correlationId, pluginId, userId? }, including framework-level lines (auth failures, validation errors, queue dispatch). Searching a log aggregator for correlationId=… reconstructs the request end-to-end across plugins and across consecutive service-to-service (S2S) hops.
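As a rough illustration of that lookup outside an aggregator, the sketch below filters log lines by `correlationId`. It assumes the logs are available as one JSON object per line, which is an assumption about the output format rather than something stated here; `entriesForRequest` is a hypothetical helper.

```typescript
// Assumes newline-delimited JSON log entries (an assumption, see above).
function entriesForRequest(
  jsonLines: string[],
  correlationId: string,
): Record<string, unknown>[] {
  const matches: Record<string, unknown>[] = [];
  for (const line of jsonLines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as Record<string, unknown>).correlationId === correlationId
    ) {
      matches.push(parsed as Record<string, unknown>);
    }
  }
  return matches;
}
```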

For correlation across HTTP boundaries (e.g. the frontend that triggered the request), the response echo lets the caller log the ID it actually got, which is then identical to the ID in the server logs.
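On the caller side, a small fetch wrapper can both supply an ID and read back the echoed one. This is a sketch under stated assumptions: `fetchWithCorrelation` and `FetchLike` are hypothetical names, the header constant is redeclared locally so the snippet stands alone, and a runtime with global `fetch`, `Headers`, and `crypto.randomUUID()` is assumed.

```typescript
// Redeclared for a self-contained sketch; real code would import
// CORRELATION_ID_HEADER from @checkstack/backend-api.
const CORRELATION_ID_HEADER = "x-correlation-id";

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// Hypothetical wrapper: sends a correlation ID (generating one if the caller
// did not set it) and returns the ID echoed on the response.
async function fetchWithCorrelation(
  url: string,
  init: RequestInit = {},
  fetchImpl: FetchLike = fetch,
): Promise<{ response: Response; correlationId: string | null }> {
  const headers = new Headers(init.headers);
  if (!headers.has(CORRELATION_ID_HEADER)) {
    headers.set(CORRELATION_ID_HEADER, globalThis.crypto.randomUUID());
  }
  const response = await fetchImpl(url, { ...init, headers });
  // Log the echoed value: it is the ID the server actually used.
  return {
    response,
    correlationId: response.headers.get(CORRELATION_ID_HEADER),
  };
}
```

The `fetchImpl` parameter exists only so the wrapper can be exercised with a stub in tests; in application code the default global `fetch` would be used.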