Overview
The Agents domain is Worktruck’s in-process agent executor. You register one agent_config per (tenant, blueprint), attach your own Anthropic API key, and POST to /api/v1/agents/{blueprint}/runs to enqueue work. A worker daemon picks the run up via Postgres LISTEN/NOTIFY, drives the LLM ↔ tool loop, and records every step.
See the Agents guide for the narrative overview. This page is the reference.
Data Model
Entities
| Entity | Description |
|---|---|
| Agent Config | Per-(tenant, kind) configuration: system prompt overrides, tool set, budget, deadline, guardrail set, disable state. One row per kind. |
| Agent Run | One row per enqueue. Carries status, claim state, cost rollup, failure category. |
| Agent Step | Content-hashed step in a run — either an LLM turn or a tool call. Token usage and cost per step. |
| Agent Event | Structured observability row. Guardrail decisions, auto-disable flips, waiting transitions. |
Agent kinds
Today, one kind is GA:
| Kind | Purpose |
|---|---|
| contact_deduper | Scans contacts for likely duplicates and proposes merges. |
Integration Keys
Integration keys are service credentials for external APIs that agents call on your behalf: Cloudflare, GitHub, Netdata, Postmark, and Outstand. Unlike BYOK (which is an Anthropic key scoped per agent kind), integration keys are stored once per tenant and shared across all agent kinds that need them.
Supported providers
| Provider | Used by |
|---|---|
| cloudflare | site_health: zone listing, DNS, analytics |
| github | Agent tools that read repos or issues |
| netdata | site_health: infrastructure metrics |
| postmark | Email-related agent tools |
| outstand | Outstand integration tools |
Storing a key
Storing a key returns 201 on first store and 200 when replacing an existing key (the prior key is atomically revoked). The key field never appears in responses; only a key_hint (last four characters) is returned to confirm which credential is active.
Netdata requires metadata.base_url; all other providers take optional metadata.
Key state machine
active and invalid keys are accessible to agents. revoked keys are audit rows — their ciphertext is gone and they cannot be restored. Calling DELETE /integration-keys/{provider} moves the current key to revoked.
Verifying a key
Verifying a key updates its status and last_verified_at. The call always returns 200, with valid: true/false in the body. This is how agents self-heal: a failing verify marks the key invalid, a passing verify marks it active.
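The key lifecycle described in this section can be modeled as a small state machine. The sketch below is illustrative only: the function names are hypothetical, and treating a verify on a revoked key as a no-op is an assumption, not documented behavior.

```python
# Hypothetical model of the integration-key states described above.
AGENT_ACCESSIBLE = {"active", "invalid"}


def apply_verify(status: str, valid: bool) -> str:
    """A passing verify marks the key active; a failing one marks it invalid."""
    if status == "revoked":
        # Assumption: revoked keys are audit rows and never change state again.
        return "revoked"
    return "active" if valid else "invalid"


def apply_delete(status: str) -> str:
    """DELETE moves the current key to revoked, whatever its prior state."""
    return "revoked"


def agent_can_use(status: str) -> bool:
    """Only active and invalid keys are accessible to agents."""
    return status in AGENT_ACCESSIBLE
```

Because invalid keys remain accessible, a later passing verify can return them to active without re-storing the secret.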
Scopes
- agents:read (list, get)
- agents:write (store, revoke, verify)
Apps
Apps are tenant-registered external MCP servers. Integration Keys authorize agents to call known providers (GitHub, Cloudflare, Postmark); Apps let you point an agent at any MCP server you choose (Worktruck’s own MCP endpoint, Context7, Zapier, a bespoke internal tool) without an engineering round-trip. Once an app is registered, its tools become available to any agent blueprint configured to consume it.
Registering an app
Registration returns 201 with the persisted app plus the first probe result. It always succeeds as long as the URL and auth pass validation: if the probe fails (timeout, bad JSON-RPC, non-2xx), the row is still inserted and marked unhealthy, and the probe error is returned in the response so the UI can surface it.
Auth types:
| Type | Shape | Sent as |
|---|---|---|
| bearer | { "type": "bearer", "token": "..." } | Authorization: Bearer <token> |
| header | { "type": "header", "name": "X-API-Key", "value": "..." } | custom header |
| none | { "type": "none" } | no auth header |
The auth_hint field on responses carries a redacted preview (e.g. sk-live-***XY) so the UI can show which key is wired without round-tripping the secret.
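The mapping in the auth table can be expressed as one small function. This is a minimal sketch of that mapping, not Worktruck's code; the function name auth_headers is hypothetical.

```python
def auth_headers(auth: dict) -> dict:
    """Map an app auth object to the HTTP headers sent to the MCP server."""
    kind = auth.get("type")
    if kind == "bearer":
        return {"Authorization": f"Bearer {auth['token']}"}
    if kind == "header":
        # Custom header: both the name and the value come from the auth object.
        return {auth["name"]: auth["value"]}
    if kind == "none":
        return {}
    raise ValueError(f"unsupported auth type: {kind!r}")
```

For example, auth_headers({"type": "header", "name": "X-API-Key", "value": "abc"}) yields a single X-API-Key header.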
URL validation
The mcp_server_url must be https:// and must resolve to a public IP. Requests to loopback, RFC 1918, link-local, or any other reserved range are rejected at registration time. The probe path uses the same pinned rustls client Worktruck uses for outbound webhooks.
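A check of this kind can be approximated with Python's standard library. The sketch below is an assumption about how such validation might look, not Worktruck's implementation (which the text says is Rust-based). The function name is hypothetical, and the resolved addresses are passed in as an argument so the logic can be exercised without DNS.

```python
import ipaddress
from urllib.parse import urlsplit


def is_acceptable_mcp_url(url: str, resolved_ips: list[str]) -> bool:
    """Return True only for https URLs whose resolved addresses are all public."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    if not resolved_ips:
        return False
    # is_global is False for loopback, RFC 1918, link-local and other reserved ranges.
    return all(ipaddress.ip_address(ip).is_global for ip in resolved_ips)
```

Requiring every resolved address to be public, rather than any one of them, avoids accepting a hostname that resolves to a mix of public and internal addresses.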
Probing
Each registered app is probed with a JSON-RPC 2.0 tools/list request against the MCP server. The response shape:
On each probe, the existing enabled_tools set is preserved: new tools the server advertises stay disabled until you explicitly enable them. Tools that used to exist but are no longer advertised are marked stale in discovered_tools and removed from enabled_tools.
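The probe request and the reconciliation rule above can be sketched as follows. The set-based representation and the function names are assumptions for illustration; the actual storage format of discovered_tools is not specified here.

```python
import json


def tools_list_request(request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 tools/list request."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def reconcile(advertised: set[str], known: set[str], enabled: set[str]):
    """Return (discovered, stale, enabled) after a probe.

    New tools are discovered but stay disabled; tools that are no longer
    advertised are marked stale and dropped from the enabled set.
    """
    stale = known - advertised
    discovered = known | advertised
    still_enabled = enabled & advertised
    return discovered, stale, still_enabled
```

Intersecting enabled with advertised is what guarantees that a probe can only shrink the enabled set, never grow it.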
Health state machine:
- active: the last probe succeeded, OR the failure streak is below threshold
- unhealthy: 5 consecutive probe failures (counters reset on the first success)
- disabled: operator set status: "disabled" via PATCH; probes skip disabled apps
Trigger an on-demand probe with POST /api/v1/apps/:slug/probe. The endpoint always returns 200 with the probe outcome in the body, never a non-2xx.
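The steady-state health transitions can be modeled with a consecutive-failure counter. This is a sketch under stated assumptions: the function and constant names are hypothetical, and it covers only repeated probing, not the registration-time case where a failed first probe marks the app unhealthy immediately.

```python
FAILURE_THRESHOLD = 5  # consecutive failures before an app turns unhealthy


def next_health(status: str, failures: int, probe_ok: bool) -> tuple[str, int]:
    """Return (status, consecutive_failures) after one probe result."""
    if status == "disabled":
        return status, failures  # probes skip disabled apps
    if probe_ok:
        return "active", 0  # counters reset on the first success
    failures += 1
    return ("unhealthy" if failures >= FAILURE_THRESHOLD else "active"), failures
```

A single success is enough to return an unhealthy app to active, since the streak resets to zero.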
Endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/apps | Register + probe |
| GET | /api/v1/apps | List all apps for the tenant |
| GET | /api/v1/apps/:slug | Detail |
| PATCH | /api/v1/apps/:slug | Update display/description/enabled_tools/status/metadata |
| POST | /api/v1/apps/:slug/auth/rotate | Replace the auth secret (no probe; call /probe after) |
| POST | /api/v1/apps/:slug/probe | On-demand probe |
| DELETE | /api/v1/apps/:slug | Hard delete |
slug and mcp_server_url are immutable after registration. To rename, delete and re-register.
Scopes
- agents:read (list, get)
- agents:write (register, update, rotate, probe, delete)
Key Concepts
BYOK (Bring Your Own Key)
Every agent run authenticates to Anthropic with a key you provide. Worktruck encrypts it with your tenant’s DEK (AES-256-GCM) and loads it only at the worker boundary. Rotate with PUT /api/v1/agent-configs/{blueprint}/byok-key. Revoke with DELETE. There is no shared fallback key in production: if the tenant key is missing or revoked, runs fail with auth_failed.
Probe a key with POST /api/v1/agent-configs/validate-key before wiring it into a config. The probe is a one-shot live call to Anthropic. The response is either {"status": "valid"} or {"status": "invalid", "category": ..., "message": ...} where category is one of invalid_key, revoked_or_expired, insufficient_permissions, quota_exhausted, provider_unavailable, or rate_limited. The first four are terminal (require operator action); the last two are transient and retry-safe.
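A caller might branch on the validate-key response like this. The category names come from the paragraph above; the function name and the use/retry/fix labels are hypothetical.

```python
TERMINAL = {"invalid_key", "revoked_or_expired", "insufficient_permissions", "quota_exhausted"}
TRANSIENT = {"provider_unavailable", "rate_limited"}


def next_action(response: dict) -> str:
    """Return 'use', 'retry', or 'fix' for a validate-key response body."""
    if response.get("status") == "valid":
        return "use"
    category = response.get("category")
    if category in TRANSIENT:
        return "retry"  # transient and retry-safe
    if category in TERMINAL:
        return "fix"  # requires operator action
    raise ValueError(f"unexpected category: {category!r}")
```

Raising on an unknown category, instead of defaulting to retry, keeps a client from looping on a failure mode it does not understand.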
Run state machine
| State | Who owns the row | Notes |
|---|---|---|
| queued | No worker | Freshly inserted by enqueue_run |
| running | A worker (holds lease) | worker_id + lease_expires_at set; renewed every 10s |
| waiting | No worker | An approval_gate guardrail paused the run. Lease cleared |
| succeeded | No worker (terminal) | Final output available; cost_usd_cents rolled up |
| failed | No worker (terminal) | failure_category tells you why |
| cancelled | No worker (terminal) | Operator or API caller killed it |
State transitions are applied with a conditional UPDATE ... WHERE status = $old to ensure no two processes ever own the same run.
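The conditional-update pattern can be demonstrated with an in-memory SQLite database standing in for Postgres. This is a sketch of the general compare-and-swap technique, not Worktruck's query: the table layout, the id column, and the claim_run helper are assumptions.

```python
import sqlite3


def claim_run(conn: sqlite3.Connection, run_id: int, worker_id: str) -> bool:
    """Move a run from queued to running only if it is still queued."""
    cur = conn.execute(
        "UPDATE agent_runs SET status = 'running', worker_id = ? "
        "WHERE id = ? AND status = 'queued'",
        (worker_id, run_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows means another worker won the race


# Hypothetical minimal schema, just enough to show the race.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_runs (id INTEGER PRIMARY KEY, status TEXT, worker_id TEXT)")
conn.execute("INSERT INTO agent_runs (id, status) VALUES (1, 'queued')")

first = claim_run(conn, 1, "worker-a")   # succeeds: the row was queued
second = claim_run(conn, 1, "worker-b")  # fails: the row is already running
```

The second claim matches zero rows because the status predicate no longer holds, so the losing worker learns it lost without any explicit lock.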
Failure categories
Every terminal failure carries a failure_category:
| Category | Meaning |
|---|---|
| auth_failed | BYOK key was rejected by Anthropic |
| tool_failed | A tool call returned an unrecoverable error |
| guardrail_blocked | An enforced guardrail rule blocked a call |
| config_error | The config is malformed (missing tool, unknown kind, etc.) |
| timeout | The deadline elapsed before the run could finish |
| budget_exhausted | Rolled cost exceeded the configured budget cap |
Guardrails
Every config carries a GuardrailSet, a list of rules enforced before every tool call. Five primitives:
- allowlist/denylist: name-based tool gating
- rate_limit: sliding window per-(run, tool) via Dragonfly
- approval_gate: pauses the run to waiting on match
- io_validation: JSON Schema check against tool input
- quiet_hours: tenant-local time window block
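To illustrate how rules of this kind might be evaluated before a tool call, here is a minimal sketch covering only the name-based primitives and the approval gate. The rule schema (dicts with type and tools keys) and the allow/wait/block outcome labels are hypothetical; rate_limit, io_validation, and quiet_hours are omitted.

```python
def evaluate(rules: list[dict], tool: str) -> str:
    """Return 'block', 'wait', or 'allow' for a tool call.

    Rules are checked in order; the first rule that blocks or gates wins.
    """
    for rule in rules:
        kind = rule["type"]
        if kind == "denylist" and tool in rule["tools"]:
            return "block"
        if kind == "allowlist" and tool not in rule["tools"]:
            return "block"
        if kind == "approval_gate" and tool in rule["tools"]:
            return "wait"  # the run pauses in the waiting state
    return "allow"
```

In this sketch a blocked call would correspond to the guardrail_blocked failure category, and a wait outcome to the waiting run state.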
Durable replay
Every step writes to agent_steps keyed by (run_id, seq) with a UNIQUE (tenant_id, run_id, content_hash) constraint. If a worker crashes mid-run, the next one replays every journaled step deterministically, skipping rather than re-executing. The hash is computed over canonical JSON of the step inputs, so replays are bit-identical.
Large tool payloads spill out-of-line to agent_step_payloads — the main step row stays small, the payload table stores the bulk.
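The replay idea can be sketched with a content hash over canonical JSON and an in-memory journal. SHA-256, the sorted-keys canonical form, and the dict journal are assumptions for illustration; the text does not name the hash function or the canonicalization rules Worktruck uses.

```python
import hashlib
import json


def content_hash(step_input: dict) -> str:
    """Hash canonical JSON: sorted keys, no insignificant whitespace."""
    canonical = json.dumps(step_input, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_step(journal: dict, step_input: dict, execute):
    """Return the journaled result if present; otherwise execute and record it."""
    key = content_hash(step_input)
    if key not in journal:
        journal[key] = execute(step_input)
    return journal[key]
```

Because keys are sorted before hashing, two inputs that differ only in key order map to the same journal entry, so a replay skips the step instead of re-executing it.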
Cost tracking
Per-step cost is estimated from Anthropic’s published rates (input, output, cache-read, cache-write) and the model used. When a run reaches a terminal state, its cost_usd_cents is the sum over all steps. The agent_runs_billing_daily view rolls totals per (tenant_id, agent_blueprint) per day; use it for your own billing dashboards.
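The rollup arithmetic can be sketched as below. The rate table uses placeholder numbers and a made-up model name, not Anthropic's actual prices; the usage keys and function names are likewise assumptions.

```python
# Placeholder rates in USD cents per million tokens; NOT Anthropic's real prices.
RATES = {
    "example-model": {"input": 300, "output": 1500, "cache_read": 30, "cache_write": 375},
}


def step_cost_cents(model: str, usage: dict) -> float:
    """Estimate one step's cost from its token usage."""
    rates = RATES[model]
    return sum(usage.get(kind, 0) * rate for kind, rate in rates.items()) / 1_000_000


def run_cost_cents(steps: list[dict]) -> float:
    """Sum step costs into the run-level rollup."""
    return sum(step_cost_cents(s["model"], s["usage"]) for s in steps)
```

With these placeholder rates, a step using 1,000,000 input tokens and 100,000 output tokens costs 300 + 150 = 450 cents.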
Auto-disable circuit breaker
If a (tenant, agent_blueprint) pair accumulates 10 failed runs in 48 hours, the worker stamps agent_configs.disabled_at = now() and writes a disable_reason. New POST /runs calls return 409 Conflict until an operator nulls both columns. There is no automatic recovery: the circuit stays open until human action re-closes it.
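The trip condition can be sketched as a trailing-window count. The function name is hypothetical, and treating the threshold as "at least 10" with an inclusive 48-hour boundary is an assumption about edge cases the text does not spell out.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=48)
MAX_FAILURES = 10


def should_disable(failed_at: list[datetime], now: datetime) -> bool:
    """True when at least MAX_FAILURES failed runs fall inside the trailing window."""
    recent = [t for t in failed_at if now - t <= WINDOW]
    return len(recent) >= MAX_FAILURES
```

The sketch only decides when to open the circuit; there is deliberately no counterpart that closes it, matching the manual-recovery rule above.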
Required fields
Minimum config (PUT body for /api/v1/agent-configs/{blueprint}; the blueprint comes from the path, not the body):
Multi-tenancy
Every agent table has RLS. Every worker query runs inside a transaction with SET LOCAL app.tenant_id = ... plus a belt-and-suspenders (run_id, tenant_id) check before acting. A bug in the tenancy layer cannot leak rows across tenants, because the database enforces the isolation.
Operational notes
- Runs survive API and worker restarts. queued runs sit in the table until a worker wakes; running runs with expired leases get released to queued by the orphan recovery job
- The worker exposes an internal health endpoint on :4500 for container healthchecks; it is not reachable over the public internet
- Deployment is decoupled from the API: the worker is a separate Docker container (worktruck-agent-worker), and you can scale it horizontally without touching the API tier
- Every run emits OpenTelemetry GenAI semconv spans: agent.run, gen_ai.chat, gen_ai.tool. Ship them to the observability stack of your choice
Limits
- One GA kind (contact_deduper) as of April 2026; more land as they graduate
- Approval resume endpoint is not exposed in Phase 2. waiting runs need operator unblocking; the client-facing resume call lands in Phase 3
- Mid-run budget action hooks are not exposed; the cap fires only at terminal rollup
- Model routing is static per-run; automatic multi-model failover is a design-phase item
