
Overview

The Agents domain is Worktruck’s in-process agent executor. You register one agent_config per (tenant, blueprint), attach your own Anthropic API key, and POST to /api/v1/agents/{blueprint}/runs to enqueue work. A worker daemon picks the run up via Postgres LISTEN/NOTIFY, drives the LLM ↔ tool loop, and records every step. See the Agents guide for the narrative overview. This page is the reference.

Data Model

Entities

  • Agent Config: per-(tenant, kind) configuration (system prompt overrides, tool set, budget, deadline, guardrail set, disable state). One row per kind.
  • Agent Run: one row per enqueue. Carries status, claim state, cost rollup, failure category.
  • Agent Step: content-hashed step in a run, either an LLM turn or a tool call. Token usage and cost per step.
  • Agent Event: structured observability row. Guardrail decisions, auto-disable flips, waiting transitions.

Agent kinds

Today, one kind is GA:
  • contact_deduper: scans contacts for likely duplicates and proposes merges.
New kinds ship as they graduate from the internal catalog. Each kind declares its own default system prompt, default tool set, default budget, and default guardrail set.

Integration Keys

Integration keys are service credentials for external APIs that agents call on your behalf — Cloudflare, GitHub, Netdata, Postmark, and Outstand. Unlike BYOK (which is an Anthropic key scoped per agent kind), integration keys are stored once per tenant and shared across all agent kinds that need them.

Supported providers

  • cloudflare: site_health (zone listing, DNS, analytics)
  • github: agent tools that read repos or issues
  • netdata: site_health (infrastructure metrics)
  • postmark: email-related agent tools
  • outstand: Outstand integration tools

Storing a key

curl -X POST https://api.worktruck.app/api/v1/integration-keys \
  -H "Authorization: Bearer bsk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "cloudflare",
    "key": "cf_live_...",
    "label": "Production",
    "metadata": { "account_id": "abc123" }
  }'
Returns 201 on first store, 200 when replacing an existing key (the prior key is atomically revoked). The key field never appears in responses — only a key_hint (last four characters) to confirm which credential is active. Netdata requires metadata.base_url; all other providers take optional metadata.

Key state machine

active ──► invalid (probe failed)
  │           │
  └───────────┴──► revoked (terminal — ciphertext wiped)
active and invalid keys are accessible to agents. revoked keys are audit rows — their ciphertext is gone and they cannot be restored. Calling DELETE /integration-keys/{provider} moves the current key to revoked.
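The state machine above can be sketched as a small transition table. The state names come from this doc; the helper itself is illustrative, not Worktruck code.

```python
# Sketch of the integration-key state machine described above.
# State names match the docs; the transition table is an assumption.
VALID_TRANSITIONS = {
    ("active", "invalid"),   # probe failed
    ("invalid", "active"),   # probe passed again (self-heal)
    ("active", "revoked"),   # DELETE /integration-keys/{provider}
    ("invalid", "revoked"),
}

def transition(current: str, target: str) -> str:
    """Return the new state, refusing illegal moves. revoked is terminal."""
    if current == "revoked":
        raise ValueError("revoked is terminal: ciphertext is wiped")
    if (current, target) not in VALID_TRANSITIONS:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Note that revoked has no outgoing edges: once the ciphertext is wiped, the only path forward is storing a fresh key.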

Verifying a key

curl -X POST https://api.worktruck.app/api/v1/integration-keys/cloudflare/verify \
  -H "Authorization: Bearer bsk_live_your_key"
The verify endpoint makes a live probe call to the provider and updates status and last_verified_at. It always returns 200, with valid: true or valid: false in the body. This is how agents self-heal: a failing verify marks the key invalid, a passing verify marks it active.

Scopes

  • agents:read — list, get
  • agents:write — store, revoke, verify

Apps

Apps are tenant-registered external MCP servers. Integration Keys authorize agents to call known providers (GitHub, Cloudflare, Postmark); Apps let you point an agent at any MCP server you choose — Worktruck’s own MCP endpoint, Context7, Zapier, a bespoke internal tool — without an eng round-trip. Once an app is registered, its tools become available to any agent blueprint configured to consume it.

Registering an app

curl -X POST https://api.worktruck.app/api/v1/apps \
  -H "Authorization: Bearer bsk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "context7",
    "display_name": "Context7",
    "description": "Library docs on demand",
    "mcp_server_url": "https://mcp.context7.com/mcp",
    "auth": { "type": "bearer", "token": "sk-live-..." }
  }'
Response: 201 with the persisted app plus the first probe result. Registration always succeeds as long as the URL and auth pass validation — if the probe fails (timeout, bad JSON-RPC, non-2xx), the row is still inserted and marked unhealthy, and the probe error rides back in the response so the UX can surface it. Auth types:
  • bearer: { "type": "bearer", "token": "..." }, sent as Authorization: Bearer <token>
  • header: { "type": "header", "name": "X-API-Key", "value": "..." }, sent as the named custom header
  • none: { "type": "none" }, no auth header sent
Secrets are envelope-encrypted with your tenant’s DEK and never returned by any read endpoint. The auth_hint field on responses carries a redacted preview (e.g. sk-live-***XY) so the UI can show which key is wired without round-tripping the secret.
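A redaction preview like the auth_hint above can be produced with a small helper. The exact redaction rule is an assumption inferred from the sk-live-***XY example, not Worktruck's implementation.

```python
def auth_hint(secret: str, keep: int = 2) -> str:
    """Redact a secret to a preview like 'sk-live-***XY'.
    Assumption: keep any 'prefix-' groups plus the last `keep` characters."""
    if len(secret) <= keep:
        return "***"
    head, sep, _tail = secret.rpartition("-")
    prefix = head + sep  # e.g. "sk-live-", or "" when there is no dash
    return f"{prefix}***{secret[-keep:]}"
```

The idea is that the non-secret prefix identifies the key family while the trailing characters let an operator match it against their own records.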

URL validation

The mcp_server_url must be https:// and must resolve to a public IP. Requests to loopback, RFC 1918, link-local, or any other reserved range are rejected at registration time. The probe path uses the same pinned rustls client Worktruck uses for outbound webhooks.
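The reserved-range check can be sketched with the standard library; this is a minimal illustration of the rule, not the pinned client Worktruck actually uses.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def validate_mcp_url(url: str) -> None:
    """Reject non-https URLs and hosts that resolve to reserved ranges.
    Illustrative sketch of the registration-time check described above."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("mcp_server_url must be https://")
    host = parsed.hostname or ""
    # Check every resolved address, not just the first answer.
    for info in socket.getaddrinfo(host, None):
        ip = ipaddress.ip_address(info[4][0])
        if not ip.is_global:  # loopback, RFC 1918, link-local, reserved
            raise ValueError(f"{ip} is not a public address")
```

`ipaddress.is_global` covers loopback, private, and link-local ranges in one check, which is why SSRF filters commonly lean on it.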

Probing

Each registered app is probed with a JSON-RPC 2.0 tools/list request against the MCP server. The response shape:
{
  "outcome": "success",
  "discovered_tools": [...],
  "enabled_tools": ["search", "fetch"]
}
On the first probe after registration, every advertised tool is auto-enabled (the enablement set is empty). On subsequent probes, your curated enabled_tools set is preserved — new tools the server advertises stay disabled until you explicitly enable them. Tools that used to exist but are no longer advertised are marked stale in discovered_tools and removed from enabled_tools. Health state machine:
  • active — the last probe succeeded, OR the failure streak is below threshold
  • unhealthy — 5 consecutive probe failures (counters reset on the first success)
  • disabled — operator set status: "disabled" via PATCH; probes skip disabled apps
Trigger an on-demand probe with POST /api/v1/apps/:slug/probe. The endpoint always returns 200 with the probe outcome in the body — never a non-2xx.
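The enablement reconciliation described above can be sketched as a set computation. This simplifies stale tracking to the enabled set only; names and shapes are illustrative.

```python
def reconcile_tools(enabled: set[str], advertised: set[str]):
    """Sketch of the probe reconciliation rules above.
    Returns (new_enabled, stale). Stale tracking is simplified to
    the enabled set; the real system marks stale in discovered_tools."""
    if not enabled:                      # first probe: auto-enable everything
        return set(advertised), set()
    stale = enabled - advertised         # no longer advertised by the server
    return enabled & advertised, stale   # new tools stay disabled
```

The intersection is what preserves your curated set: a server that suddenly advertises a new mutating tool cannot silently gain it.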

Endpoints

  • POST /api/v1/apps: register + probe
  • GET /api/v1/apps: list all apps for the tenant
  • GET /api/v1/apps/:slug: detail
  • PATCH /api/v1/apps/:slug: update display name, description, enabled_tools, status, or metadata
  • POST /api/v1/apps/:slug/auth/rotate: replace the auth secret (no probe; call /probe afterwards)
  • POST /api/v1/apps/:slug/probe: on-demand probe
  • DELETE /api/v1/apps/:slug: hard delete
slug and mcp_server_url are immutable after registration. To rename, delete and re-register.

Scopes

  • agents:read — list, get
  • agents:write — register, update, rotate, probe, delete

Key Concepts

BYOK (Bring Your Own Key)

Every agent run authenticates to Anthropic with a key you provide. Worktruck encrypts it with your tenant’s DEK (AES-256-GCM) and loads it only at the worker boundary. Rotate with PUT /api/v1/agent-configs/{blueprint}/byok-key; revoke with DELETE. There is no shared fallback key in production: if the tenant key is missing or revoked, runs fail with auth_failed.

Probe a key with POST /api/v1/agent-configs/validate-key before wiring it into a config. The probe is a one-shot live call to Anthropic. The response is either {"status": "valid"} or {"status": "invalid", "category": ..., "message": ...}, where category is one of:
  • Terminal (require operator action): invalid_key, revoked_or_expired, insufficient_permissions, quota_exhausted
  • Transient (retry-safe): provider_unavailable, rate_limited
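A client deciding whether to retry a failed probe only needs the terminal/transient split. The category names come from this doc; the helper itself is illustrative.

```python
# Category names are taken from the validate-key reference above.
TERMINAL = {"invalid_key", "revoked_or_expired",
            "insufficient_permissions", "quota_exhausted"}
TRANSIENT = {"provider_unavailable", "rate_limited"}

def should_retry(probe: dict) -> bool:
    """True only for transient, retry-safe validate-key failures."""
    if probe["status"] == "valid":
        return False                     # nothing to retry
    return probe["category"] in TRANSIENT
```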

Run state machine

queued → running ⇄ waiting

           └──► succeeded | failed | cancelled
  • queued: no worker owns the row. Freshly inserted by enqueue_run.
  • running: owned by a worker holding a lease. worker_id and lease_expires_at are set; the lease is renewed every 10s.
  • waiting: no worker. An approval_gate guardrail paused the run; the lease is cleared.
  • succeeded (terminal): final output available; cost_usd_cents rolled up.
  • failed (terminal): failure_category tells you why.
  • cancelled (terminal): an operator or API caller killed it.
Transitions are enforced in the database: the worker uses SQL UPDATE ... WHERE status = $old to ensure no two processes ever own the same run.
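The compare-and-swap UPDATE pattern can be demonstrated with sqlite3 standing in for Postgres. Table and column names are simplified; the point is that the second claimer sees zero rows updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_runs (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO agent_runs (id, status) VALUES (1, 'queued')")

def claim(run_id: int, old: str, new: str) -> bool:
    """Atomically move a run from `old` to `new`; False if another
    process changed the status first (rowcount tells us who won)."""
    cur = conn.execute(
        "UPDATE agent_runs SET status = ? WHERE id = ? AND status = ?",
        (new, run_id, old))
    return cur.rowcount == 1

first = claim(1, "queued", "running")    # this worker wins the row
second = claim(1, "queued", "running")   # a rival matches zero rows
```

Because the status predicate is part of the UPDATE itself, no explicit lock is needed: the database serializes the two writers.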

Failure categories

Every terminal failure carries a failure_category:
  • auth_failed: the BYOK key was rejected by Anthropic
  • tool_failed: a tool call returned an unrecoverable error
  • guardrail_blocked: an enforced guardrail rule blocked a call
  • config_error: the config is malformed (missing tool, unknown kind, etc.)
  • timeout: the deadline elapsed before the run could finish
  • budget_exhausted: rolled-up cost exceeded the configured budget cap

Guardrails

Every config carries a GuardrailSet — a list of rules enforced before every tool call. Five primitives:
  • allowlist / denylist — name-based tool gating
  • rate_limit — sliding window per-(run, tool) via Dragonfly
  • approval_gate — pauses the run to waiting on match
  • io_validation — JSON Schema check against tool input
  • quiet_hours — tenant-local time window block
Rules run in shadow (log only) or enforce (block) mode. A run with no guardrail set at all falls back to default-deny: no tools allowed. You can’t accidentally ship a brand-new config with mutating power.
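The default-deny behavior for name-based gating can be sketched as follows. Rule shapes follow the config example later in this doc; the evaluation order (denylist wins, shadow rules skipped) is an assumption.

```python
def tool_allowed(guardrails: list[dict], tool: str) -> bool:
    """Default-deny sketch: a tool runs only if an enforced allowlist
    names it and no enforced denylist blocks it. Shadow rules log only,
    so they never grant or block anything here."""
    allowed = False
    for rule in guardrails:
        if rule.get("mode") != "enforce":
            continue                     # shadow mode: observe, don't act
        if rule["kind"] == "denylist" and tool in rule["names"]:
            return False                 # an enforced denylist always wins
        if rule["kind"] == "allowlist" and tool in rule["names"]:
            allowed = True
    return allowed                       # no matching allowlist: denied
```

With an empty guardrail list the function returns False for every tool, which is the "no guardrail set means no tools" behavior described above.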

Durable replay

Every step writes to agent_steps keyed by (run_id, seq) with a UNIQUE (tenant_id, run_id, content_hash) constraint. If a worker crashes mid-run, the next one replays every journaled step deterministically, skipping rather than re-executing. The hash is computed over canonical JSON of the step inputs, so replays are bit-identical. Large tool payloads spill out-of-line to agent_step_payloads — the main step row stays small, the payload table stores the bulk.
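A canonical-JSON content hash can be sketched in a few lines. The canonicalization choices here (sorted keys, tight separators, SHA-256) are assumptions; the doc only states that the hash is over canonical JSON.

```python
import hashlib
import json

def content_hash(step_input: dict) -> str:
    """Hash over canonical JSON so equal inputs always hash equally,
    regardless of key insertion order."""
    canonical = json.dumps(step_input, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Canonicalization is what makes replay-time comparison safe: two workers serializing the same logical input get the same bytes, hence the same hash, hence a skip instead of a re-execution.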

Cost tracking

Per-step cost is estimated from Anthropic’s published rates (input, output, cache-read, cache-write) and the model used. When a run reaches a terminal state, its cost_usd_cents is the sum over all steps. The agent_runs_billing_daily view rolls totals up per (tenant_id, agent_blueprint) per day; use it for your own billing dashboards.
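The per-step arithmetic looks roughly like this. You supply your own per-million-token rates; no pricing figures are quoted here, and the token-class names are illustrative.

```python
def step_cost_usd_cents(usage: dict, rates_per_mtok: dict) -> float:
    """Estimate one step's cost in USD cents from token counts and
    USD-per-million-token rates. Both dicts share keys such as
    'input' / 'output' / 'cache_read' / 'cache_write' (names assumed)."""
    return sum(usage.get(kind, 0) * rate / 1_000_000 * 100
               for kind, rate in rates_per_mtok.items())
```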

Auto-disable circuit breaker

If a (tenant, agent_blueprint) pair accumulates 10 failed runs in 48 hours, the worker stamps agent_configs.disabled_at = now() and writes a disable_reason. New POST /runs calls return 409 Conflict until an operator nulls both columns. There is no automatic recovery — the circuit stays open until human action re-closes it.
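The breaker's trip condition is a count-within-window check. The 10-failure / 48-hour thresholds come from this doc; the counting helper is illustrative.

```python
from datetime import datetime, timedelta

def should_disable(failure_times: list[datetime],
                   now: datetime,
                   threshold: int = 10,
                   window: timedelta = timedelta(hours=48)) -> bool:
    """Trip the breaker once `threshold` failures fall inside `window`.
    Defaults mirror the documented 10-in-48h rule."""
    recent = [t for t in failure_times if now - t <= window]
    return len(recent) >= threshold
```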

Required fields

Minimum config (PUT body for /api/v1/agent-configs/{blueprint} — the blueprint comes from the path, not the body):
{
  "budget_usd_cents": 25,
  "deadline_secs": 300,
  "guardrails": [
    { "kind": "allowlist", "names": ["contacts_search", "contacts_merge"], "mode": "enforce" }
  ]
}
Minimum enqueue:
{ "input": { "max_candidates": 50 } }
Per-run overrides (budget, deadline) are optional and clip to the config defaults.
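One reading of "clip" is that an override may tighten a limit but never exceed the configured value. That interpretation, and the helper below, are assumptions rather than documented behavior.

```python
def effective_limits(config: dict, overrides: dict) -> dict:
    """Apply per-run overrides, clamping each to the config default.
    Assumption: 'clip' means overrides can lower but never raise a limit."""
    return {
        key: min(overrides.get(key, config[key]), config[key])
        for key in ("budget_usd_cents", "deadline_secs")
    }
```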

Multi-tenancy

Every agent table has RLS. Every worker query runs inside a transaction with SET LOCAL app.tenant_id = ... plus a belt-and-suspenders (run_id, tenant_id) check before acting. A bug in the tenancy layer cannot leak rows across tenants — the database enforces it.

Operational notes

  • Runs survive API and worker restarts. queued runs sit in the table until a worker wakes; running runs with expired leases get released to queued by the orphan recovery job
  • The worker exposes an internal health endpoint on :4500 for container healthchecks; it is not reachable over the public internet
  • Deployment is decoupled from the API — the worker is a separate Docker container (worktruck-agent-worker), and you can scale it horizontally without touching the API tier
  • Every run emits OpenTelemetry GenAI semconv spans: agent.run, gen_ai.chat, gen_ai.tool. Ship them to the observability stack of your choice

Limits

  • One GA kind (contact_deduper) as of April 2026 — more land as they graduate
  • Approval resume endpoint is not exposed in Phase 2. waiting runs need operator unblocking; the client-facing resume call lands in Phase 3
  • Mid-run budget action hooks are not exposed; the cap fires only at terminal rollup
  • Model routing is static per-run; automatic multi-model failover is a design-phase item