
Overview

The Agents domain is Worktruck’s in-process agent executor. You register one agent_config per (tenant, blueprint), attach your own Anthropic API key, and POST to /api/v1/agents/{blueprint}/runs to enqueue work. A worker daemon picks the run up via Postgres LISTEN/NOTIFY, drives the LLM ↔ tool loop, and records every step. See the Agents guide for the narrative overview. This page is the reference.

Data Model

Entities

  • Agent Config: per-(tenant, kind) configuration (system prompt overrides, tool set, budget, deadline, guardrail set, disable state). One row per kind.
  • Agent Run: one row per enqueue. Carries status, claim state, cost rollup, failure category.
  • Agent Step: content-hashed step in a run, either an LLM turn or a tool call. Token usage and cost per step.
  • Agent Event: structured observability row. Guardrail decisions, auto-disable flips, waiting transitions.

Agent kinds

Today, one kind is GA:
  • contact_deduper: scans contacts for likely duplicates and proposes merges.
New kinds ship as they graduate from the internal catalog. Each kind declares its own default system prompt, default tool set, default budget, and default guardrail set.

Integration Keys

Integration keys are service credentials for external APIs that agents call on your behalf — Cloudflare, GitHub, Netdata, Postmark, and Outstand. Unlike BYOK (which is an Anthropic key scoped per agent kind), integration keys are stored once per tenant and shared across all agent kinds that need them.

Supported providers

  • cloudflare: site_health (zone listing, DNS, analytics)
  • github: agent tools that read repos or issues
  • netdata: site_health (infrastructure metrics)
  • postmark: email-related agent tools
  • outstand: Outstand integration tools

Storing a key

curl -X POST https://api.worktruck.app/api/v1/integration-keys \
  -H "Authorization: Bearer bsk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "cloudflare",
    "key": "cf_live_...",
    "label": "Production",
    "metadata": { "account_id": "abc123" }
  }'
Returns 201 on first store, 200 when replacing an existing key (the prior key is atomically revoked). The key field never appears in responses — only a key_hint (last four characters) to confirm which credential is active. Netdata requires metadata.base_url; all other providers take optional metadata.

Key state machine

active ──► invalid (probe failed)
  │           │
  └───────────┴──► revoked (terminal — ciphertext wiped)
active and invalid keys are accessible to agents. revoked keys are audit rows — their ciphertext is gone and they cannot be restored. Calling DELETE /integration-keys/{provider} moves the current key to revoked.
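The state machine above can be sketched as a small transition table. The state names come from this doc; the helper itself is illustrative, not Worktruck code.

```python
# Sketch of the integration-key state machine described above.
# State names match the docs; the transition table is an assumption.
VALID_TRANSITIONS = {
    ("active", "invalid"),   # probe failed
    ("invalid", "active"),   # probe passed again (self-heal)
    ("active", "revoked"),   # DELETE /integration-keys/{provider}
    ("invalid", "revoked"),
}

def transition(current: str, target: str) -> str:
    """Return the new state, refusing illegal moves. revoked is terminal."""
    if current == "revoked":
        raise ValueError("revoked is terminal: ciphertext is wiped")
    if (current, target) not in VALID_TRANSITIONS:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Note that revoked has no outgoing edges: once the ciphertext is wiped, the only path forward is storing a fresh key.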

Verifying a key

curl -X POST https://api.worktruck.app/api/v1/integration-keys/cloudflare/verify \
  -H "Authorization: Bearer bsk_live_your_key"
The verify endpoint makes a live probe call to the provider and updates status and last_verified_at. It always returns 200, with valid: true or valid: false in the body. This is how agents self-heal: a failing verify marks the key invalid, a passing verify marks it active.

Scopes

  • agents:read — list, get
  • agents:write — store, revoke, verify

Apps

Apps are tenant-registered external MCP servers. Integration Keys authorize agents to call known providers (GitHub, Cloudflare, Postmark); Apps let you point an agent at any MCP server you choose — Worktruck’s own MCP endpoint, Context7, Zapier, a bespoke internal tool — without an eng round-trip. Once an app is registered, its tools become available to any agent blueprint configured to consume it.

Registering an app

curl -X POST https://api.worktruck.app/api/v1/apps \
  -H "Authorization: Bearer bsk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "context7",
    "display_name": "Context7",
    "description": "Library docs on demand",
    "mcp_server_url": "https://mcp.context7.com/mcp",
    "auth": { "type": "bearer", "token": "sk-live-..." }
  }'
Response: 201 with the persisted app plus the first probe result. Registration always succeeds as long as the URL and auth pass validation — if the probe fails (timeout, bad JSON-RPC, non-2xx), the row is still inserted and marked unhealthy, and the probe error rides back in the response so the UX can surface it. Auth types:
  • bearer: { "type": "bearer", "token": "..." }, sent as Authorization: Bearer <token>
  • header: { "type": "header", "name": "X-API-Key", "value": "..." }, sent as the named custom header
  • none: { "type": "none" }, no auth header sent
Secrets are envelope-encrypted with your tenant’s DEK and never returned by any read endpoint. The auth_hint field on responses carries a redacted preview (e.g. sk-live-***XY) so the UI can show which key is wired without round-tripping the secret.
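A redaction preview like the auth_hint above can be produced with a small helper. The exact redaction rule is an assumption inferred from the sk-live-***XY example, not Worktruck's implementation.

```python
def auth_hint(secret: str, keep: int = 2) -> str:
    """Redact a secret to a preview like 'sk-live-***XY'.
    Assumption: keep any 'prefix-' groups plus the last `keep` characters."""
    if len(secret) <= keep:
        return "***"
    head, sep, _tail = secret.rpartition("-")
    prefix = head + sep  # e.g. "sk-live-", or "" when there is no dash
    return f"{prefix}***{secret[-keep:]}"
```

The idea is that the non-secret prefix identifies the key family while the trailing characters let an operator match it against their own records.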

URL validation

The mcp_server_url must be https:// and must resolve to a public IP. Requests to loopback, RFC 1918, link-local, or any other reserved range are rejected at registration time. The probe path uses the same pinned rustls client Worktruck uses for outbound webhooks.
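The reserved-range check can be sketched with the standard library; this is a minimal illustration of the rule, not the pinned client Worktruck actually uses.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def validate_mcp_url(url: str) -> None:
    """Reject non-https URLs and hosts that resolve to reserved ranges.
    Illustrative sketch of the registration-time check described above."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("mcp_server_url must be https://")
    host = parsed.hostname or ""
    # Check every resolved address, not just the first answer.
    for info in socket.getaddrinfo(host, None):
        ip = ipaddress.ip_address(info[4][0])
        if not ip.is_global:  # loopback, RFC 1918, link-local, reserved
            raise ValueError(f"{ip} is not a public address")
```

`ipaddress.is_global` covers loopback, private, and link-local ranges in one check, which is why SSRF filters commonly lean on it.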

Probing

Each registered app is probed with a JSON-RPC 2.0 tools/list request against the MCP server. The response shape:
{
  "outcome": "success",
  "discovered_tools": [...],
  "enabled_tools": ["search", "fetch"]
}
On the first probe after registration, every advertised tool is auto-enabled (the enablement set is empty). On subsequent probes, your curated enabled_tools set is preserved — new tools the server advertises stay disabled until you explicitly enable them. Tools that used to exist but are no longer advertised are marked stale in discovered_tools and removed from enabled_tools. Health state machine:
  • active — the last probe succeeded, OR the failure streak is below threshold
  • unhealthy — 5 consecutive probe failures (counters reset on the first success)
  • disabled — operator set status: "disabled" via PATCH; probes skip disabled apps
Trigger an on-demand probe with POST /api/v1/apps/:slug/probe. The endpoint always returns 200 with the probe outcome in the body — never a non-2xx.
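The enablement reconciliation described above can be sketched as a set computation. This simplifies stale tracking to the enabled set only; names and shapes are illustrative.

```python
def reconcile_tools(enabled: set[str], advertised: set[str]):
    """Sketch of the probe reconciliation rules above.
    Returns (new_enabled, stale). Stale tracking is simplified to
    the enabled set; the real system marks stale in discovered_tools."""
    if not enabled:                      # first probe: auto-enable everything
        return set(advertised), set()
    stale = enabled - advertised         # no longer advertised by the server
    return enabled & advertised, stale   # new tools stay disabled
```

The intersection is what preserves your curated set: a server that suddenly advertises a new mutating tool cannot silently gain it.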

Endpoints

  • POST /api/v1/apps: register + probe
  • GET /api/v1/apps: list all apps for the tenant
  • GET /api/v1/apps/:slug: detail
  • PATCH /api/v1/apps/:slug: update display name, description, enabled_tools, status, or metadata
  • POST /api/v1/apps/:slug/auth/rotate: replace the auth secret (no probe; call /probe afterwards)
  • POST /api/v1/apps/:slug/probe: on-demand probe
  • DELETE /api/v1/apps/:slug: hard delete
slug and mcp_server_url are immutable after registration. To rename, delete and re-register.

Scopes

  • agents:read — list, get
  • agents:write — register, update, rotate, probe, delete

Key Concepts

BYOK (Bring Your Own Key)

Every agent run authenticates to Anthropic with a key you provide. Worktruck encrypts it with your tenant’s DEK (AES-256-GCM) and loads it only at the worker boundary. Rotate with PUT /api/v1/agent-configs/{blueprint}/byok-key; revoke with DELETE. There is no shared fallback key in production: if the tenant key is missing or revoked, runs fail with auth_failed.

Probe a key with POST /api/v1/agent-configs/validate-key before wiring it into a config. The probe is a one-shot live call to Anthropic. The response is either {"status": "valid"} or {"status": "invalid", "category": ..., "message": ...}, where category is one of:
  • Terminal (require operator action): invalid_key, revoked_or_expired, insufficient_permissions, quota_exhausted
  • Transient (retry-safe): provider_unavailable, rate_limited
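A client deciding whether to retry a failed probe only needs the terminal/transient split. The category names come from this doc; the helper itself is illustrative.

```python
# Category names are taken from the validate-key reference above.
TERMINAL = {"invalid_key", "revoked_or_expired",
            "insufficient_permissions", "quota_exhausted"}
TRANSIENT = {"provider_unavailable", "rate_limited"}

def should_retry(probe: dict) -> bool:
    """True only for transient, retry-safe validate-key failures."""
    if probe["status"] == "valid":
        return False                     # nothing to retry
    return probe["category"] in TRANSIENT
```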

Run state machine

queued → running ⇄ waiting

           └──► succeeded | failed | cancelled
  • queued: no worker owns the row. Freshly inserted by enqueue_run.
  • running: owned by a worker holding a lease. worker_id and lease_expires_at are set; the lease is renewed every 10s.
  • waiting: no worker. An approval_gate guardrail paused the run; the lease is cleared.
  • succeeded (terminal): final output available; cost_usd_cents rolled up.
  • failed (terminal): failure_category tells you why.
  • cancelled (terminal): an operator or API caller killed it.
Transitions are enforced in the database: the worker uses SQL UPDATE ... WHERE status = $old to ensure no two processes ever own the same run.
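The compare-and-swap UPDATE pattern can be demonstrated with sqlite3 standing in for Postgres. Table and column names are simplified; the point is that the second claimer sees zero rows updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_runs (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO agent_runs (id, status) VALUES (1, 'queued')")

def claim(run_id: int, old: str, new: str) -> bool:
    """Atomically move a run from `old` to `new`; False if another
    process changed the status first (rowcount tells us who won)."""
    cur = conn.execute(
        "UPDATE agent_runs SET status = ? WHERE id = ? AND status = ?",
        (new, run_id, old))
    return cur.rowcount == 1

first = claim(1, "queued", "running")    # this worker wins the row
second = claim(1, "queued", "running")   # a rival matches zero rows
```

Because the status predicate is part of the UPDATE itself, no explicit lock is needed: the database serializes the two writers.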

Failure categories

Every terminal failure carries a failure_category:
  • auth_failed: the BYOK key was rejected by Anthropic
  • tool_failed: a tool call returned an unrecoverable error
  • guardrail_blocked: an enforced guardrail rule blocked a call
  • config_error: the config is malformed (missing tool, unknown kind, etc.)
  • timeout: the deadline elapsed before the run could finish
  • budget_exhausted: rolled-up cost exceeded the configured budget cap

Guardrails

Every config carries a GuardrailSet — a list of rules enforced before every tool call. Five primitives:
  • allowlist / denylist — name-based tool gating
  • rate_limit — sliding window per-(run, tool) via Dragonfly
  • approval_gate — pauses the run to waiting on match
  • io_validation — JSON Schema check against tool input
  • quiet_hours — tenant-local time window block
Rules run in shadow (log only) or enforce (block) mode. A run with no guardrail set at all falls back to default-deny: no tools allowed. You can’t accidentally ship a brand-new config with mutating power.
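The default-deny behavior for name-based gating can be sketched as follows. Rule shapes follow the config example later in this doc; the evaluation order (denylist wins, shadow rules skipped) is an assumption.

```python
def tool_allowed(guardrails: list[dict], tool: str) -> bool:
    """Default-deny sketch: a tool runs only if an enforced allowlist
    names it and no enforced denylist blocks it. Shadow rules log only,
    so they never grant or block anything here."""
    allowed = False
    for rule in guardrails:
        if rule.get("mode") != "enforce":
            continue                     # shadow mode: observe, don't act
        if rule["kind"] == "denylist" and tool in rule["names"]:
            return False                 # an enforced denylist always wins
        if rule["kind"] == "allowlist" and tool in rule["names"]:
            allowed = True
    return allowed                       # no matching allowlist: denied
```

With an empty guardrail list the function returns False for every tool, which is the "no guardrail set means no tools" behavior described above.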

Durable replay

Every step writes to agent_steps keyed by (run_id, seq) with a UNIQUE (tenant_id, run_id, content_hash) constraint. If a worker crashes mid-run, the next one replays every journaled step deterministically, skipping rather than re-executing. The hash is computed over canonical JSON of the step inputs, so replays are bit-identical. Large tool payloads spill out-of-line to agent_step_payloads — the main step row stays small, the payload table stores the bulk.
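A canonical-JSON content hash can be sketched in a few lines. The canonicalization choices here (sorted keys, tight separators, SHA-256) are assumptions; the doc only states that the hash is over canonical JSON.

```python
import hashlib
import json

def content_hash(step_input: dict) -> str:
    """Hash over canonical JSON so equal inputs always hash equally,
    regardless of key insertion order."""
    canonical = json.dumps(step_input, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Canonicalization is what makes replay-time comparison safe: two workers serializing the same logical input get the same bytes, hence the same hash, hence a skip instead of a re-execution.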

Cost tracking

Per-step cost is estimated from Anthropic’s published rates (input, output, cache-read, cache-write) and the model used. When a run reaches a terminal state, its cost_usd_cents is the sum over all steps. The agent_runs_billing_daily view rolls totals up per (tenant_id, agent_blueprint) per day; use it for your own billing dashboards.
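The per-step arithmetic looks roughly like this. You supply your own per-million-token rates; no pricing figures are quoted here, and the token-class names are illustrative.

```python
def step_cost_usd_cents(usage: dict, rates_per_mtok: dict) -> float:
    """Estimate one step's cost in USD cents from token counts and
    USD-per-million-token rates. Both dicts share keys such as
    'input' / 'output' / 'cache_read' / 'cache_write' (names assumed)."""
    return sum(usage.get(kind, 0) * rate / 1_000_000 * 100
               for kind, rate in rates_per_mtok.items())
```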

Auto-disable circuit breaker

If a (tenant, agent_blueprint) pair accumulates 10 failed runs in 48 hours, the worker stamps agent_configs.disabled_at = now() and writes a disable_reason. New POST /runs calls return 409 Conflict until an operator nulls both columns. There is no automatic recovery — the circuit stays open until human action re-closes it.
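The breaker's trip condition is a count-within-window check. The 10-failure / 48-hour thresholds come from this doc; the counting helper is illustrative.

```python
from datetime import datetime, timedelta

def should_disable(failure_times: list[datetime],
                   now: datetime,
                   threshold: int = 10,
                   window: timedelta = timedelta(hours=48)) -> bool:
    """Trip the breaker once `threshold` failures fall inside `window`.
    Defaults mirror the documented 10-in-48h rule."""
    recent = [t for t in failure_times if now - t <= window]
    return len(recent) >= threshold
```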

Required fields

Minimum config (PUT body for /api/v1/agent-configs/{blueprint} — the blueprint comes from the path, not the body):
{
  "budget_usd_cents": 25,
  "deadline_secs": 300,
  "guardrails": [
    { "kind": "allowlist", "names": ["contacts_search", "contacts_merge"], "mode": "enforce" }
  ]
}
Minimum enqueue:
{ "input": { "max_candidates": 50 } }
Per-run overrides (budget, deadline) are optional and clip to the config defaults.
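One reading of "clip" is that an override may tighten a limit but never exceed the configured value. That interpretation, and the helper below, are assumptions rather than documented behavior.

```python
def effective_limits(config: dict, overrides: dict) -> dict:
    """Apply per-run overrides, clamping each to the config default.
    Assumption: 'clip' means overrides can lower but never raise a limit."""
    return {
        key: min(overrides.get(key, config[key]), config[key])
        for key in ("budget_usd_cents", "deadline_secs")
    }
```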

Multi-tenancy

Every agent table has RLS. Every worker query runs inside a transaction with SET LOCAL app.tenant_id = ... plus a belt-and-suspenders (run_id, tenant_id) check before acting. A bug in the tenancy layer cannot leak rows across tenants — the database enforces it.

Operational notes

  • Runs survive API and worker restarts. queued runs sit in the table until a worker wakes; running runs with expired leases get released to queued by the orphan recovery job
  • The worker exposes an internal health endpoint on :4500 for container healthchecks; it is not reachable over the public internet
  • Deployment is decoupled from the API — the worker is a separate Docker container (worktruck-agent-worker), and you can scale it horizontally without touching the API tier
  • Every run emits OpenTelemetry GenAI semconv spans: agent.run, gen_ai.chat, gen_ai.tool. Ship them to the observability stack of your choice

Limits

  • One GA kind (contact_deduper) as of April 2026 — more land as they graduate
  • Approval resume endpoint is not exposed in Phase 2. waiting runs need operator unblocking; the client-facing resume call lands in Phase 3
  • Mid-run budget action hooks are not exposed; the cap fires only at terminal rollup
  • Model routing is static per-run; automatic multi-model failover is a design-phase item