Worktruck runs agents as a first-class service. You register a config once, POST to an endpoint, and an agent worker picks it up, calls your data, and records every step. Runs are journaled — a worker restart doesn’t lose the work done so far. The agent executor is not a framework. You don’t bundle a Python runtime or write a LangChain graph. You configure a kind, enqueue a run, and collect the result.

When to use this

Reach for a Worktruck agent run when:
  • You want a task to execute against your Worktruck data (contacts, CRM, tasks, calendar, notes) without standing up your own inference stack
  • The work is bounded: a dedupe pass, a follow-up generation, a weekly cleanup, a daily briefing — something that starts, does a thing, and stops
  • You need the work to survive process restarts and mid-run failures
  • You want per-run cost caps, guardrails on which tools can run, and structured audit trails
Reach for the REST API or MCP server instead when:
  • You already have an agent runner (Claude Code, Cursor, a custom loop). In that case, connect it to the MCP server and skip the agent executor — your runner orchestrates, Worktruck serves data.
  • You want open-ended conversational access with the human in the loop.

Available agent kinds

Today, one kind is generally available:
Kind | What it does
contact_deduper | Scans your contacts for likely duplicates (email/phone/name collisions) and proposes merges.
Additional kinds land as they graduate from the internal catalog.

Integration keys — credentials for external services

Some agent kinds call external APIs on your behalf — Cloudflare, GitHub, Netdata, Postmark, Outstand. You store one credential per provider once; agents load it automatically when they need it.
curl -X POST https://api.worktruck.app/api/v1/integration-keys \
  -H "Authorization: Bearer bsk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "cloudflare",
    "key": "cf_live_...",
    "metadata": { "account_id": "abc123" }
  }'
Keys are encrypted with your tenant’s DEK. Responses never expose the raw key, only a key_hint (last four characters). Use POST /integration-keys/{provider}/verify to probe the credential against the real provider and update its active / invalid status.
The site_health agent kind reads Cloudflare and Netdata integration keys at run time. If a required integration key is not stored, tools that need it fail with auth_failed, and the error message prompts you to add the key.

Your keys run the show (BYOK)

Agent runs use your Anthropic API key. Worktruck never charges you for LLM tokens — the cost lands on your Anthropic invoice, not ours. You attach a key once per blueprint with a PUT /api/v1/agent-configs/{blueprint}/byok-key call; Worktruck encrypts it with your tenant DEK and uses it on every run. Rotate or revoke any time. A probe call verifies the key works before you enqueue a real run:
curl -X POST https://api.worktruck.app/api/v1/agent-configs/validate-key \
  -H "Authorization: Bearer bsk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"api_key": "sk-ant-api03-..."}'
The response is a ProbeResult: either {"status": "valid"} or {"status": "invalid", "category": ..., "message": ...}. The category falls into one of two groups. Terminal categories require action on your part: invalid_key, revoked_or_expired, insufficient_permissions, quota_exhausted. Transient categories are safe to retry: provider_unavailable, rate_limited.
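The terminal/transient split above maps naturally onto a small decision helper. The following is a minimal sketch, not part of the Worktruck API: the category names come from the text, while the function name and its return values are hypothetical.

```python
# Category names are taken from the ProbeResult description above.
# The helper and its return values ("proceed", "retry", "fix_key") are hypothetical.
TERMINAL = {"invalid_key", "revoked_or_expired", "insufficient_permissions", "quota_exhausted"}
TRANSIENT = {"provider_unavailable", "rate_limited"}


def next_action(probe_result: dict) -> str:
    """Decide what to do with a ProbeResult-shaped dict."""
    if probe_result.get("status") == "valid":
        return "proceed"
    category = probe_result.get("category")
    if category in TRANSIENT:
        return "retry"      # safe to probe again after a short delay
    if category in TERMINAL:
        return "fix_key"    # rotate, re-permission, or top up before retrying
    raise ValueError(f"unrecognized probe category: {category!r}")
```

Raising on an unrecognized category is a design choice for the sketch: it surfaces new categories instead of silently retrying them.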

Run lifecycle

queued ──► running ──► (succeeded | failed | cancelled)
              │
              └──► waiting ──► running ──► ...
Status | Meaning
queued | Your POST /runs call accepted the work; no worker has claimed it yet
running | A worker holds a 30-second lease and is driving the executor loop
waiting | A guardrail paused the run for approval (see below). Lease is cleared; no worker is spending time on it
succeeded | Terminal. Final output is available; cost was rolled up
failed | Terminal. failure_category explains why (auth_failed, tool_failed, guardrail_blocked, config_error, timeout, budget_exhausted)
cancelled | Terminal. Operator or API caller killed the run via POST /runs/{id}/cancel
Every state transition is auditable. Every tool call is journaled. Every LLM call has cost attached.
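For client code that tracks run status, the lifecycle can be written down as a transition table. This is a sketch that encodes only the arrows drawn in the diagram; whether a queued or waiting run can move directly to cancelled is not stated here, so those edges are deliberately left out. The names ALLOWED and can_transition are hypothetical.

```python
# Transition table built only from the lifecycle diagram above.
# Edges not drawn there (e.g. queued -> cancelled) are omitted as unknown.
ALLOWED = {
    "queued": {"running"},
    "running": {"succeeded", "failed", "cancelled", "waiting"},
    "waiting": {"running"},
    "succeeded": set(),
    "failed": set(),
    "cancelled": set(),
}

# A status is terminal when no transition leaves it.
TERMINAL_STATUSES = {status for status, nxt in ALLOWED.items() if not nxt}


def can_transition(current: str, new: str) -> bool:
    """Return True if the diagram shows an arrow from `current` to `new`."""
    return new in ALLOWED.get(current, set())
```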

Enqueueing a run

curl -X POST https://api.worktruck.app/api/v1/agents/contact_deduper/runs \
  -H "Authorization: Bearer bsk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "max_candidates": 50
    },
    "budget_usd_cents": 25,
    "deadline_secs": 300
  }'
Response:
{
  "id": "0194a8b3-7e5a-77c1-9d0f-7c1a2f3b4c5d",
  "agent_blueprint": "contact_deduper",
  "status": "queued",
  "created_at": "2026-04-14T18:32:10Z"
}
The POST returns immediately. A worker wakes via Postgres LISTEN/NOTIFY within milliseconds and starts the run. Poll GET /runs/{id} for status or set up a webhook.
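If you poll rather than use a webhook, a loop along these lines may be enough. It is a sketch under assumptions: fetch_run is a hypothetical callable you supply that performs the GET and returns the parsed JSON body, and the interval and timeout defaults are illustrative, not documented values.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(
    fetch_run: Callable[[], dict],           # hypothetical: wraps GET /runs/{id}
    interval_secs: float = 2.0,              # illustrative default
    max_wait_secs: float = 300.0,            # illustrative default
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_run until the run reaches a terminal status, then return it."""
    waited = 0.0
    while True:
        run = fetch_run()
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if waited >= max_wait_secs:
            raise TimeoutError(f"run still {run.get('status')!r} after {waited}s")
        sleep(interval_secs)
        waited += interval_secs
```

A run parked in waiting never reaches a terminal status on its own, so this loop would time out on it; callers may want to treat waiting as a separate exit condition.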

Inspecting a run

curl https://api.worktruck.app/api/v1/agents/contact_deduper/runs/<id> \
  -H "Authorization: Bearer bsk_live_your_key"
The detail payload includes every step (LLM turn or tool call), token usage, cost, and the final output. Large tool payloads are stored out-of-line and retrieved by step ID.

Guardrails

Every agent config carries a guardrail set — a list of rules the worker enforces before each tool call. Six primitives:
Rule | Purpose
allowlist | Only listed tools may run
denylist | Listed tools may never run
rate_limit | Sliding-window cap per tool per run
approval_gate | Matching call pauses the run to waiting for human approval
io_validation | JSON Schema check on tool input before the call lands
quiet_hours | Block tool calls outside a tenant-local time window
Rules run in one of two modes:
  • shadow — the evaluator logs its decision but doesn’t block. Use this to test a new rule against live traffic.
  • enforce — block or suspend as the rule specifies.
If you leave the guardrail set empty, Worktruck falls back to default-deny: no tools can run. This is intentional — a brand-new config can’t accidentally hit mutating endpoints on day one.
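To make the interaction between modes and default-deny concrete, here is a toy evaluator covering only allowlist and denylist. Everything in it is an assumption for illustration: the Rule shape, the evaluate function, and the tool names are hypothetical and do not reflect Worktruck's actual config schema or evaluation order.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    kind: str               # "allowlist" or "denylist" (two of the six primitives)
    tools: frozenset
    mode: str = "enforce"   # or "shadow"


def evaluate(rules: list, tool: str) -> tuple:
    """Return (allowed, log) for one proposed tool call."""
    if not rules:
        # Empty guardrail set: default-deny, as described above.
        return False, ["default-deny: empty guardrail set"]
    allowed, log = True, []
    for rule in rules:
        if rule.kind == "allowlist":
            violated = tool not in rule.tools
        elif rule.kind == "denylist":
            violated = tool in rule.tools
        else:
            raise ValueError(f"rule kind not covered by this sketch: {rule.kind}")
        if violated:
            log.append(f"{rule.mode}: {rule.kind} would block {tool}")
            if rule.mode == "enforce":
                allowed = False   # shadow rules only log
    return allowed, log
```

For example, evaluate([Rule("denylist", frozenset({"contacts.merge"}), mode="shadow")], "contacts.merge") logs the decision but still allows the call, which is the behaviour shadow mode is meant to provide while testing a rule.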

Cost tracking and auto-disable

Every step’s cost (in USD cents) is recorded on agent_steps. When a run reaches a terminal state, its row gets a rolled-up cost_usd_cents. You can pull daily totals per (tenant, agent_blueprint) from the billing view for your own dashboards.
On the operational side, Worktruck watches for runaway agents. If a (tenant, agent_blueprint) pair accumulates 10 failed runs within 48 hours, the config auto-disables: disabled_at is stamped and future POST /runs calls return 409 Conflict with a reason. Re-enabling requires manual operator action; there is no self-healing. This is the runaway-agent circuit breaker.
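If you want to anticipate the circuit breaker in your own monitoring, the rule can be approximated as below. This is a sketch, assuming the 48 hours are a trailing window measured from the current time and that the threshold is inclusive; the source states only the numbers, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

FAILURE_THRESHOLD = 10          # from the text above
WINDOW = timedelta(hours=48)    # from the text above; trailing window is an assumption


def should_auto_disable(failed_at: list, now: datetime) -> bool:
    """True if at least FAILURE_THRESHOLD failures fall inside the trailing window.

    `failed_at` holds the failure timestamps for one (tenant, agent_blueprint) pair.
    """
    recent = [t for t in failed_at if now - WINDOW <= t <= now]
    return len(recent) >= FAILURE_THRESHOLD
```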

What’s in a step

Each agent_step is content-hashed. If a worker crashes mid-run and another picks it up, the replay logic skips every step whose hash matches a prior record and resumes from the first un-journaled point. You don’t rebuild state by hand. The database is the state.
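The replay idea can be illustrated with a few lines of code. This is a conceptual sketch only: what Worktruck actually hashes and how the journal is stored are not described here, so the canonical-JSON SHA-256 hash, the in-memory journal dict, and the names step_hash and resume are all assumptions.

```python
import hashlib
import json
from typing import Callable


def step_hash(step: dict) -> str:
    """Hash a step description via canonical JSON (assumed scheme, not Worktruck's)."""
    canonical = json.dumps(step, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def resume(planned_steps: list, journal: dict, execute: Callable[[dict], object]) -> list:
    """Replay a run: reuse journaled results, execute only un-journaled steps.

    `journal` maps step hash -> recorded result and is updated in place.
    """
    results = []
    for step in planned_steps:
        key = step_hash(step)
        if key in journal:
            results.append(journal[key])   # already done before the crash; skip
            continue
        result = execute(step)
        journal[key] = result              # record before moving on
        results.append(result)
    return results
```

Because the hash is computed over sorted keys, two dicts with the same content in a different key order map to the same journal entry.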

Limits and deferred features

  • One run type at a time per kind. Keep your configs focused.
  • Approval resume endpoint is not yet exposed. waiting runs currently need operator intervention to unblock. This lands in Phase 3.
  • Mid-run budget action hooks are not exposed. Configs still honor budget_usd_cents as a terminal cap: if the rolled-up total exceeds the cap, the run fails with budget_exhausted.
  • Model routing is static. The executor talks to Anthropic via the key you provide. Multi-model failover is on the roadmap under the model-failover.md design doc — contact us if you need it.

Next steps