A single trigger (a webhook, a cron fire, or a steer message) opens one run. The agent reasons, calls tools, and the run closes when the agent is done or when it would otherwise overflow the model’s context window. Most agents finish a run in seconds to a couple of minutes and never need any of the controls on this page. Read on if your agent handles long incidents that take dozens of tool calls and you want to understand what stops a run.

The three things that end a run

| What | When it fires | What happens |
|---|---|---|
| Run finished | Agent decides it’s done | Run closes cleanly. Diagnosis posted, state checkpointed. |
| Context full | Prompt approaches the model’s context cap | Agent ends the run with a checkpoint, the runtime opens a fresh run on the same chain, the new run memory_recalls where it left off. |
| Budget breach | Daily or monthly dollar cap hit | Run terminates, budget_breach recorded. Future runs in the next billing window are unaffected. |

Continuation chains — when one run isn’t enough

If the agent can’t finish in one run (a forty-tool-call incident on a 200K-token model, for example), it ends the current run with exit_ok: false and a checkpoint_id. The runtime re-enqueues the same event as a continuation; the next run opens with a fresh prompt and pulls the prior findings via memory_recall(checkpoint_id). A chain is capped at 10 continuations. The 11th attempt is refused, the originating event is labelled chunk_chain_escalate_human, and an operator picks it up. The agent itself stays alive; only that one chain is abandoned.
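
The continuation loop described above can be sketched roughly as follows. This is an illustrative model, not the runtime's actual code: `drive_chain`, `run_once`, and `RunResult` are hypothetical names, and the sketch assumes the first run does not count toward the cap of 10 continuations. Only `exit_ok`, `checkpoint_id`, and the `chunk_chain_escalate_human` label come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_CONTINUATIONS = 10  # from the text: a chain caps at 10 continuations
ESCALATION_LABEL = "chunk_chain_escalate_human"


@dataclass
class RunResult:
    exit_ok: bool                        # True when the agent finished its work
    checkpoint_id: Optional[str] = None  # set when the run ended early with a checkpoint


def drive_chain(
    event: dict,
    run_once: Callable[[dict, Optional[str]], RunResult],
) -> dict:
    """Run one event, continuing from checkpoints until done or the cap is hit.

    `run_once(event, checkpoint_id)` is a hypothetical stand-in for a single
    run; it receives the checkpoint to recall from (None on the first run).
    """
    checkpoint_id: Optional[str] = None
    continuations = 0
    while True:
        result = run_once(event, checkpoint_id)
        if result.exit_ok:
            return {"status": "done", "continuations": continuations}
        if continuations == MAX_CONTINUATIONS:
            # An 11th continuation would exceed the cap: refuse it and
            # label the originating event for a human operator.
            return {
                "status": "escalated",
                "label": ESCALATION_LABEL,
                "continuations": continuations,
            }
        # Re-enqueue the same event; the next run resumes from the checkpoint.
        continuations += 1
        checkpoint_id = result.checkpoint_id
```

Under these assumptions, an agent that never reports `exit_ok: true` produces one initial run plus 10 continuations before the chain is escalated, and other events for the same agent are untouched because the counter lives on the chain rather than on the agent.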

Knobs you can tune

Set these under x-usezombie.context in TRIGGER.md. Defaults work for almost every workload — leave them alone unless you have a specific reason.

| Knob | Default | When to touch it |
|---|---|---|
| tool_window | 20 | Raise if your agent routinely runs 30+ tool calls and you want fewer continuations. Lower if you want it to compact its findings sooner. |
| memory_checkpoint_every | 5 | Cadence at which the agent is prompted to write a memory snapshot during a long run. Rarely worth changing. |
| context_cap_tokens | model-dependent | Set explicitly if your model has no published context cap or differs from the default. |

If you’ve never seen a chunk_chain_escalate_human failure label, you don’t need to change any of these.
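
As a rough illustration of the "leave the defaults alone" advice, the three knobs can be modelled as a small settings object that reports only what you have changed. This is a sketch for reasoning about the knobs, not the TRIGGER.md syntax (which this page does not show). `ContextKnobs` and `overrides` are hypothetical names, the positive-value checks are assumptions, and `None` stands in for the model-dependent default of `context_cap_tokens`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextKnobs:
    tool_window: int = 20                     # documented default
    memory_checkpoint_every: int = 5          # documented default
    context_cap_tokens: Optional[int] = None  # None = model-dependent default

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real runtime's limits are not documented here.
        if self.tool_window < 1:
            raise ValueError("tool_window must be at least 1")
        if self.memory_checkpoint_every < 1:
            raise ValueError("memory_checkpoint_every must be at least 1")
        if self.context_cap_tokens is not None and self.context_cap_tokens < 1:
            raise ValueError("context_cap_tokens must be positive when set")


def overrides(knobs: ContextKnobs) -> dict:
    """Return only the knobs whose values differ from the defaults."""
    defaults = ContextKnobs()
    return {
        name: value
        for name, value in vars(knobs).items()
        if value != getattr(defaults, name)
    }
```

With this model, `overrides(ContextKnobs())` is an empty dict, which is the expected state for most workloads, and `overrides(ContextKnobs(tool_window=40))` contains only `tool_window`.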

Common questions

My agent ran for two minutes. Is that normal? Yes. Runs can be seconds (a single webhook → Slack post) to a few minutes (a multi-step diagnosis). What matters is that the run closes cleanly, not how long it took.

Will it run forever? No. Three independent ceilings stop it: the run’s own context cap (the agent voluntarily ends and continues), the continuation cap (10 chunks max), and the budget cap (daily_dollars / monthly_dollars in TRIGGER.md).

Do I need to tune anything? Almost certainly not. The defaults are tuned for the flagship platform-ops agent and translate well to most workloads. Tune your SKILL.md prose first; it’s where the actual reasoning lives.

See also

  • Memory: the API the agent uses to checkpoint findings during a long run.
  • Budgets: the daily_dollars and monthly_dollars ceilings.
  • Authoring an agent: the full TRIGGER.md reference.