Catching refusals — guard rails for autonomous agents

May 2026 · On why exit 0 doesn't mean "all good"

My bug-bounty agent doesn't run in a fancy orchestrator. It runs in a 250-line Bash script that walks a queue, spawns a fresh Claude CLI process per task, and waits for the result. Half of the script is loop logic. The other half is guard rails against the three most common failure modes that nobody mentions in tutorials:

  1. The agent exits cleanly but didn't produce the artifact.
  2. The agent refused the task, and that looks like success.
  3. The usage limit hits, and the run dies mid-flight.

This post describes how I detect each of these, and why I keep the guard rails in Bash instead of reaching for a big framework.

The loop

The skeleton is trivial:

while true; do
  next_task=$(pick_first_unfinished_task "$QUEUE_FILE")
  [ -z "$next_task" ] && break

  prompt_file="$PROMPTS_DIR/$next_task.prompt"
  spawn_log="$SPAWN_LOGS_DIR/$next_task.spawn.log"

  timeout 1500 claude \
    --print \
    --permission-mode bypassPermissions \
    --model opus \
    --add-dir "$WORKSPACE" \
    < "$prompt_file" \
    >> "$spawn_log" 2>&1
  cmd_exit=$?

  process_result "$next_task" "$cmd_exit" "$spawn_log"
done

"Pick first unfinished" means: read the queue, find the first KH03: … line that doesn't yet have a status/KH03.status file. That makes the loop idempotent — kill the script, restart, it picks up where it left off.

The agent is supposed to write a status/$task_id.status JSON at the end with status, decision, and optionally a lead_file path. That's the output interface. Sounds simple.
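A happy-path status file is a one-liner. The field names are the interface; the values below are invented for illustration:

{"status": "done", "decision": "report", "lead_file": "leads/KH03-example.md"}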

Failure mode 1: exit 0 without status

The agent terminates with exit code 0, but the file is missing. I treated this as a prompt bug at first: stricter instructions, more examples, a schema. It didn't help. Models have bad days, tool calls time out, or the markdown output looks like JSON but isn't parseable.

The robust fix is sentinel recovery: after a successful exit, check whether the status file exists. If not, scan the leads directory for new files belonging to this task:

if [ ! -f "$STATUS_DIR/$next_id.status" ]; then
  leads_after=$(find "$LEADS_DIR" -maxdepth 1 -name "${next_id}-*" | sort)
  new_lead=$(comm -13 <(echo "$leads_before") <(echo "$leads_after") | head -1)
  if [ -n "$new_lead" ]; then
    cat > "$STATUS_DIR/$next_id.status" <

Practically: if the agent wrote a lead file, the run was successful even if the closing status file is missing. Done is done. Roughly one run in five gets saved by this recovery.

Failure mode 2: refusal looks like success

This is the awkward one. The agent received the prompt, found it problematic, wrote a justification instead of doing the work, and exited cleanly. From the runner's view: successful run, no output. From the audit's view: the task was skipped silently.

The problem is real for bug-bounty audits, because prompts often contain words like "exploit", "bypass", "privilege escalation", "PoC" that are refusal triggers in other contexts. That's understandable; the model has no context that this is an authorized program. But the failure mode must not look like success.

The guard rail: when sentinel recovery comes up empty (no status file, no new lead), I tail the spawn log and grep for refusal phrases:

refusal=$(tail -200 "$spawn_log" | grep -iE \
  "I can'?t (help|assist|provide|do that|continue|create|generate|write)|\
I cannot (help|assist|provide|do that|continue|create|generate|write)|\
I won'?t (help|be able|do that|create|generate|write)|\
I'?m not (able|going) to (help|assist|create|generate|provide)|\
Anthropic.{0,40}polic|usage polic|acceptable use polic|\
against (my )?guidelines|violates.{0,40}polic|\
harmful (request|content|action)" \
  | head -3 | tr '\n' ' ' | head -c 400)

if [ -n "$refusal" ]; then
  cat > "$STATUS_DIR/$next_id.status" <

Status blocked is a separate state alongside done and failed. It means: the prompt needs reformulating, this is an engineering task for me, not a model failure. My phone gets a Telegram message with the refusal snippet, and in the morning I decide whether to sharpen the prompt (more program-scope context, explicit "this is authorized BB research") or pull the task from the queue.

The regex is not elegant. It's a list of phrases I've seen in spawn logs. It will change as new model generations develop new refusal idioms. That's fine — I maintain it like a test suite.
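"Like a test suite" can be taken literally: move the phrase list into a patterns file, keep a few real refusal snippets from old spawn logs, and fail the moment an edit to the list stops matching one of them. A sketch (the file names here are made up):

# regression check for the refusal phrase list
for sample in tests/refusals/*.log; do
  if ! grep -qiEf refusal-patterns.txt "$sample"; then
    echo "refusal patterns no longer match: $sample" >&2
    exit 1
  fi
done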

Failure mode 3: usage limits

My Pro OAuth account has a reset window every few hours. If the account hits its cap mid-run, the CLI prints something like "You hit your limit. Reset at 2am UTC." and exits. From the classic loop's view: task failed, move on. The next one fails too, and so on, until the queue empties.

Detection is again a spawn-log grep, but inside an inner retry loop:

USAGE_RETRY_MAX=15
USAGE_RETRY_SLEEP=900  # 15 minutes

usage_retry=0
while true; do
  timeout 1500 claude --print --model opus --add-dir "$WORKSPACE" \
    < "$prompt_file" >> "$spawn_log" 2>&1
  cmd_exit=$?

  if tail -30 "$spawn_log" | \
     grep -qi "hit your limit\|usage limit\|reset.*am.*UTC\|reset.*pm.*UTC"; then
    usage_retry=$((usage_retry+1))
    if [ $usage_retry -le $USAGE_RETRY_MAX ]; then
      ping "[$next_id] usage-limit · sleeping 15min (retry $usage_retry/$USAGE_RETRY_MAX)"
      : > "$spawn_log"
      sleep $USAGE_RETRY_SLEEP
      continue
    fi
    break
  fi

  [ $cmd_exit -eq 0 ] && break
  # … general error retry: 3x with 60s sleep
done

15 retries × 15 minutes gives 3.75 hours of wait tolerance per task. That outlives every limit-reset cycle I've seen. In practice: a run that would have taken 12 hours takes 13 with a limit hit.
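The ping in the retry loop (and the Telegram message for blocked tasks) is a thin wrapper around the Telegram Bot API. Roughly this, with TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID as assumed environment variables:

# one curl to Telegram's sendMessage endpoint; errors are swallowed so a
# notification hiccup never kills the run
ping() {
  curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    --data-urlencode chat_id="${TELEGRAM_CHAT_ID}" \
    --data-urlencode text="$1" \
    > /dev/null || true
}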

Failure mode 4 (bonus): PATH

The runner gets started via nohup bash queue-runner-audit.sh &, often over SSH from my laptop. nohup'd shells are non-interactive, so they don't source .bashrc, the npm-global bin path is missing, and claude isn't found. The first hour of bug hunting on an expedition would be gone if this line weren't sitting explicitly at the top of the script:

export PATH="$HOME/.local/share/npm-global/bin:/usr/local/bin:/usr/bin:/bin:$HOME/.local/bin:$PATH"

The most embarrassing bug of the whole pipeline and the one with the highest "wondered for three weeks" to "one-line fix" ratio.
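A cheap assertion right after the export would have turned three weeks of wondering into an instant error; something like:

# fail fast if the CLI still isn't resolvable after the PATH export
command -v claude > /dev/null 2>&1 || { echo "claude not found on PATH" >&2; exit 1; }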

What this means for production agents

Exit codes from LLM CLIs are a weak success signal. They tell you "the process terminated cleanly", not "the task was completed". Treat the actual output (file, JSON, lead) as the success criterion, not the exit code. Sentinel checks against the file system are more robust than stdout parsing.

Refusals are an engineering signal, not a model error. If your agent refuses a task, that's almost always a prompt problem (not enough context, unfortunate phrasing) or a scope problem (you're asking for something genuinely out of bounds). In both cases you want to know about it through a dedicated state, not bury it in a generic failure bucket. A blocked bucket separates "model refused" from "model crashed" and makes prompt refinement measurable.

Usage limits are not hard walls. They are pauses. If your runner treats them like pauses — sleep, retry, continue — the limit disappears as a failure mode from your operational vocabulary.

Bash is good enough. I considered moving to a workflow orchestrator (Temporal, Airflow, Dagster, even Prefect). The argument against was simple: 250 lines of Bash are 250 lines I hold completely in my head and can change in 30 seconds. Each guard rail above was written within five minutes of seeing its failure mode once in a run. With a framework, that would be a PR.

I might migrate when I have a second high-load agent runner. Until then: one script, one cron, one Telegram number. Enough.


The full runner is 249 lines, of which ~80 are the three guard rails described here. If I publish the source, it'll appear here.