How to Run Claude Code Subagents That Actually Work: The Queue Pattern
Three concurrent agents, one shared queue file, and four rules that prevent the mistakes that make multi-agent Claude Code setups fail — too many agents, no coordination, no state tracking, no timeouts.
Last updated: June 15, 2026
Three concurrent agents, one shared queue file, and four rules that prevent the mistakes that make multi-agent Claude Code setups fail — too many agents, no coordination, no state tracking, no timeouts.
This guide is reviewed for clarity, service accuracy, and AI-search readability. The next quarterly content review is tracked internally before unsupported metrics or client proof are added.
Why most multi-agent setups break
Spawning multiple Claude Code agents sounds straightforward. In practice, most setups fail for the same four reasons: too many agents running at once (context pressure collapses quality), no shared coordination layer (agents repeat work or conflict), no state tracking (you can't tell what's done or in progress), and no timeouts (one stuck agent blocks the whole pipeline).
The pattern below solves all four. It's not complex — one queue file, three agents, and a few rules in your CLAUDE.md.
The four rules
- Cap at 3 concurrent agents. More than 3 creates context pressure that degrades output quality. Three is the number where parallel throughput outpaces the quality cost.
- Use a shared queue file. All coordination happens through a single JSON file. One file is the source of truth.
- Track state per task. Every task needs one of four states: ,prompt
pending,promptin_progress,promptcompleted. Without state, you can't restart safely and you can't tell when the work is done.promptfailed - Set timeouts and retry limits. Every task gets a timeout. Failed tasks retry with exponential backoff (1s, 2s, 4s). After two retries, escalate to a human.
The CLAUDE.md subagents block
Add this block to your CLAUDE.md. It tells Claude exactly how to behave when spawning and managing subagents — without it, the agent makes its own decisions about concurrency and retry logic, which usually means neither.
text## SUBAGENTS Max concurrent agents: 3 Default timeout: 30 minutes Retry policy: exponential backoff (1s, 2s, 4s, 8s) Max retries per task: 2 Task types: refunds, support, data, reporting
The queue file structure
The queue lives at a fixed path — typically
task_queue.jsonjson[ { "task_id": "task_001", "type": "refund", "status": "pending", "assigned_agent": null, "retries": 0, "result": null, "data": { "order_id": "ORD-123", "amount": 49.99 } }, { "task_id": "task_002", "type": "support", "status": "in_progress", "assigned_agent": "agent_1", "retries": 0, "result": null, "data": { "ticket_id": "TKT-456", "issue": "login failure" } } ]
The subagent template
Each subagent lives in
.claude/agents/<name>.mdmarkdown--- name: refund-processor description: Processes customer refund requests from the task queue tools: - Read - Write - Bash model: claude-haiku-4-5-20251001 --- You are a refund processing agent. Your job: 1. Read your assigned task from task_queue.json 2. Process the refund via the payment API 3. Update task status to completed or failed 4. Write the result back to the queue Never process a task that is not assigned to you. Never modify tasks assigned to other agents.
How the parent agent spawns and monitors
The parent agent reads the queue, claims up to 3 pending tasks by setting their status to
in_progressThe key invariant: the parent never modifies tasks it didn't claim. If an agent crashes mid-task, the task stays in
in_progresspendingWhat this looks like at scale
A real example: an e-commerce team processing refund requests. Before the queue pattern, one agent handled requests sequentially — about 10 refunds per hour. With three subagents running in parallel from a shared queue, throughput hit 180 refunds per hour. Error rate dropped 97% because each agent only handles its assigned tasks and state transitions are explicit.
The math: 3 agents × 60 refunds/hour each = 180/hour.
Five things to monitor
- Queue depth — how many tasks are pending. A growing queue means agents can't keep up.
- Agent utilization — are all three agents actually processing, or is one idle?
- Task completion time — average time from toprompt
in_progress. Outliers point to API latency or task complexity issues.promptcompleted - Error rate — ratio of failed to completed. Above 5% means something systematic is breaking.
- Retry rate — how often tasks hit retry logic. High retry rates under low error rates usually mean timeout thresholds are too tight.
Frequently asked questions
Why cap at 3 agents instead of running as many as possible? Each subagent runs in its own context window. Beyond 3, the parent agent's context fills with coordination overhead and reasoning quality degrades. Three is empirically the point where throughput gains from parallelism outpace the quality cost from context pressure.
What happens if two agents try to claim the same task simultaneously? Each agent reads the queue, claims a pending task in a single Write operation, then re-reads to confirm the claim persisted. If two agents claim the same task, one sees its claim overwritten and moves to the next pending task. Optimistic concurrency — it doesn't prevent double-claiming but detects and resolves it immediately.
Can I use this pattern with Claude Code cloud routines? Yes, with one constraint: cloud routines have a 1-hour minimum interval. For high-throughput queues, run the parent agent in a loop or as a local scheduled task. Cloud routines work well for the daily kickoff — starting the parent agent that then manages the queue for the session.
