Pool & Sessions
The pool caps how many agent subprocesses run at once across all sessions, FIFO-queues the rest, kills idle ones, and revives them with --resume when new messages arrive.
A session is what holds the conversation: routing key, agent registry, log files, optional project binding.
Source
Pool: internal/agents/pool/ — pool.go (slot allocation), buffer.go (message buffer), factory.go (build agents). Session: internal/agents/session/ — session.go (Meta, Create/Load/SetProject), agents.go (per-session AgentEntry).
Mental model
┌──────────────────────────── Pool ────────────────────────────┐
│ │
│ active map ┌──────────────────────────────┐ │
│ (max=2): │ slot 1: sess-A / "default" │ │
│ │ slot 2: sess-B / "reviewer" │ │
│ └──────────────────────────────┘ │
│ │
│ queue: [sess-C, sess-A/"backend"] │
│ │
│ buffers: sess-A → ["msg1", "msg2"] ← drained on grant │
│ sess-D → ["pending..."] ← persisted to │
│ meta.PendingInput│
└───────────────────────────────────────────────────────────────┘

| Knob | Default | What | Source |
|---|---|---|---|
| MaxConcurrent | 2 | Subprocess cap across all sessions. | pool.go:159 |
| IdleTimeout | 120s | Time without I/O before subprocess kill. Timer pauses while output streams. | pool.go:162 |
| KillAfterIdle | 0 | Extra grace seconds after idle timeout. | pool.go:56 |
| Queue | FIFO | Sessions waiting for a slot. | pool.go:38 |
| Revive | automatic | New message → spawn with --resume <cli_session_id>. | pool.go:264 |
| DefaultProjectID | (empty) | Fallback project when session has none. Empty = per-session temp dir. | pool.go:59 |
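As a rough illustration of how these knobs and their defaults fit together, here is a minimal sketch. The field names come from the table; the Config struct itself, the withDefaults helper, and the choice of time.Duration for KillAfterIdle are assumptions made for the example, not the actual definitions in pool.go.

```go
package main

import (
	"fmt"
	"time"
)

// Config mirrors the knobs in the table. The struct shape and the
// Duration type for KillAfterIdle are assumptions for this sketch.
type Config struct {
	MaxConcurrent    int           // subprocess cap across all sessions
	IdleTimeout      time.Duration // time without I/O before kill
	KillAfterIdle    time.Duration // extra grace after the idle timeout
	DefaultProjectID string        // "" = per-session temp dir
}

// withDefaults (hypothetical) fills unset values with the documented defaults.
func (c Config) withDefaults() Config {
	if c.MaxConcurrent <= 0 {
		c.MaxConcurrent = 2
	}
	if c.IdleTimeout <= 0 {
		c.IdleTimeout = 120 * time.Second
	}
	// KillAfterIdle and DefaultProjectID default to their zero values.
	return c
}

func main() {
	cfg := Config{MaxConcurrent: 4}.withDefaults()
	fmt.Println(cfg.MaxConcurrent, cfg.IdleTimeout, cfg.KillAfterIdle)
}
```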
Session anatomy
A session lives at ~/.<app>/agents/sessions/<id>/ (layout.go:51):
sessions/<id>/
├── meta.json ← session.Meta
├── agents.json ← []AgentEntry (per-session named agents)
├── agent.md ← snapshot of the active preset
├── conversation.jsonl ← user/assistant turns (append-only)
├── commands.jsonl ← legacy per-session gate log (kept for compat)
└── raw.jsonl ← raw stream events (optional)

session.Meta (session.go:51-60):
type Meta struct {
ProjectID string // project id; "" = use DefaultProjectID
Origin Origin // "slack" | "ui" | "api"
ChannelID string // Slack channel ID (Slack-only)
ActiveAgent string // current agent in agents.json
Status Status // "idle" | "queued" | "running"
CreatedAt time.Time
LastActive time.Time
PendingInput []string // buffered messages — survives wick restart
}

PendingInput is the on-disk twin of the in-memory message buffer (next section). It survives a wick restart, so a session that was queued at shutdown still has its buffered messages delivered after the restart.
Session ID by origin
| Origin | ID format | Set by |
|---|---|---|
| slack | Slack thread_ts (e.g. 1715167891.234567) | channels/slack.go |
| telegram | tg-<chatID> | telegram.go:242 |
| ui | UUID minted by the web UI | internal/tools/agents/handler.go |
| api | UUID (future) | - |
Per-session agents
One session can hold many named agents (e.g. backend, reviewer, default); only one is active at a time. Each agent in agents.json (session/agents.go):
type AgentEntry struct {
Name string // unique within session
Provider string // provider type ("claude" / "codex" / "gemini")
CLISessionID string // <-- key to resume; written when CLI emits SessionStart
Status Status
CreatedAt time.Time
LastActive time.Time
MaxTurns int // --max-turns cap; 0 = unlimited (provider default)
}

CLISessionID is captured from the CLI's SessionStart event by event.ClaudeParser and persisted by store.Store. Switch agents with the /agent <name> meta-command: the previous agent stays in agents.json, but the new one becomes ActiveAgent.
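The switch semantics can be sketched as follows. This is a simplified illustration only: the Session type, the switchAgent helper, and the decision to reject unknown names are assumptions for the example; the source does not say how /agent handles a name that is not yet in agents.json.

```go
package main

import (
	"errors"
	"fmt"
)

// AgentEntry is trimmed to the fields this sketch needs.
type AgentEntry struct {
	Name         string
	Provider     string
	CLISessionID string
}

// Session is a hypothetical in-memory view of meta.json + agents.json.
type Session struct {
	ActiveAgent string
	Agents      []AgentEntry
}

var errUnknownAgent = errors.New("unknown agent")

// switchAgent (hypothetical) makes name the active agent. Entries are
// never removed, so the previous agent keeps its CLISessionID and can
// be resumed later.
func (s *Session) switchAgent(name string) error {
	for _, a := range s.Agents {
		if a.Name == name {
			s.ActiveAgent = name
			return nil
		}
	}
	// Assumption: unknown names are rejected rather than auto-created.
	return fmt.Errorf("%w: %q", errUnknownAgent, name)
}

func main() {
	s := &Session{
		ActiveAgent: "default",
		Agents:      []AgentEntry{{Name: "default"}, {Name: "reviewer"}},
	}
	fmt.Println(s.switchAgent("reviewer"), s.ActiveAgent)
}
```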
Send flow
When a message arrives (pool.go: Send):
1. Look up active map[sessionKey(sess, agent)] → cache hit?
└─ yes → entry.agent.Send(text) — no spawn, no buffer
└─ no → continue
2. Ensure session exists on disk (channels pass thread_ts; auto-create if missing)
3. Append text to in-memory Buffer + persist to meta.PendingInput
AND append the user turn to conversation.jsonl so a page refresh
while the session is queued/spawning still shows the messages.
4. mu.Lock — check capacity
├─ already mid-spawn? → return (in-flight spawn will drain buffer)
├─ active+spawning < max? → mark spawning, release lock, spawn()
└─ pool full → enqueue (dedup: skip if same sess+agent already queued),
mark session status=queued, fire one-shot PreemptIdleSlot,
return
5. spawn():
- load session.Meta → resolve project cwd (project.ResolvePath, fallback chain)
- look up CLISessionID for resume
- factory.Build(FactoryOptions) → returns Agent + State + Store + OnStarted hook
- drain Buffer into one combined input
- markStatus(running)
- a.Start(ctx) → fires CLI subprocess
- OnStarted(pid, binary, argv, firstUserMessage) → completes spawn-log start event
- if drained text non-empty → a.Send(combined)
(user turns were already persisted in step 3, so no double-write)

The "spawning" set (pool.go:35) is what prevents two concurrent Send calls from each seeing "slot free" and both calling spawn at once. In-flight spawns count against the cap.
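The capacity check in step 4 can be sketched roughly like this. It is a minimal model, not the code in pool.go: the pool struct, the admit method, and the decision constants are hypothetical names, and the real Send also handles buffering, status updates, and preemption.

```go
package main

import (
	"fmt"
	"sync"
)

type decision int

const (
	decideSpawn decision = iota // caller should spawn now
	decideWait                  // a spawn for this key is already in flight
	decideQueue                 // pool full; key is (now) in the queue
)

func (d decision) String() string {
	return [...]string{"spawn", "wait", "queue"}[d]
}

// pool is a hypothetical, stripped-down model of the slot bookkeeping.
type pool struct {
	mu       sync.Mutex
	max      int
	active   map[string]bool
	spawning map[string]bool
	queue    []string
}

func newPool(max int) *pool {
	return &pool{max: max, active: map[string]bool{}, spawning: map[string]bool{}}
}

// admit decides what Send should do for key. All checks happen under one
// lock, and in-flight spawns count against the cap, so two concurrent
// callers cannot both claim the last free slot.
func (p *pool) admit(key string) decision {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.spawning[key] {
		return decideWait // the in-flight spawn will drain the buffer
	}
	if len(p.active)+len(p.spawning) < p.max {
		p.spawning[key] = true
		return decideSpawn
	}
	for _, q := range p.queue {
		if q == key {
			return decideQueue // dedup: already queued
		}
	}
	p.queue = append(p.queue, key)
	return decideQueue
}

func main() {
	p := newPool(2)
	for _, k := range []string{"sess-A/default", "sess-A/default", "sess-B/reviewer", "sess-C/default"} {
		fmt.Println(k, p.admit(k))
	}
}
```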
Message buffer
Persistence model (buffer.go):
| Operation | What |
|---|---|
| Append | Append to lines[] + persist lines snapshot to meta.PendingInput. |
| Drain | Join all lines with \n, clear in-memory + persist nil to meta.PendingInput. |
| NewBuffer | Reads meta.PendingInput into lines[] so a wick restart resumes. |
When the slot is granted, the entire buffer is drained as one combined input (joined by \n) and sent as a single message to the spawned agent. So a queued session that received three messages while waiting gets all three delivered in one turn. See agents-design.md §5.1.1 for the rationale.
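A minimal sketch of the three buffer operations, assuming persistence is injected as a callback. The persist hook and the exact signatures are assumptions for illustration; buffer.go may wire the meta.PendingInput write differently.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Buffer holds messages that arrived before the session got a slot.
// persist stands in for the write to meta.PendingInput (hypothetical hook).
type Buffer struct {
	mu      sync.Mutex
	lines   []string
	persist func(pending []string)
}

// NewBuffer seeds the buffer from the persisted pending input so that
// messages buffered before a restart are not lost.
func NewBuffer(pending []string, persist func([]string)) *Buffer {
	return &Buffer{lines: append([]string(nil), pending...), persist: persist}
}

// Append adds one message and persists a snapshot of all pending lines.
func (b *Buffer) Append(text string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.lines = append(b.lines, text)
	b.persist(append([]string(nil), b.lines...))
}

// Drain returns every buffered message joined by "\n" and clears both
// the in-memory lines and the persisted copy.
func (b *Buffer) Drain() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	combined := strings.Join(b.lines, "\n")
	b.lines = nil
	b.persist(nil)
	return combined
}

func main() {
	var onDisk []string
	b := NewBuffer([]string{"msg1"}, func(p []string) { onDisk = p })
	b.Append("msg2")
	fmt.Printf("%q %v\n", b.Drain(), onDisk)
}
```

Draining into a single string is what delivers several queued messages as one turn.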
Each user message is also written to conversation.jsonl at Send time (not at drain time), so the UI's conversation tab shows the messages even before the subprocess spawns. Without that, refreshing the page while queued would render "No messages yet" — the messages would live only in meta.PendingInput, which the conversation view doesn't read.
Preemption
When the pool is full and a queued session has been waiting, PreemptIdleSlot finds the longest-idle active session (Lifecycle == Idle, oldest LastActive) and stops it so the slot frees up. The victim keeps its CLISessionID on disk and resumes via --resume on its next message.
Preemption fires from two places:
| Trigger | When | Notes |
|---|---|---|
| Send (one-shot) | At the moment a session enqueues | Skipped if no active session is currently Idle. |
| preemptLoop (1 s ticker) | Background, while len(queue) > 0 | Closes the gap where every active session was Working at enqueue time but later went Idle; without the retry, the queue would wait out the full idle TTL. Only runs when PreemptIdle = true. |
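Victim selection can be sketched as a scan for the Idle entry with the oldest LastActive. The activeEntry type, the pickVictim function, and the at helper are hypothetical names for this example; only the selection rule comes from the description above.

```go
package main

import (
	"fmt"
	"time"
)

type lifecycle int

const (
	Working lifecycle = iota
	Idle
)

// activeEntry is a hypothetical view of one occupied slot.
type activeEntry struct {
	Key        string
	Lifecycle  lifecycle
	LastActive time.Time
}

// at is a small helper for building timestamps in examples.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// pickVictim returns the key of the longest-idle active session.
// Working sessions are never preempted; ok is false when none is Idle.
func pickVictim(active []activeEntry) (key string, ok bool) {
	var victim *activeEntry
	for i := range active {
		e := &active[i]
		if e.Lifecycle != Idle {
			continue
		}
		if victim == nil || e.LastActive.Before(victim.LastActive) {
			victim = e
		}
	}
	if victim == nil {
		return "", false
	}
	return victim.Key, true
}

func main() {
	active := []activeEntry{
		{Key: "sess-A/default", Lifecycle: Idle, LastActive: at(300)},
		{Key: "sess-B/reviewer", Lifecycle: Working, LastActive: at(100)},
	}
	fmt.Println(pickVictim(active))
}
```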
Spawn environment (Claude)
Each Claude spawn includes some fixed extra flags:
| What | Flag(s) | Why |
|---|---|---|
| MCP tool pre-approval | --allowedTools mcp__wick__wick_list,... | Headless agents can't answer an interactive permission prompt. All five wick meta-tools (wick_list, wick_search, wick_get, wick_execute, wick_list_providers) are pre-approved automatically so MCP tool calls don't stall. |
| Skills dir | --add-dir ~/.claude/skills | Agents can read skill files bundled outside the workspace. Only added when ~/.claude/skills/ exists on disk. The system-prompt path table carves out ~/.claude/skills/** (and the matching ~/.codex/skills/**, ~/.gemini/skills/**, ~/.agents/skills/**) as read-allowed while the rest of ~/.claude/** remains denied. |
| Max turns | --max-turns N | Only added when max_turns > 0 (set on the workflow agent node). 0 = omit the flag, letting the provider default apply. |
WICK_CLAUDE_STDERR_LOG (env var, unset by default) redirects the spawned Claude process's stderr to a file instead of wick's stderr. Regardless of this setting, the last ~4 KB of stderr is always captured in memory so abnormal exits can surface the real error in logs (exit_code + stderr_tail fields).
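The conditional flags in the table above could be assembled along these lines. This is a sketch under stated assumptions: claudeExtraArgs is a hypothetical helper, the tool list is passed in rather than hard-coded, and the directory check is reduced to a boolean so the example stays free of filesystem access.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// claudeExtraArgs (hypothetical) builds the fixed extra flags for a
// Claude spawn. skillsDirExists would come from a filesystem check in
// real code; maxTurns <= 0 omits --max-turns so the provider default applies.
func claudeExtraArgs(allowedTools []string, skillsDir string, skillsDirExists bool, maxTurns int) []string {
	var args []string
	if len(allowedTools) > 0 {
		// Tool names are passed as a single comma-separated value.
		args = append(args, "--allowedTools", strings.Join(allowedTools, ","))
	}
	if skillsDirExists {
		args = append(args, "--add-dir", skillsDir)
	}
	if maxTurns > 0 {
		args = append(args, "--max-turns", strconv.Itoa(maxTurns))
	}
	return args
}

func main() {
	args := claudeExtraArgs([]string{"mcp__wick__wick_list"}, "/home/user/.claude/skills", true, 5)
	fmt.Println(strings.Join(args, " "))
}
```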
Exit flow
When the agent subprocess exits (pool.go: onAgentExit):
1. state.MarkKilled()
2. session.markStatus(idle) ← MUST run before releaseSlot (see below)
3. releaseSlot(key) — delete active[key]
4. OnLifecycle(killed)
5. tryGrantQueue(): pop head, spawn next queued session

Order matters
markStatus(idle) runs before releaseSlot (pool.go:378). The reverse order causes a Windows-specific race: a fast Send arriving right after Active==0 could see the slot empty, call spawn, and have its meta.json write collide with the trailing idle write (two os.Rename to the same target).
The body runs under p.wg so Stop() can wait for tail work to finish before tearing down.
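The ordering constraint and the WaitGroup tracking can be sketched together. Everything here is a simplified model: the pool fields, the watch helper, and the in-memory status map standing in for the meta.json write are assumptions, and steps 1, 4, and 5 of the exit flow are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical model: status stands in for the on-disk
// meta.json write, trace records the order of the two steps.
type pool struct {
	mu     sync.Mutex // guards active
	wg     sync.WaitGroup
	active map[string]bool

	metaMu sync.Mutex // guards status and trace
	status map[string]string
	trace  []string
}

func newPool() *pool {
	return &pool{active: map[string]bool{}, status: map[string]string{}}
}

func (p *pool) markStatus(key, s string) {
	p.metaMu.Lock()
	defer p.metaMu.Unlock()
	p.status[key] = s
	p.trace = append(p.trace, "markStatus("+s+")")
}

func (p *pool) releaseSlot(key string) {
	p.mu.Lock()
	delete(p.active, key)
	p.mu.Unlock()
	p.metaMu.Lock()
	p.trace = append(p.trace, "releaseSlot")
	p.metaMu.Unlock()
}

// onAgentExit finishes the idle write before the slot becomes visible
// as free, so a fast Send cannot start a competing meta write.
func (p *pool) onAgentExit(key string) {
	p.markStatus(key, "idle")
	p.releaseSlot(key)
}

// watch runs the exit handler under p.wg so Stop can wait for it.
func (p *pool) watch(key string, exited <-chan struct{}) {
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		<-exited
		p.onAgentExit(key)
	}()
}

// Stop blocks until all exit handlers have finished their tail work.
func (p *pool) Stop() { p.wg.Wait() }

func main() {
	p := newPool()
	p.active["sess-A/default"] = true
	exited := make(chan struct{})
	p.watch("sess-A/default", exited)
	close(exited) // simulate the subprocess exiting
	p.Stop()
	fmt.Println(p.trace, len(p.active))
}
```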
Resume flow
The point of CLISessionID is to make the kill-revive cycle invisible to the user.
T+0s User sends message → spawn → CLI emits SessionStart with id "abc-123"
→ store captures id → agents.json entry gets CLISessionID="abc-123"
→ conversation streams normally
T+120s No I/O for IdleTimeout → state.MarkIdle → state.MarkKilled
→ onAgentExit → session.Status=idle, slot released
→ CLI subprocess gone, but conversation log + agents.json intact
T+5min User sends new message
→ spawn → load agents.json → CLISessionID="abc-123"
→ factory.Build with ResumeID="abc-123"
→ CLI spawns with --resume abc-123 → restores its own context
→ conversation continues seamlessly

The CLI is responsible for replaying its own conversation context from the resume ID; wick doesn't replay conversation.jsonl into the subprocess.
Stale resume self-heal
If a --resume spawn exits with "No conversation found" (the CLI can't find the stored session, e.g. after a full Claude data clear), the pool automatically clears the stale CLISessionID from agents.json. The next spawn starts fresh instead of retrying a dead ID. A log line at INFO level records the clear.
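The resume and self-heal logic can be sketched as two small helpers. Both function names are hypothetical, and the sketch assumes the "No conversation found" text is detected in the captured stderr tail; the source does not specify where the pool looks for it.

```go
package main

import (
	"fmt"
	"strings"
)

// AgentEntry is trimmed to the fields this sketch needs.
type AgentEntry struct {
	Name         string
	CLISessionID string
}

// resumeArgs (hypothetical) adds --resume only when an ID is stored.
func resumeArgs(e AgentEntry) []string {
	if e.CLISessionID == "" {
		return nil
	}
	return []string{"--resume", e.CLISessionID}
}

// healStaleResume (hypothetical) clears a dead CLISessionID after a
// failed resume and reports whether it did. Assumption: the marker
// text shows up in the stderr tail of the exited process.
func healStaleResume(e *AgentEntry, resumed bool, stderrTail string) bool {
	if !resumed || !strings.Contains(stderrTail, "No conversation found") {
		return false
	}
	e.CLISessionID = ""
	return true
}

func main() {
	e := AgentEntry{Name: "default", CLISessionID: "abc-123"}
	fmt.Println(resumeArgs(e))
	healed := healStaleResume(&e, true, "Error: No conversation found with session ID abc-123")
	fmt.Println(healed, resumeArgs(e))
}
```

After the clear, the next spawn gets no --resume flag and starts a fresh CLI session.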
The format of the resume ID is CLI-specific:
| CLI | Where it comes from | How to pass it |
|---|---|---|
| Claude | system.subtype=init event | claude --resume <id> |
| Codex | thread.started event | codex --resume <id> (when phase 6 lands) |
| Gemini | init event | env GEMINI_SESSION_ID (when phase 6 lands) |
Today, only Claude is wired end-to-end. Codex / Gemini parsers are stubs in internal/agents/event/; resume flow ships when those parsers land.
Project cwd resolution
pool.resolveCwd at spawn time:
- sess.Meta.ProjectID non-empty → project.ResolvePath(layout, id). Returns custom path or <base>/projects/<id>/files/.
- Empty → cfg.DefaultProjectID set? → resolve that.
- Both empty → per-session temp dir at sessions/<id>/cwd/. Created on demand.

The pool MkdirAlls managed paths before exec.Cmd.Dir. Custom paths are assumed to still exist; if you deleted yours, spawn surfaces a clean error.

CWD stability for resumable sessions
Once a session has a CLISessionID (i.e. a real conversation exists), project backfill is skipped. Changing the project binding after that point would move the cwd, which would break --resume (Claude's resume is per-cwd). If you need to change the project for an existing session, reset it first.
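The cwd fallback chain can be sketched as below. The layout struct, its Custom map of project id to custom path, and the managed return value are assumptions for illustration; the real project.ResolvePath signature and layout type are not shown in this document.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// layout is a hypothetical stand-in for the real layout type.
type layout struct {
	Base        string            // <base> for managed project dirs
	SessionsDir string            // parent of sessions/<id>/
	Custom      map[string]string // project id → custom path (assumed)
}

func (l layout) projectDir(id string) string {
	return filepath.Join(l.Base, "projects", id, "files")
}

func (l layout) sessionCwd(sessionID string) string {
	return filepath.Join(l.SessionsDir, sessionID, "cwd")
}

// resolveCwd walks the chain: session project, then default project,
// then a per-session temp dir. managed reports whether the pool owns
// the path and may create it before spawning.
func resolveCwd(l layout, sessionID, sessProjectID, defaultProjectID string) (dir string, managed bool) {
	id := sessProjectID
	if id == "" {
		id = defaultProjectID
	}
	if id == "" {
		return l.sessionCwd(sessionID), true
	}
	if p, ok := l.Custom[id]; ok {
		return p, false // custom paths are assumed to exist already
	}
	return l.projectDir(id), true
}

func main() {
	l := layout{Base: "/data", SessionsDir: "/data/sessions"}
	fmt.Println(resolveCwd(l, "sess-A", "", ""))
	fmt.Println(resolveCwd(l, "sess-A", "web", ""))
}
```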
Restart recovery
Pool.New returns an empty pool. Wick boot (server.go):
- Construct pool with config.
- Don't auto-spawn for sessions whose previous status was running; those subprocesses are already dead. Their agents.json keeps the CLISessionID, so the next message from any channel revives them via the resume flow.
- Channels start, listeners come online, business as usual.
The only thing the pool does NOT recover by itself: a session that was queued at shutdown with messages in PendingInput. The next inbound message to that session will trigger Send, which goes through bufferFor — NewBuffer reads PendingInput into lines[], the new message gets appended, and the combined drain goes to the agent on its first slot.
Reset
Reset (/reset meta-command):
1. Kill subprocess if alive
2. Truncate conversation.jsonl, commands.jsonl, raw.jsonl (keep _meta header)
3. Clear CLISessionID in agents.json (so next send is fresh, no --resume)
4. Re-snapshot agent.md from preset
5. Re-merge CLAUDE.md (project-level + agent.md)

Useful for "the agent went down a wrong path; start fresh."
Delete
Session delete (session.Delete):
1. Kill subprocess if alive
2. rm -rf sessions/<id>/
3. Project files left alone (projects are shared)

Telemetry hooks
Pool fires two callbacks the UI subscribes to:
| Hook | Fires when | Use |
|---|---|---|
| OnSessionCreated(sess) | Pool auto-creates a session for an inbound channel message | Register session into manager.Manager so the dashboard sees it without reload. |
| OnLifecycle(LifecycleEvent) | spawning (post-Start) and killed transitions | UI badges, spawn-log enrichment. |
Idle / Working transitions are NOT routed via OnLifecycle — they're implicit from the event flow. UIs that want every transition subscribe to AgentEvent via the factory's OnEvent.
See also
- Channels: where SendFunc is called from.
- Projects: cwd resolution.
- Providers: FactoryOptions.ProviderType / ProviderName forwarding.
- Command Gate: gate's PreToolUse hook fires inside the spawned subprocess; pool doesn't see it.