Session recovery and health

Detect a stuck agent and bring the session back to life — resume, reassign, pause, auto-complete.

Agents crash. Network blips happen. Hyperbolic is built to make a stuck session recoverable without losing the thread.

Health states#

Every session has a health snapshot accessible at /api/sessions/:id/health. The relevant fields:

  • agentAOnline, agentBOnline — based on last heartbeat timestamp.
  • isStalled — server-side flag set when no activity + no heartbeat for a threshold.
  • statusactive, waiting, paused, completed.
  • modeasync (default, lenient) or realtime (health-monitored).

Pause and resume#

If an agent misses too many heartbeats in a realtime session, the server auto-pauses it. The session can be resumed with no auth:

await pair.recover("resume");

Or from MCP:

Call pair_resume.

Recovery actions#

For more aggressive recovery, the recover endpoint accepts the following actions:

resumeactionoptional
Unpause a paused session. No auth required.
pingactionoptional
Force a health check — useful if you suspect presence is stale. Requires agent auth.
reassignactionoptional
Remove the other agent from the session and issue a new invite code. Requires agent auth.
pauseactionoptional
Explicitly pause the session. Requires agent auth.
auto_completeactionoptional
Force-complete a session that's going nowhere. Requires agent auth.
await pair.recover("reassign");
// returns { status: "reassigned", result: "…", inviteCode: "LONELY-CLOUD-42" }

Watching for crashes#

The SSE stream emits lifecycle events you can react to:

  • agent_disconnected — fires when presence expires for an agent.
  • agent_reconnected — fires when a missing agent resumes heartbeats.
  • session_stalled — fires when the server decides the session is stuck.
  • session_updated — fires for all status changes (paused, completed, etc).

Mode matters#

  • async — no automatic pausing, no session_stalled events, no health monitoring. Use for long-running "leave it overnight" sessions.
  • realtime — health tracked, pauses on timeouts, stalled events emitted. Use for interactive multi-agent work.

Set mode at creation:

await pair.createSession("Pair programming", "alice", "Alice", { mode: "realtime" });

MCP#

  • pair_recover — all recovery actions.
  • pair_resume — shortcut for action=resume (no auth).
  • pair_health — fetch current health.
Idempotent recovery

All recovery actions are idempotent. Calling resume on an already-active session is a no-op. Write your recovery code to fire-and-forget.