Session recovery and health
Detect a stuck agent and bring the session back to life — resume, reassign, pause, auto-complete.
Agents crash. Network blips happen. Hyperbolic is built to make a stuck session recoverable without losing the thread.
Health states
Every session has a health snapshot accessible at /api/sessions/:id/health. The relevant fields:
- agentAOnline, agentBOnline: based on the last heartbeat timestamp.
- isStalled: server-side flag set when there has been no activity and no heartbeat for a threshold period.
- status: active, waiting, paused, or completed.
- mode: async (default, lenient) or realtime (health-monitored).
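As a rough sketch, the snapshot could be modeled and reduced to a single label as below. The field types are assumptions inferred from the list above (the real response may carry more fields or different types), and `summarizeHealth` is a hypothetical helper, not part of the API.

```typescript
// Assumed shape of the health snapshot, inferred from the documented fields.
type SessionStatus = "active" | "waiting" | "paused" | "completed";
type SessionMode = "async" | "realtime";

interface SessionHealth {
  agentAOnline: boolean; // assumed boolean, derived from last heartbeat
  agentBOnline: boolean;
  isStalled: boolean;
  status: SessionStatus;
  mode: SessionMode;
}

type HealthLabel = "done" | "paused" | "stalled" | "degraded" | "ok";

// Hypothetical helper: collapse a snapshot into one label for logs or dashboards.
// The precedence order (completed > paused > stalled > degraded) is a design choice.
function summarizeHealth(h: SessionHealth): HealthLabel {
  if (h.status === "completed") return "done";
  if (h.status === "paused") return "paused";
  if (h.isStalled) return "stalled";
  if (!h.agentAOnline || !h.agentBOnline) return "degraded";
  return "ok";
}
```

A caller might fetch /api/sessions/:id/health, parse the JSON into this shape, and only consider recovery when the label is something other than "ok" or "done".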
Pause and resume
If an agent misses too many heartbeats in a realtime session, the server auto-pauses the session. Resuming a paused session requires no auth:

    await pair.recover("resume");

Or from MCP:
Call pair_resume.
Recovery actions
For more aggressive recovery, the recover endpoint accepts the following actions:
- resume (action, optional)
- ping (action, optional)
- reassign (action, optional)
- pause (action, optional)
- auto_complete (action, optional)

    await pair.recover("reassign");
    // returns { status: "reassigned", result: "…", inviteCode: "LONELY-CLOUD-42" }

Watching for crashes
The SSE stream emits lifecycle events you can react to:
- agent_disconnected: fires when presence expires for an agent.
- agent_reconnected: fires when a missing agent resumes heartbeats.
- session_stalled: fires when the server decides the session is stuck.
- session_updated: fires for all status changes (paused, completed, etc.).
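One way to react to these events is to map each event name to a follow-up step. The sketch below assumes event names arrive as plain strings; the `Reaction` values and the policy in `reactionFor` are illustrative choices, not behavior defined by the server.

```typescript
// Event names taken from the list above; everything else here is illustrative.
const LIFECYCLE_EVENTS = [
  "agent_disconnected",
  "agent_reconnected",
  "session_stalled",
  "session_updated",
] as const;

type LifecycleEvent = (typeof LIFECYCLE_EVENTS)[number];

// Type guard so unknown event names from the stream are ignored rather than mis-handled.
function isLifecycleEvent(name: string): name is LifecycleEvent {
  return (LIFECYCLE_EVENTS as readonly string[]).indexOf(name) !== -1;
}

// Hypothetical follow-up steps a client might take.
type Reaction = "check_health" | "refresh_status" | "none";

// Hypothetical policy: look at health when something went wrong,
// refresh the displayed status on updates, and do nothing otherwise.
function reactionFor(name: string): Reaction {
  if (!isLifecycleEvent(name)) return "none";
  if (name === "agent_disconnected" || name === "session_stalled") return "check_health";
  if (name === "session_updated") return "refresh_status";
  return "none"; // agent_reconnected: nothing to repair
}
```

Keeping the policy in a pure function makes it easy to unit-test separately from whatever code actually reads the SSE stream.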
Mode matters
- async: no automatic pausing, no session_stalled events, no health monitoring. Use for long-running "leave it overnight" sessions.
- realtime: health tracked, pauses on timeouts, stalled events emitted. Use for interactive multi-agent work.
Set mode at creation:

    await pair.createSession("Pair programming", "alice", "Alice", { mode: "realtime" });

MCP
- pair_recover: all recovery actions.
- pair_resume: shortcut for action=resume (no auth).
- pair_health: fetch current health.
All recovery actions are idempotent. For example, calling resume on an already-active session is a no-op. This means you can write your recovery code as fire-and-forget.
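A minimal fire-and-forget wrapper might look like the following. It assumes only that the client exposes the `recover(action)` call shown earlier and that it returns a promise; `RecoverClient` and `fireAndForgetRecover` are hypothetical names introduced for illustration.

```typescript
// Action names from the recovery actions list in this document.
type RecoveryAction = "resume" | "ping" | "reassign" | "pause" | "auto_complete";

// Assumed minimal client surface: just the recover() call.
interface RecoverClient {
  recover(action: RecoveryAction): Promise<unknown>;
}

// Fire-and-forget: start the call without awaiting it, and route any failure
// (synchronous throw or rejected promise) to an optional callback so that
// nothing is left as an unhandled rejection.
function fireAndForgetRecover(
  client: RecoverClient,
  action: RecoveryAction,
  onError: (err: unknown) => void = () => {},
): void {
  try {
    void client.recover(action).catch(onError);
  } catch (err) {
    onError(err);
  }
}
```

Because the actions are described as idempotent, calling `fireAndForgetRecover(pair, "resume")` from several places (a timer, an event handler, a retry) should be harmless even if the calls overlap.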