Sessions & Tasks

A session is one runtime container: it owns the browser, the profile (cookies / login state), and the workspace (file system). A task is one instruction that runs inside a session. You can send follow-up tasks to the same session; each one inherits the state the previous task left behind.

Resource shape

text
project
└── session                   id: sess_…
    ├── browser, profile, workspace
    └── task                  id: task_…
        ├── instructions      "Search Hacker News..."
        ├── status            running | done | …
        └── events (SSE)      task.status_changed, task.message, …

Lifecycle (seven states)

mermaid
stateDiagram-v2
    [*] --> pending
    pending --> running
    running --> awaiting_input : task.input_request
    awaiting_input --> running : POST /intervene
    running --> paused : POST /pause
    paused --> running : POST /resume
    running --> done
    running --> failed
    pending --> canceled
    running --> canceled : POST /cancel
    awaiting_input --> canceled
    paused --> canceled

| State | Meaning | Next moves |
| --- | --- | --- |
| pending | Accepted, queued for an agent slot | running, canceled |
| running | Agent is actively working | done, failed, awaiting_input, paused, canceled |
| awaiting_input | Agent paused itself; needs you to answer | running (via intervene), canceled |
| paused | You paused it (manual) | running (via resume), canceled |
| done | Completed successfully; output populated | terminal |
| failed | Hit an error; error.code and error.detail populated | terminal |
| canceled | You canceled it, or the max-duration limit tripped | terminal |

A task in any non-terminal state holds session resources. Set max_duration_minutes to bound how long it can hold them.
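
The lifecycle diagram can be mirrored on the client to reject impossible transitions before calling the API. The following is a minimal sketch: the transition table is copied from the diagram above, while the names TRANSITIONS, is_terminal, and can_transition are illustrative and not part of the SDK.

```python
# Transition table copied from the lifecycle diagram.
# All names in this block are illustrative, not SDK identifiers.
TRANSITIONS: dict[str, frozenset[str]] = {
    "pending": frozenset({"running", "canceled"}),
    "running": frozenset({"awaiting_input", "paused", "done", "failed", "canceled"}),
    "awaiting_input": frozenset({"running", "canceled"}),
    "paused": frozenset({"running", "canceled"}),
    "done": frozenset(),
    "failed": frozenset(),
    "canceled": frozenset(),
}

# A state is terminal when it has no outgoing edges.
TERMINAL_STATES = frozenset(s for s, nxt in TRANSITIONS.items() if not nxt)


def is_terminal(state: str) -> bool:
    """True once the task can no longer change state."""
    return state in TERMINAL_STATES


def can_transition(current: str, target: str) -> bool:
    """True if the diagram has an edge from `current` to `target`."""
    return target in TRANSITIONS.get(current, frozenset())
```

For example, can_transition("pending", "paused") is False because a task must reach running before it can be paused.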

Submitting a task

python
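from web_agent import Client
from web_agent.v1.types import CreateSessionRequest, RecordingConfigRequest

# `client` is an authenticated Client instance, created with your API credentials.
# The `await` call below must run inside an async function.
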
session = await client.sessions.create(CreateSessionRequest(
    instructions="Find the top 5 Show HN posts from the last 24 hours.",
    model="claude-sonnet-4.6",
    max_cost_usd="0.50",
    max_duration_minutes=10,
    recording=RecordingConfigRequest(enabled=True),
    keep_alive=True,
))
task = session.tasks[0]

| Field | Type | Notes |
| --- | --- | --- |
| instructions | string | The task in plain English. Up to 10 000 chars. |
| model | string | Model id, e.g. claude-sonnet-4.6, gemini-3-flash. |
| max_cost_usd | string | Decimal-as-string. Hard cap. |
| max_duration_minutes | int | 1–10 080 (one week). |
| recording | object | {enabled, quality, capture_during_take_control}; omit for off. |
| keep_alive | bool | When the task ends, keep the session warm for follow-up tasks. |
| allowed_actions | string[] | Whitelist of tool actions the agent may call. Empty = all allowed. |
| profile_id | string | Reuse cookies/auth from a saved profile. |

The full schema is in the OpenAPI spec.
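
Checking the documented limits locally can catch a bad request before it reaches the server. This sketch covers only the three fields with stated constraints. The function name check_task_fields is hypothetical, and two of its rules are assumptions rather than documented behavior: that instructions must be non-empty, and that max_cost_usd must be positive.

```python
from decimal import Decimal, InvalidOperation

MAX_INSTRUCTION_CHARS = 10_000  # documented limit
MAX_DURATION_MINUTES = 10_080   # documented limit: one week


def check_task_fields(instructions: str, max_cost_usd: str, max_duration_minutes: int) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    problems = []
    # Assumption: empty instructions are rejected.
    if not instructions or len(instructions) > MAX_INSTRUCTION_CHARS:
        problems.append("instructions must be 1 to 10000 characters")
    try:
        cost = Decimal(max_cost_usd)
        # Assumption: the cap must be a finite, positive amount.
        if not cost.is_finite() or cost <= 0:
            problems.append("max_cost_usd must be a positive decimal")
    except (InvalidOperation, TypeError):
        problems.append("max_cost_usd must be a decimal string such as '0.50'")
    if not 1 <= max_duration_minutes <= MAX_DURATION_MINUTES:
        problems.append("max_duration_minutes must be between 1 and 10080")
    return problems
```

Parsing max_cost_usd with Decimal rather than float keeps the decimal-as-string value exact, which is presumably why the API takes it as a string.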

Follow-up tasks

python
from web_agent.v1.types import CreateTaskRequest

followup = await client.sessions.create_task(
    session.id,
    CreateTaskRequest(
        instructions="Now click into the first post and summarise the discussion.",
    ),
)

The follow-up runs in the same browser, with the same cookies, against the same DOM the previous task left.

Events

Every task emits a Server-Sent Events stream:

http
GET /v1/projects/{pid}/do_anything/sessions/{sid}/tasks/{tid}/events
Authorization: Bearer wa_…

Eleven event types (the envelope is the same; data shape varies):

| Type | When |
| --- | --- |
| task.status_changed | State transition |
| task.message | Agent or user message in the chat thread |
| task.action.started | Agent invoked a tool |
| task.action.completed | Tool returned |
| task.action.failed | Tool threw |
| task.screenshot | New browser frame (url is short-lived) |
| task.input_request | Agent paused; needs you to answer |
| task.input_request_resolved | Your intervene was accepted |
| task.cost_update | Per-step cost delta |
| task.completed | Terminal; output populated |
| stream.heartbeat | Every ~15 s; harmless |
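
If you consume the stream without the SDK, the frames follow the standard Server-Sent Events wire format. Below is a minimal parser sketch using only the standard library. It assumes each frame's data field carries a JSON envelope like the one shown for task.input_request; the function name parse_sse is illustrative.

```python
import json
from typing import Iterable, Iterator, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE frame: {"id": ..., "event": ..., "data": ...}."""
    last_id: Optional[str] = None   # persists across frames, as in the SSE spec
    event_name: Optional[str] = None
    data_lines: list = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the frame; frames without data are dropped.
            if data_lines:
                # Assumption: the data payload is JSON.
                payload = json.loads("\n".join(data_lines))
                yield {"id": last_id, "event": event_name, "data": payload}
            event_name, data_lines = None, []
            continue
        if line.startswith(":"):
            continue  # SSE comment line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space
        if field == "id":
            last_id = value
        elif field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)
```

The parser accepts any iterable of text lines, so it works equally on a streaming HTTP response body and on a recorded log file.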

Reconnect cleanly:

http
GET …/events
Last-Event-ID: 142

The server replays events with id > 142 so you don't miss anything.
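
One way to track the resume point on the client is sketched below. It assumes event ids are increasing integers, as the `id > 142` example suggests; the class name ResumeCursor and the token parameter are hypothetical.

```python
from typing import Optional


class ResumeCursor:
    """Remembers the highest event id seen and drops replayed duplicates."""

    def __init__(self) -> None:
        self.last_id: Optional[int] = None

    def accept(self, event_id: Optional[str]) -> bool:
        """Return True if the event is new; frames without an id always pass."""
        if event_id is None:
            return True
        numeric = int(event_id)  # assumption: ids are integers
        if self.last_id is not None and numeric <= self.last_id:
            return False
        self.last_id = numeric
        return True

    def reconnect_headers(self, token: str) -> dict:
        """Build the headers for the next GET …/events request."""
        headers = {"Authorization": f"Bearer {token}", "Accept": "text/event-stream"}
        if self.last_id is not None:
            headers["Last-Event-ID"] = str(self.last_id)
        return headers
```

Call accept() for every frame you process, then pass reconnect_headers() to your HTTP client whenever the connection drops.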

Input request (human in the loop)

When the agent hits a captcha, a 2FA prompt, or any judgment call, it emits task.input_request:

json
{
  "type": "task.input_request",
  "data": {
    "input_request_id": "ir_01HXX…",
    "prompt": "I see a 'Verify you're human' challenge. Solve it for me?",
    "schema": { "type": "object", "properties": { "solved": { "type": "boolean" } } }
  }
}

You answer via POST /intervene:

python
await client.messages.intervene(
    session.id, task.id,
    input_request_id="ir_01HXX…",
    response={"solved": True},
)

The task transitions back to running. The whole cycle is one round-trip; no polling.
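
Because each input request carries a schema, you can sanity-check a response before sending it. The sketch below checks only `required` and top-level property types. It is not a full JSON Schema validator, and whether the server enforces the schema is an assumption. The names response_errors and _TYPE_CHECKS are illustrative.

```python
# Maps JSON Schema type names to Python checks. bool is excluded from the
# numeric types because isinstance(True, int) is True in Python.
_TYPE_CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "null": lambda v: v is None,
}


def response_errors(schema: dict, response: dict) -> list[str]:
    """Check `required` and top-level property types; return any mismatches."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in response:
            errors.append(f"missing required property: {name}")
    for name, value in response.items():
        expected = properties.get(name, {}).get("type")
        # Only simple string type names are checked; anything else is skipped.
        check = _TYPE_CHECKS.get(expected) if isinstance(expected, str) else None
        if check is not None and not check(value):
            errors.append(f"{name}: expected {expected}")
    return errors
```

With the schema from the example event, {"solved": True} yields no errors, while {"solved": "yes"} is flagged.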

Profiles

A profile is a reusable browser identity: cookies, local storage, auth state. Reference one when creating a session:

python
await client.sessions.create(CreateSessionRequest(
    instructions="Open my LinkedIn inbox and reply to the latest message.",
    profile_id="prof_linkedin_main",
))

The first time, set up the profile manually in the Console (sign in, accept cookies, do whatever). Future sessions reuse it.

Workspaces

A workspace is a persistent file system. The agent can read and write files; you fetch them via signed URL after the task is done. Useful for "scrape this site, write a CSV, hand it back."

Next steps