# WebAgent — Full Documentation (EN) --- # Web Agent Web Agent is the web action layer in SAK. It lets agents search, extract, operate webpages, track changes, and run controlled web tasks. After reading these docs, you can integrate Web Agent into your application, create a session, submit a task, and stream execution results. Web Agent is not a traditional crawler SDK. It is designed for LLM agent task execution: developers provide an instruction, while Web Agent manages runtime resources, page state, retries, structured results, and the task lifecycle. The Console and SDKs are clients for the same API. ## When to use Web Agent - Your agent needs real-time web data instead of relying only on model training data or a fixed knowledge base. - You need search, extraction, browser actions, and long-running tasks behind auditable APIs. - You want the Console, SDKs, and backend services to share the same REST API contract. - You need a session / task model for state, event streaming, or steps that require user confirmation. ## When not to use Web Agent - The task only needs your own backend APIs and does not need open web access. - You need large-scale offline crawling, warehouse synchronization, or search-index construction. - The target site's terms do not allow automated access and you do not have the required authorization. - You have not defined API keys, project scope, task budget, and failure handling. ## Core capabilities | Capability | Description | | --- | --- | | DoAnything API | Provide a natural-language instruction. Web Agent chooses tools, runs steps, and returns a result. | | Shaped APIs | Dedicated API contracts for artifact-shaped workflows such as DeepResearch, WebSearch, and Track. | | Session / task model | A session owns runtime resources. A task represents one instruction or follow-up action. | | Event stream | Subscribe to task state, output chunks, errors, and user-confirmation requests through SSE. 
| | SDK and raw HTTP | Python, TypeScript, and cURL docs use the same API semantics. | ## Documentation entry points - [What is WebAgent](/en/web-agent/getting-started/what-is-webagent) explains Web Agent's role, boundaries, and API shape. - [Quickstart](/en/web-agent/getting-started/quickstart) runs the first task with Python, TypeScript, or cURL. - [Authentication & API keys](/en/web-agent/getting-started/authentication) explains `wa_` keys, project scope, and rotation. - [Sessions & Tasks](/en/web-agent/concepts/sessions-and-tasks) explains the session, task, event, and profile lifecycle. - [Errors & Retries](/en/web-agent/concepts/errors-and-retries) covers error codes, retry policy, and idempotency. - [API Reference](/en/web-agent/reference/) covers base URL, auth, errors, rate limits, and pagination. - [Vibecoding](/en/web-agent/guides/vibecoding) shows how to give the docs and OpenAPI spec to an IDE-resident LLM. --- # What is WebAgent WebAgent is an API that lets LLM agents act on the web like developers do. You give it an instruction in English, it picks the tools (a browser, a sandbox, a search engine), runs the steps, and returns a result. ## What you get WebAgent ships user-facing API products in two categories: - **DoAnything API** — open-ended; free-form input, the agent picks the path. Session/task resource model + 7-state lifecycle + long-running tasks. - **Shaped APIs** — when you know the artifact shape you want, use the dedicated API for a shaped contract + quality guarantees: - **DeepResearch** — research → report (final.md + citations + confidence). - **WebSearch** — query → results (structured search results + optional summary). - **Track** — monitor → snapshot stream + change notifications. Shared capabilities: - **Profiles** — reusable login state across sessions. No re-logging in every run. - **Workspaces** — a persistent file system the agent can read and write to. 
- **Schedules** — cron, interval, event-triggered, or autonomous (the agent decides when next to run).
- **SSE event stream** — the same `task.*` events that drive the Console, streamed directly to your code.

## What it is not

- Not a low-code automation builder. There is no canvas. You wire up tasks in code (or via the Console as a prototyping aid).
- Not a hosted LLM API. Bring your task; WebAgent picks the LLM and charges its usage to your credit balance.

## Three product surfaces: Console / OpenAPI / SDK

All APIs are exposed through the same three surfaces, with 1:1 capability parity and a shared resource layer / event stream / billing:

| You can use … | … to do |
|---|---|
| The [REST API (OpenAPI)](/en/web-agent/reference/) | Anything. Console and SDKs are just clients. `api.web-agent.asix.inc/v1/...` + `Authorization: Bearer wa_...` |
| Python or TypeScript [SDK](/en/web-agent/sdk/python) | Same surface, idiomatic types, retries, streaming, `wait_for_done`. |
| The [Console](https://console.web-agent.asix.inc) | Prototype tasks visually; non-developers welcome; *Get Code* dialog hands you working snippets. |

**API-developer-first** — the product is the API. Console is a convenience layer, not a separate product surface; no Console-only privileged endpoints.

## Mental model

```
Session                       (one container; holds a browser, profile, workspace)
├── Task #1  status: done     (one instruction; lifecycle has 7 states)
└── Task #2  status: running  (a follow-up instruction in the same session)
    └── events: SSE stream    (status_changed, message, action.*, screenshot, …)
```

A **session** owns the runtime resources (browser, profile, workspace). Each **task** is one instruction; you can submit follow-up tasks against the same session and they share state. The task lifecycle has seven states (`pending`, `running`, `awaiting_input`, `paused`, `done`, `failed`, `canceled`); see [Sessions & Tasks](/en/web-agent/concepts/sessions-and-tasks).
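As a rough illustration of how client code might model the seven states listed above, the sketch below defines them as a Python enum and separates terminal from non-terminal states. `TaskState`, `TERMINAL_STATES`, and `is_terminal` are hypothetical names, not SDK exports (the SDK ships its own `TaskStatus` type, which this sketch does not use); the terminal set follows the lifecycle table on the Sessions & Tasks page.

```python
from enum import Enum


class TaskState(str, Enum):
    """The seven task lifecycle states, keyed by their wire values."""

    PENDING = "pending"
    RUNNING = "running"
    AWAITING_INPUT = "awaiting_input"
    PAUSED = "paused"
    DONE = "done"
    FAILED = "failed"
    CANCELED = "canceled"


# Hypothetical helper set: states after which a task no longer changes.
TERMINAL_STATES = frozenset({TaskState.DONE, TaskState.FAILED, TaskState.CANCELED})


def is_terminal(status: str) -> bool:
    """Return True if the raw status string names a terminal state.

    Raises ValueError for a string that is not one of the seven states.
    """
    return TaskState(status) in TERMINAL_STATES
```

A check like this is handy when deciding whether a session is free for a follow-up task or whether to keep listening for events.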
> The Standalone API (DeepResearch) uses the same task lifecycle but doesn't expose sessions — it's a one-shot artifact that doesn't need cross-task browser reuse. ## Next steps - [Quickstart](/en/web-agent/getting-started/quickstart) — 5 minutes from sign-up to first SSE event. - [Authentication & API keys](/en/web-agent/getting-started/authentication) — how `wa_` keys work and how to scope them. - [Sessions & Tasks](/en/web-agent/concepts/sessions-and-tasks) — the core resource model. --- # Quickstart Five minutes. Sign up, install the SDK, run a task, and watch events stream back. ## Step 0 — Get an API key (30 sec) 1. Sign up at [console.web-agent.asix.inc](https://console.web-agent.asix.inc). 2. Open **Settings → API Keys → Create**. 3. Copy the `wa_…` key. **It is shown once.** If you lose it, revoke and create again. ```bash export WEBAGENT_API_KEY=wa_xxxxxxxxxxxxxxxxxxxxxxxx export WEBAGENT_PROJECT_ID=proj_xxxxxxxxxxxxxxxxxxxxxxxx ``` ::: tip Project ID Project-scoped paths look like `/v1/projects/{pid}/…` (DoAnything / WebSearch / Track). Find your project ID in the Console URL after **Project Switcher → your project**. Standalone endpoints (DeepResearch) resolve the project from your Bearer token. ## Step 1 — Install the SDK (30 sec) ```bash pip install web-agent-sdk ``` ```bash npm install @web-agent/sdk ``` ```bash # nothing to install ``` ## Step 2 — Run a task (90 sec) `Client` opens a DoAnything session and streams events to terminal. 
```python
import asyncio

from web_agent.v1 import Client
from web_agent.v1.types import CreateSessionRequest


async def main():
    async with Client(
        api_key="wa_demo_xxxxxxxxxxxxxxxx",
        project_id="proj_demo_0001",
    ) as client:
        # Creating a session queues its first task.
        session = await client.sessions.create(CreateSessionRequest(
            instructions="Search Hacker News for the top 5 stories today, return them as a list.",
        ))
        task = session.tasks[0]

        # Print events until the task reaches its terminal event.
        async for event in client.events.stream(session.id, task.id):
            print(event.type, event.data)
            if event.type == "task.completed":
                break


asyncio.run(main())
```

```typescript
import { Client } from "@web-agent/sdk";

const client = new Client({
  apiKey: "wa_demo_xxxxxxxxxxxxxxxx",
  projectId: "proj_demo_0001",
});

const session = await client.sessions.create({
  instructions: "Search Hacker News for the top 5 stories today, return them as a list.",
});
const task = session.tasks[0]!;

for await (const event of client.events.stream(session.id, task.id)) {
  console.log(event.type, event.data);
  if (event.type === "task.completed") break;
}
```

```bash
curl https://api.web-agent.asix.inc/v1/projects/proj_demo_0001/do_anything/sessions \
  -H "Authorization: Bearer wa_demo_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "instructions": "Search Hacker News for the top 5 stories today, return them as a list."
  }'
```

::: tip Same code as the Console
The Console's **Get Code** dialog hands you exactly this snippet, with your real API key and current form values pre-filled. Open `console.web-agent.asix.inc/new`, fill the form, click **Get Code**, and you skip the typing.
:::

## Step 3 — Switch to a shaped API when you know the artifact (60 sec)

DoAnything is open-ended (artifact shape unconstrained).
When you **know** you want a research report, search results, or a change monitor, use the shaped API for a quality contract:

```python
# DeepResearch: research → report
async with Client(api_key="wa_...", project_id="proj_demo") as client:
    task = await client.deep_research.run(
        topic="Open-source vector DB landscape 2026",
        depth="deep",
    )
    print(task["task_id"])
```

```python
# WebSearch: query → results
async with Client(api_key="wa_...", project_id="proj_demo") as client:
    # wait=true (default): blocks ≤30s synchronously
    result = await client.web_search.run(
        queries=["best Python ORM 2026"],
    )
    for hit in result["results"]["results"]:
        print(hit["title"], hit["url"])
```

```python
# Track: monitor → snapshot stream
async with Client(api_key="wa_...", project_id="proj_demo") as client:
    mon = await client.track.create(
        intent="Notify me when the Apple stock dips below $200",
        schedule={"kind": "interval", "every_seconds": 3600},
        notify_channel={"kind": "callback_url", "url": "https://hooks.example.com/track"},
    )
    print(mon["id"])
```

The shaped APIs share the same auth / error envelope / event channel as DoAnything — see [Python SDK](/en/web-agent/sdk/python) / [TypeScript SDK](/en/web-agent/sdk/typescript).

## Step 4 — Watch it in the Console (30 sec)

Open `https://console.web-agent.asix.inc/sessions/{session_id}` (replace `{session_id}` with the `session.id` from step 2). You'll see the same chat log plus a live browser preview iframe — exactly what your stream is showing, rendered.

## Next steps

- [Run a task that asks you to confirm something](/en/web-agent/concepts/sessions-and-tasks#input-request) — `task.input_request`.
- [Save login state across sessions](/en/web-agent/concepts/sessions-and-tasks#profiles) — Profiles.
- [Schedule a task to run every morning](/en/web-agent/reference/) — Schedules.
- [Browse the full API](/en/web-agent/reference/) — every endpoint, every field.
## Troubleshooting | Symptom | Cause | Fix | |---|---|---| | `401 unauthorized` | Wrong, expired, or revoked key | **Settings → API Keys** → create a new one | | `402 insufficient_credits` | Free quota used up | **Settings → Billing → Add credits** | | SSE stalls > 60 s | Network drop or proxy buffering | Reconnect with `Last-Event-ID` — see [Events & SSE](/en/web-agent/concepts/sessions-and-tasks#events) | | `429 rate_limit_exceeded` | Burst over plan concurrency | Back off + retry; Dev plan defaults to 5–10 concurrent sessions | --- # Authentication & API keys This page covers WebAgent's API key format, how to send it on every request, and the create / revoke / rotate flow. Every request carries a bearer token in the `Authorization` header: ```http Authorization: Bearer wa_xxxxxxxxxxxxxxxxxxxxxxxx ``` ## Key shape - **Prefix:** `wa_` — short for *web agent*. (Mirrors Stripe's `sk_*`.) - **Length:** 28+ chars after the prefix. Treat them as opaque. - **Scope:** one project. Multi-project? Create one key per project. - **Visibility:** shown **once** at creation. Lose it → revoke and create again. - **Revocation:** soft-delete with a one-hour grace window so in-flight requests don't 401 mid-task. ## Where to keep it - **Local dev:** environment variable, `.env.local` (already in `.gitignore`). - **Production:** your secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, …). - **Never** commit a key to git. The Console's *Get Code* dialog uses a placeholder by default. ## Multi-project A single user can have many projects. The API path encodes the project: ```http GET /v1/projects/{project_id}/do_anything/sessions ``` There is no `X-Project-Id` header. The path makes the tenant explicit so a stray `curl` to a different project ID is a different URL — no silent cross-tenant calls. ## Rotation You can keep two valid keys at once. Common rotation flow: 1. **Create** a new key in **Settings → API Keys**. 2. **Deploy** with the new key. 3. **Revoke** the old one. 
Old key keeps working for one hour; deploy completes; old key 401s after the grace window. ## Errors | Status | Code | Meaning | |---|---|---| | 401 | `unauthorized` | Missing, malformed, expired, or revoked-past-grace | | 403 | `forbidden` | Key valid but doesn't have access to that project | | 429 | `rate_limit_exceeded` | Per-key concurrency or per-minute limit hit | ## Next steps - [Pricing & Credits](/en/web-agent/getting-started/pricing) — what each task costs. - [Sessions & Tasks](/en/web-agent/concepts/sessions-and-tasks) — what a request actually creates. --- # Pricing & Credits WebAgent runs on **credits** — a pre-paid USD balance. Each task drains the balance as it runs. The Console header shows a live `$X.XX` and hovering reveals the breakdown. ## Two buckets | Bucket | Source | Expires | |---|---|---| | **Monthly** | Your subscription's included credits | End of billing period | | **Additional** | One-off top-ups + auto-recharge | Never | Tasks drain monthly first, then additional. You'll never lose top-up money to a month-end reset. ## Four cost lines per task Every task records four costs separately so you can build dashboards: - `llm_cost_usd` — LLM tokens consumed - `browser_cost_usd` — browser-pool seconds - `proxy_cost_usd` — proxy bandwidth (when applicable) - `total_cost_usd` — the sum, also what drains your balance You can read them on `SessionResponse` and on every `task.cost_update` SSE event. ## Plans | Plan | Monthly | Includes | Concurrency | |---|---|---|---| | Free | $0 | $50–100 trial credits during early access | 1–2 | | Dev | ~$29 | $30 credits | 5–10 | | Business | ~$299 | $400 credits + team seats | 50–100 | | Scaleup | ~$999 | $1,400 credits + dedicated queue + region pinning | 250+ | ::: tip Subject to ±30% adjustment Numbers above are baseline. Pricing is finalised before public launch; early-access users get the locked-in rate. 
## Auto-recharge

To avoid 402s on a long Sunday-night run, enable auto-recharge in **Settings → Billing**:

- **Threshold** — top up when balance falls below `$X`.
- **Amount** — top up by `$Y` each time.
- **Monthly cap** — never spend more than `$Z` of recharge per calendar month.

## Hard limits per task

You can cap individual tasks. This is useful for cron jobs that you would rather see fail fast than run away:

```python
await client.sessions.create(CreateSessionRequest(
    instructions="...",
    max_cost_usd="2.00",
    max_duration_minutes=30,
))
```

If a task exceeds either limit it transitions to `failed` with `error.code` set to `budget_exceeded` (cost cap) or `duration_exceeded` (time cap). The credits already spent are still billed.

## Next steps

- [Authentication](/en/web-agent/getting-started/authentication) — how to keep keys safe.
- [Sessions & Tasks](/en/web-agent/concepts/sessions-and-tasks) — what a task actually runs.

---

# Sessions & Tasks

A **session** is one runtime container — it owns the browser, the profile (cookies / login state), and the workspace (file system). A **task** is one instruction that runs inside a session. You can fire follow-up tasks at the same session; they share the state the previous task left behind.

## Resource shape

```text
project
└── session                id: sess_…
    ├── browser, profile, workspace
    └── task               id: task_…
        ├── instructions   "Search Hacker News..."
        ├── status         running | done | …
        └── events (SSE)   task.status_changed, task.message, …
```

## Lifecycle (seven states)

```mermaid
stateDiagram-v2
    [*] --> pending
    pending --> running
    running --> awaiting_input : task.input_request
    awaiting_input --> running : POST /intervene
    running --> paused : POST /pause
    paused --> running : POST /resume
    running --> done
    running --> failed
    pending --> canceled
    running --> canceled : POST /cancel
    awaiting_input --> canceled
    paused --> canceled
```

| State | Meaning | Next moves |
|---|---|---|
| `pending` | Accepted, queued for an agent slot | → `running`, `canceled` |
| `running` | Agent is actively working | → `done`, `failed`, `awaiting_input`, `paused`, `canceled` |
| `awaiting_input` | Agent paused itself; needs you to answer | → `running` (via `intervene`), `canceled` |
| `paused` | You paused it (manual) | → `running` (via `resume`), `canceled` |
| `done` | Completed successfully; `output` populated | terminal |
| `failed` | Hit an error; `error.code` and `error.detail` populated | terminal |
| `canceled` | You canceled (or scheduled max-duration tripped) | terminal |

A task in any non-terminal state holds session resources. Cap with `max_duration_minutes` to bound that.

## Submitting a task

```python
from web_agent.v1 import Client
from web_agent.v1.types import CreateSessionRequest, RecordingConfigRequest

# `client` is an open `Client`; run this inside an async function.
session = await client.sessions.create(CreateSessionRequest(
    instructions="Find the top 5 Show HN posts from the last 24 hours.",
    model="claude-sonnet-4.6",
    max_cost_usd="0.50",
    max_duration_minutes=10,
    recording=RecordingConfigRequest(enabled=True),
    keep_alive=True,
))
task = session.tasks[0]
```

| Field | Type | Notes |
|---|---|---|
| `instructions` | string | The task in plain English. Up to 10 000 chars. |
| `model` | string | Model id, e.g. `claude-sonnet-4.6`, `gemini-3-flash`. |
| `max_cost_usd` | string | Decimal-as-string. Hard cap. |
| `max_duration_minutes` | int | 1–10 080 (one week). |
| `recording` | object | `{enabled, quality, capture_during_take_control}`; omit for off. |
| `keep_alive` | bool | When the task ends, keep the session warm for follow-up tasks. |
| `allowed_actions` | string[] | Whitelist of tool actions the agent may call. Empty = all allowed. |
| `profile_id` | string | Reuse cookies/auth from a saved profile. |

The full schema is in the [OpenAPI spec](/openapi/v1.json).

## Follow-up tasks

```python
from web_agent.v1.types import CreateTaskRequest

followup = await client.sessions.create_task(
    session.id,
    CreateTaskRequest(
        instructions="Now click into the first post and summarise the discussion.",
    ),
)
```

The follow-up runs in the same browser, with the same cookies, against the same DOM the previous task left.

## Events {#events}

Every task emits a Server-Sent Events stream:

```http
GET /v1/projects/{pid}/do_anything/sessions/{sid}/tasks/{tid}/events
Authorization: Bearer wa_…
```

Eleven event types (the envelope is the same; `data` shape varies):

| Type | When |
|---|---|
| `task.status_changed` | State transition |
| `task.message` | Agent or user message in the chat thread |
| `task.action.started` | Agent invoked a tool |
| `task.action.completed` | Tool returned |
| `task.action.failed` | Tool threw |
| `task.screenshot` | New browser frame (`url` is short-lived) |
| `task.input_request` | Agent paused; needs you to answer |
| `task.input_request_resolved` | Your `intervene` was accepted |
| `task.cost_update` | Per-step cost delta |
| `task.completed` | Terminal; `output` populated |
| `stream.heartbeat` | Every ~15 s; harmless |

Reconnect cleanly:

```http
GET …/events
Last-Event-ID: 142
```

The server replays events with `id > 142` so you don't miss anything.
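For callers handling the stream over raw HTTP, the reconnect rule above can be sketched as a small cursor that remembers the highest event id seen and drops replayed duplicates. `EventCursor` is a hypothetical helper, not an SDK class, and the sketch assumes event ids are integers that only increase within one task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventCursor:
    """Remembers the highest event id seen so far for one task."""

    last_id: Optional[int] = None

    def accept(self, event_id: int) -> bool:
        """Return True for a new event, False for a replayed duplicate."""
        if self.last_id is not None and event_id <= self.last_id:
            return False
        self.last_id = event_id
        return True

    def reconnect_headers(self) -> dict[str, str]:
        """Extra headers to send when reopening the event stream."""
        if self.last_id is None:
            return {}  # nothing seen yet: start from the beginning
        return {"Last-Event-ID": str(self.last_id)}
```

Call `accept()` for every incoming event and process only those that return `True`; after a dropped connection, merge `reconnect_headers()` into the new `GET …/events` request.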
## Input request (human in the loop) {#input-request} When the agent hits a captcha, a 2FA prompt, or any judgment call, it emits `task.input_request`: ```json { "type": "task.input_request", "data": { "input_request_id": "ir_01HXX…", "prompt": "I see a 'Verify you're human' challenge. Solve it for me?", "schema": { "type": "object", "properties": { "solved": { "type": "boolean" } } } } } ``` You answer via `POST /intervene`: ```python await client.messages.intervene( session.id, task.id, input_request_id="ir_01HXX…", response={"solved": True}, ) ``` The task transitions back to `running`. The whole cycle is one round-trip; no polling. ## Profiles {#profiles} A **profile** is a reusable browser identity: cookies, local storage, auth state. Reference one when creating a session: ```python await client.sessions.create(CreateSessionRequest( instructions="Open my LinkedIn inbox and reply to the latest message.", profile_id="prof_linkedin_main", )) ``` The first time, set up the profile manually in the Console (sign in, accept cookies, do whatever). Future sessions reuse it. ## Workspaces A **workspace** is a persistent file system. The agent can read and write files; you fetch them via signed URL after the task is done. Useful for "scrape this site, write a CSV, hand it back." ## Next steps - [API Reference](/en/web-agent/reference/) — every field. - [Authentication](/en/web-agent/getting-started/authentication) — keys, scopes, rotation. - [Vibecoding](/en/web-agent/guides/vibecoding) — how to feed all of this to your IDE. --- # Errors & retries This page lists WebAgent's error codes and tells you which ones to retry, which to surface to the user, and which to fix in your own code. 
Every error response has the same shape: ```json { "code": "rate_limit_exceeded", "detail": "Per-key concurrency limit (10) reached.", "extra": { "limit": 10, "active": 10 } } ``` The HTTP status tells you the *category*; the `code` field is the **stable contract** — switch on it, never on `detail` (English prose, may change). ## Code matrix | Status | Code | Retry? | What to do | |---|---|---|---| | 400 | `bad_request` | ❌ | Fix the request body. Check the OpenAPI spec for the field. | | 401 | `unauthorized` | ❌ | Key is missing, malformed, expired, or revoked past its 1-hour grace. Create a new one. | | 402 | `insufficient_credits` | ❌ | Top up via **Settings → Billing**, or enable auto-recharge. | | 402 | `budget_exceeded` | ❌ | Per-task `max_cost_usd` cap hit. Raise the cap or split the work. | | 403 | `forbidden` | ❌ | Key valid, but the project doesn't grant it access to this resource. | | 403 | `safety_boundary_violated` | ❌ | The agent refused on safety grounds. Read `extra.reason`; reword the instruction. | | 404 | `session_not_found`, `task_not_found`, `profile_not_found`, … | ❌ | The id is wrong or the resource was deleted. | | 409 | `conflict` | ❌ | State mismatch (e.g. `cancel` on a terminal task). Re-read state and decide. | | 422 | `validation_error` | ❌ | Schema-level — `extra.errors[]` lists the offending fields. | | 429 | `rate_limit_exceeded` | ✅ | Honour `Retry-After`; exponential back-off if absent. | | 429 | `too_many_concurrent_sessions` | ✅ | Wait for an in-flight session to free up, or upgrade plan. | | 5xx | `internal_error` | ✅ | Same call, exponential back-off. Capped at 3–5 attempts. | | (network) | — | ✅ | Connection reset / timeout — retry idempotently. | ✅ = safe to retry without reasoning. ❌ = will keep failing until you change something. 
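To illustrate switching on `code` rather than `detail`, here is a minimal sketch for raw-HTTP callers. `should_retry` and `RETRYABLE_CODES` are hypothetical names; the set mirrors the ✅ rows of the matrix above, and treating unknown codes as non-retryable is an assumption of this sketch, not documented behavior.

```python
import json

# Codes marked as safe to retry in the matrix above.
RETRYABLE_CODES = frozenset({
    "rate_limit_exceeded",
    "too_many_concurrent_sessions",
    "internal_error",
})


def should_retry(body: str) -> bool:
    """Decide retryability from the `code` field of a JSON error envelope.

    The `detail` text is deliberately ignored: it is prose and may change.
    """
    envelope = json.loads(body)
    return envelope.get("code") in RETRYABLE_CODES
```

Network-level failures (connection reset, timeout) never produce an envelope, so they need their own handling next to a check like this.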
## Retry policy we recommend

```python
import random
import time


class WebAgentError(Exception):
    """Your own wrapper around the error envelope (not an SDK class)."""

    def __init__(self, code, detail="", retry_after_seconds=None):
        super().__init__(detail)
        self.code = code
        self.retry_after_seconds = retry_after_seconds


def with_retries(fn, *, attempts=4, base=0.5, cap=8.0):
    for i in range(attempts):
        try:
            return fn()
        except WebAgentError as e:
            if e.code not in {"rate_limit_exceeded", "too_many_concurrent_sessions", "internal_error"}:
                raise  # not safe to retry
            if i == attempts - 1:
                raise  # out of attempts
            # Exponential back-off with jitter; a server-provided Retry-After wins.
            sleep_s = min(cap, base * 2**i) + random.uniform(0, 0.25)
            time.sleep(e.retry_after_seconds or sleep_s)
```

The Python and TypeScript SDKs ship this loop by default; the table above is for when you're calling the API directly.

## Idempotency keys {#idempotency}

Every mutating endpoint (`POST /sessions`, `POST /sessions/{sid}/tasks`, `POST /messages`, …) accepts an `Idempotency-Key` header:

```http
POST /v1/projects/proj_demo_0001/do_anything/sessions
Idempotency-Key: 9b2f7c1e-…-uuid
```

Replay the same UUID within 24 hours and you'll get the **same response back** (same `session_id`, same status code) — even after a network blip. Generate one UUID per logical action, not per retry.

## Inside the task lifecycle

Errors during agent execution don't always fail the task — many are recoverable:

- **Tool error** — emitted as `task.action.failed`; the agent decides to retry the tool, choose a different one, or fail the whole task.
- **Captcha / 2FA** — emitted as `task.input_request`; you answer via `POST /intervene`.
- **Hard cap hit** — task transitions to `failed` with `error.code = budget_exceeded` or `duration_exceeded`. Spent credits are billed.
- **Safety refusal** — task transitions to `failed` with `error.code = safety_boundary_violated`. No credits charged for the refused step.

You see all four through the [SSE stream](/en/web-agent/concepts/sessions-and-tasks#events).

## SSE-specific failure modes

| Symptom | Cause | Fix |
|---|---|---|
| Stream stalls > 60 s | Network drop or proxy buffering | Reconnect with the `Last-Event-ID` header set to the last event id you received. |
| `Last-Event-ID` ignored | Buffer expired (older than 1 hour) | Re-fetch task state via `GET /sessions/{sid}/tasks/{tid}` and resume from current. |
| Duplicate events on reconnect | At-least-once delivery | Dedupe by event `id` (monotonic per-task). |

## Next steps

- [API Overview](/en/web-agent/reference/) — every endpoint shares these conventions.
- [Sessions & Tasks](/en/web-agent/concepts/sessions-and-tasks) — the lifecycle states above are normative.

---

# Python SDK

This page covers how to install, configure, and use the official `web-agent-sdk` Python package: open a session, run a task, and stream events.

```bash
pip install web-agent-sdk
```

Requires Python 3.10+. The SDK is async-first (`asyncio` / `anyio`).

## One entry point: `Client`

User-facing API products live on the same `Client`:

```python
from web_agent.v1 import Client
```

| Resource | Product | Use case |
|---|---|---|
| `client.sessions / messages / events` | DoAnything (open-ended) | Free-form input; the agent picks the path. |
| `client.deep_research` | DeepResearch (research → report) | Standalone API. |
| `client.web_search` | WebSearch (query → results) | Synchronous by default (`wait=true`). |
| `client.track` | Track (monitor → snapshot) | Long-lived monitors with webhook delivery. |

> The package name is `web-agent-sdk` (hyphen) but the import is `web_agent` — same convention as `python-dateutil` → `dateutil`.
## DoAnything — open-ended tasks ```python import asyncio from web_agent.v1 import Client from web_agent.v1.types import CreateSessionRequest async def main(): async with Client( api_key="wa_demo_xxxxxxxxxxxxxxxx", project_id="proj_demo_0001", ) as client: session = await client.sessions.create(CreateSessionRequest( instructions="Search Hacker News for the top 5 stories today, return them as a list.", )) task = session.tasks[0] # session-create implicitly queues the first task async for event in client.events.stream(session.id, task.id): print(event.type, event.data) if event.type == "task.completed": break asyncio.run(main()) ``` `api_key` and `project_id` default to `$WEBAGENT_API_KEY` / `$WEBAGENT_PROJECT_ID` if you omit them. ### Follow-up task vs. inflight message ```python # 1. Push a message into the *current* task's chat queue # (agent peeks the queue at the next ReAct boundary) await client.messages.send( session.id, task.id, content="Also include the comment count for each.", ) # 2. Start a NEW task in the SAME session # (reuses browser, profile, workspace; previous task must be terminal) from web_agent.v1.types import CreateTaskRequest new_task = await client.sessions.create_task( session.id, CreateTaskRequest(instructions="Click into the first post and summarise it."), ) ``` ### Answer an input request ```python await client.messages.intervene( session.id, task.id, input_request_id="ir_01HXX", response={"solved": True}, ) ``` ### Cancel / stop / list ```python await client.sessions.cancel_task(session.id, task.id, reason="user_cancelled") await client.sessions.stop(session.id, force=False) # soft stop session listing = await client.sessions.list(status="running", limit=20) for s in listing.items: print(s.id, s.status) ``` ### Heartbeats and resume `stream()` filters heartbeats by default; pass `include_heartbeats=True` for connection-health UIs. 
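For a connection-health check, one option is to wrap the stream in a timeout so that a silent gap surfaces as an exception. The sketch below is only an illustration: `with_stall_timeout` is a hypothetical helper, it works on any async iterator (so it assumes nothing about the SDK beyond `stream()` being usable with `async for`), and the 60-second default is borrowed from the stall threshold in the troubleshooting tables.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def with_stall_timeout(
    stream: AsyncIterator[T],
    timeout_s: float = 60.0,
) -> AsyncIterator[T]:
    """Re-yield items from `stream`; raise asyncio.TimeoutError on a silent gap."""
    iterator = stream.__aiter__()
    while True:
        try:
            # Wait for the next item, but no longer than `timeout_s`.
            item = await asyncio.wait_for(iterator.__anext__(), timeout=timeout_s)
        except StopAsyncIteration:
            return  # the underlying stream ended normally
        yield item
```

With `include_heartbeats=True`, heartbeats arrive roughly every 15 s, so a gap longer than the timeout suggests the connection has stalled; catch `asyncio.TimeoutError` around the `async for` loop and reconnect.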
Resume an interrupted stream with `Last-Event-ID`: ```python client.events.stream(session.id, task.id, last_event_id="142") ``` ## DeepResearch — research → report DR is a Standalone API (pidless: `/v1/deep_research`); the project tenant resolves from the Bearer token. ```python async with Client(api_key="wa_...", project_id="proj_demo") as client: task = await client.deep_research.run( topic="Open-source vector DB landscape 2026", depth="deep", # light / standard / deep require_outline_approval=True, # outline HITL gate (default on) ) print(task["task_id"], task["status"]) ``` Subscribe to events (DR uses the DoAnything SSE channel) and respond to the outline gate: ```python async for event in client.events.stream( task["session_id"], task["task_id"], ): if event.type == "task.input_request": # outline ready, awaiting human approval await client.deep_research.intervene( task["task_id"], request_id=event.data["request_id"], response="approve", # or {"action": "approve_with_edits", "edits": [...]} ) if event.type == "task.completed": break # Pull the three-piece artifact set (final.md / citations.json / confidence.json) artifacts = await client.deep_research.list_artifacts(task["task_id"]) final = await client.deep_research.get_artifact( task["task_id"], artifacts[0]["id"], ) ``` ## WebSearch — query → results WS is a project-scoped API. `run()` defaults to `wait=true`: the server blocks for ≤30s and returns the done envelope; on timeout it returns 202 — call `get(task_id)` to poll. 
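The 202-then-poll path can be sketched as a generic helper. This is a sketch under stated assumptions, not SDK code: `poll_until_terminal` is a hypothetical name, and it assumes the dict returned by `get(task_id)` carries a top-level `status` key that uses the lifecycle state names (`done`, `failed`, `canceled` being terminal).

```python
import asyncio
from typing import Any, Awaitable, Callable

# Assumed terminal values of the `status` field.
TERMINAL_STATUSES = {"done", "failed", "canceled"}


async def poll_until_terminal(
    fetch: Callable[[], Awaitable[dict[str, Any]]],
    *,
    interval_s: float = 1.0,
    max_polls: int = 60,
) -> dict[str, Any]:
    """Call `fetch` until its result reports a terminal status.

    Raises TimeoutError if no terminal status shows up within `max_polls`.
    """
    for _ in range(max_polls):
        detail = await fetch()
        if detail.get("status") in TERMINAL_STATUSES:
            return detail
        await asyncio.sleep(interval_s)
    raise TimeoutError("task did not reach a terminal state in time")
```

Usage would look like `detail = await poll_until_terminal(lambda: client.web_search.get(task_id))`; passing the fetch call as a callable keeps the helper independent of any particular resource.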
```python
# Synchronous (default)
result = await client.web_search.run(
    queries=["best Python ORM 2026"],
    engines=["tavily"],
    summarize=True,
)
for hit in result["results"]["results"]:
    print(hit["title"], hit["url"])

# Async
pending = await client.web_search.run_async(queries=["best Python ORM 2026"])
detail = await client.web_search.get(pending["task_id"])

# Refine (re-run within the same task)
await client.web_search.refine(
    pending["task_id"],
    text="add site:reddit.com and re-run",
)
```

## Track — long-lived monitors

Track is a project-scoped API. A **monitor** is a long-lived background job: a cron / interval / event schedule + an extraction goal + a notify channel (webhook). Each tick produces a `snapshot` row; whenever the trigger DSL judges the diff worth notifying, the configured channel fires.

```python
mon = await client.track.create(
    intent="Notify me when the iPhone 17 Pro listing on apple.com goes below $999",
    schedule={"kind": "interval", "every_seconds": 3600},
    notify_channel={"kind": "callback_url", "url": "https://hooks.example.com/track"},
)

# Lifecycle controls — pause / resume / refine via patch:
await client.track.pause(mon["id"], reason="manual review")
await client.track.resume(mon["id"])
await client.track.refine(mon["id"], trigger_dsl={"op": "lt", "field": "price", "value": 999})

# Manually fire a tick (bypasses schedule); inspect the per-tick payload:
outcome = await client.track.run_now(mon["id"])

# Pull the snapshot history (newest first):
snapshots = await client.track.list_snapshots(mon["id"])
snap = await client.track.get_snapshot(mon["id"], snapshots["items"][0]["id"])

# Inspect webhook outbox + retry a dead row:
deliveries = await client.track.list_deliveries(mon["id"], include_payload=True)
await client.track.retry_delivery(mon["id"], deliveries["items"][0]["id"])

# Cancel terminates the monitor (terminal state):
await client.track.cancel(mon["id"])
# equivalent: await client.track.delete(mon["id"])
```

### Alignment HITL (optional)

If the supervisor needs you to disambiguate intent (e.g. "did you mean SKU A or SKU B?"), the monitor moves to `pending_clarification` and emits an `alignment.input_request` event. Answer with `intervene()`:

```python
await client.track.intervene(
    mon["id"],
    request_id="req_align_1",
    response="SKU A",
)
```

You can also push free-text guidance into the alignment queue at any time via `client.track.message(mon_id, content="…")`.

## Errors

The SDK raises typed exceptions you can catch by class:

```python
from web_agent.v1 import (
    UnauthorizedError,
    InsufficientCreditsError,
    RateLimitedError,
)
from web_agent.v1.types import CreateSessionRequest

try:
    await client.sessions.create(CreateSessionRequest(instructions="…"))
except InsufficientCreditsError as e:
    print("top up:", e.detail, e.extra)
```

Every exception subclasses `ApiError` and carries `code` / `detail` / `extra` matching the [API error envelope](/en/web-agent/reference/#errors).

| Exception class | HTTP | `code` |
|---|---|---|
| `UnauthorizedError` | 401 | `unauthorized` |
| `ForbiddenError` | 403 | `forbidden`, `safety_boundary_violated` |
| `NotFoundError` | 404 | `*_not_found` |
| `ConflictError` | 409 | `conflict` |
| `ValidationError` | 422 | `validation_error` |
| `RateLimitedError` | 429 | `rate_limit_exceeded` |
| `InsufficientCreditsError` | 402 | `insufficient_credits` |
| `BudgetExceededError` | 402 | `budget_exceeded` |

## Type stubs

DoAnything resources (`Session`, `Task`, `Event`, etc.) are dataclasses re-exported from `web_agent.v1`:

```python
from web_agent.v1 import Session, Task, Event, TaskStatus
```

DR / WS / Track responses are returned as `dict[str, Any]` (the OpenAPI envelope verbatim) — index by key (`task["task_id"]` / `task["status"]`). `mypy --strict` is supported.

## Next steps

- [TypeScript SDK](/en/web-agent/sdk/typescript) — same surface in JS/TS.
- [Errors & retries](/en/web-agent/concepts/errors-and-retries) — recommended retry policy, idempotency keys.
- [Sessions & Tasks](/en/web-agent/concepts/sessions-and-tasks) — lifecycle, profiles, workspaces.

---

# TypeScript SDK

This page covers how to install, configure, and use the official `@web-agent/sdk` Node / browser package: open a session, run a task, and stream events.

```bash
npm install @web-agent/sdk
# or pnpm add @web-agent/sdk / yarn add @web-agent/sdk / bun add @web-agent/sdk
```

Works in Node 20+ and modern browsers.

> **Don't ship a server-grade `wa_` key to the browser.** Keys grant project-wide access; ship them only to server-side code or to environments where you trust the runtime.

## One entry point: `Client`

User-facing API products live on the same `Client`:

```typescript
import { Client } from "@web-agent/sdk";
```

| Resource | Product | Use case |
|---|---|---|
| `client.sessions / messages / events` | DoAnything (open-ended) | Free-form input; the agent picks the path. |
| `client.deepResearch` | DeepResearch (research → report) | Standalone API. |
| `client.webSearch` | WebSearch (query → results) | Synchronous by default (`wait: true`). |
| `client.track` | Track (monitor → snapshot) | Long-lived monitors with webhook delivery. |

## DoAnything — open-ended tasks

```typescript
import { Client } from "@web-agent/sdk";

const client = new Client({
  apiKey: process.env.WEBAGENT_API_KEY!,
  projectId: process.env.WEBAGENT_PROJECT_ID!,
});

const session = await client.sessions.create({
  instructions: "Search Hacker News for the top 5 stories today.",
});
const task = session.tasks[0]!; // session-create implicitly queues the first task

for await (const event of client.events.stream(session.id, task.id)) {
  console.log(event.type, event.data);
  if (event.type === "task.completed") break;
}
```

Wire fields stay snake_case to match the API exactly; method names are camelCase.
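The loop above stops only on `task.completed`. A minimal sketch of the same consumption pattern, with the stop condition passed in by the caller, might look like the following. `AgentEvent`, `collectUntilTerminal` and `fakeStream` are hypothetical names for this illustration and are not part of the SDK; the fake generator stands in for `client.events.stream(...)` so the sketch runs without network access.

```typescript
// Event shape assumed for this sketch: `type` and `data`, as used in the example above.
interface AgentEvent {
  type: string;
  data: unknown;
}

// Hypothetical helper: drains any async-iterable event source and stops after the
// first event whose type is in `terminalTypes`. The caller decides which types end a task.
async function collectUntilTerminal(
  stream: AsyncIterable<AgentEvent>,
  terminalTypes: ReadonlySet<string>,
): Promise<AgentEvent[]> {
  const seen: AgentEvent[] = [];
  for await (const event of stream) {
    seen.push(event);
    if (terminalTypes.has(event.type)) break;
  }
  return seen;
}

// Stand-in for client.events.stream(session.id, task.id); payloads are made up.
async function* fakeStream(): AsyncGenerator<AgentEvent> {
  yield { type: "task.input_request", data: { request_id: "req_1" } };
  yield { type: "task.completed", data: { ok: true } };
  yield { type: "task.input_request", data: { request_id: "req_2" } }; // never consumed
}

collectUntilTerminal(fakeStream(), new Set(["task.completed"])).then((events) => {
  console.log(events.map((e) => e.type)); // the two event types seen before the loop stopped
});
```

Passing the terminal types in as a set keeps the helper independent of the full event catalogue, which this page does not list.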
### Resume + heartbeats

```typescript
for await (const event of client.events.stream(session.id, task.id, {
  lastEventId: "142",
  includeHeartbeats: false,
})) {
  if (event.type === "task.completed") break;
}
```

Backed by `fetch` with manual SSE parsing — works in Node 20+, Bun, Cloudflare Workers, and modern browsers.

### Follow-up task vs. inflight message

```typescript
// 1. Push into the current task's chat queue
await client.messages.send(session.id, task.id, {
  content: "Also include the comment count for each.",
});

// 2. Start a NEW task in the SAME session
const followup = await client.sessions.createTask(session.id, {
  instructions: "Click into the first post and summarise it.",
});
```

### Answer an input request

```typescript
await client.messages.intervene(session.id, task.id, {
  kind: "answer_input_request",
  input_request_id: "ir_01HXX",
  response: { solved: true },
});
```

The `kind` discriminator lets the same endpoint handle take-control / release-control too — see [Take Control](/en/web-agent/concepts/sessions-and-tasks#input-request).

### Cancel / stop / list

```typescript
await client.sessions.cancelTask(session.id, task.id, { reason: "user_cancelled" });
await client.sessions.stop(session.id, { force: false });

const list = await client.sessions.list({ status: "running", limit: 20 });
list.items.forEach((s) => console.log(s.id, s.status));
```

## DeepResearch — research → report

DR is a Standalone API. Its path is pidless, meaning it carries no project ID (`/v1/deep_research`); the project tenant resolves from the Bearer token.
```typescript
const task = await client.deepResearch.run({
  topic: "Open-source vector DB landscape 2026",
  depth: "deep",                // light / standard / deep
  requireOutlineApproval: true, // outline HITL gate (default on)
});

// Subscribe to events (DR uses the DoAnything SSE channel) + respond to the gate
for await (const event of client.events.stream(
  task.session_id as string,
  task.task_id as string,
)) {
  if (event.type === "task.input_request") {
    await client.deepResearch.intervene(task.task_id as string, {
      requestId: (event.data as { request_id: string }).request_id,
      response: "approve", // or { action: "approve_with_edits", edits: [...] }
    });
  }
  if (event.type === "task.completed") break;
}

// Pull the three-piece artifact set (final.md / citations.json / confidence.json)
const artifacts = await client.deepResearch.listArtifacts(task.task_id as string);
const final = await client.deepResearch.getArtifact(
  task.task_id as string,
  artifacts[0]!.id as string,
);
```

## WebSearch — query → results

WS is project-scoped. `run()` defaults to `wait: true`: the server blocks for ≤30s and returns the done envelope; on timeout it returns 202 — call `get(taskId)` to poll.

```typescript
// Synchronous (default)
const result = await client.webSearch.run({
  queries: ["best TypeScript ORM 2026"],
  engines: ["tavily"],
  summarize: true,
});

// Async
const pending = await client.webSearch.runAsync({
  queries: ["best TypeScript ORM 2026"],
});
const detail = await client.webSearch.get(pending.task_id as string);

// Refine (re-run within the same task)
await client.webSearch.refine(pending.task_id as string, {
  text: "add site:reddit.com and re-run",
});
```

## Track — long-lived monitors

Track is project-scoped. A **monitor** is a long-lived background job: cron / interval / event schedule + an extraction goal + a notify channel (webhook). Each tick produces a `snapshot`; whenever the trigger DSL judges the diff worth notifying, the channel fires.
```typescript
const mon = await client.track.create({
  intent: "Notify me when the iPhone 17 Pro listing on apple.com goes below $999",
  schedule: { kind: "interval", every_seconds: 3600 },
  notifyChannel: { kind: "callback_url", url: "https://hooks.example.com/track" },
});

// Lifecycle controls — pause / resume / refine via patch:
await client.track.pause(mon.id as string, { reason: "manual review" });
await client.track.resume(mon.id as string);
await client.track.refine(mon.id as string, {
  triggerDsl: { op: "lt", field: "price", value: 999 },
});

// Manually fire a tick (bypasses schedule):
const outcome = await client.track.runNow(mon.id as string);

// Snapshot history (newest first):
const snaps = await client.track.listSnapshots(mon.id as string);
const snap = await client.track.getSnapshot(
  mon.id as string,
  snaps.items[0]!.id as string,
);

// Webhook outbox + retry:
const deliveries = await client.track.listDeliveries(mon.id as string, {
  includePayload: true,
});
await client.track.retryDelivery(
  mon.id as string,
  deliveries.items[0]!.id as number,
);

// Cancel terminates the monitor (terminal state):
await client.track.cancel(mon.id as string); // equivalent: client.track.delete(...)
```

### Alignment HITL (optional)

If the supervisor needs you to disambiguate intent, the monitor moves to `pending_clarification` and emits an `alignment.input_request` event. Answer with `intervene()`:

```typescript
await client.track.intervene(mon.id as string, {
  requestId: "req_align_1",
  response: "SKU A",
});
```

You can also push free-text guidance into the alignment queue via `client.track.message(monId, { content: "…" })`.
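The trigger DSL is evaluated by the service, and its grammar is not described on this page. As a rough illustration of how a rule such as the `{ op: "lt", field: "price", value: 999 }` passed to `refine()` above could be read, here is a local sketch. It assumes that `lt` means "strictly less than" and that a snapshot exposes extracted values as a flat object; `TriggerRule` and `ruleFires` are hypothetical names, not SDK exports.

```typescript
// Rule shape copied from the refine() call above; "lt" is the only operator shown on this page.
interface TriggerRule {
  op: "lt";
  field: string;
  value: number;
}

// Hypothetical evaluator: true when the named field is a number strictly below the threshold.
// The flat `extracted` object is an assumption about what a snapshot carries.
function ruleFires(rule: TriggerRule, extracted: Record<string, unknown>): boolean {
  const current = extracted[rule.field];
  if (typeof current !== "number") return false; // missing or non-numeric values never fire
  return current < rule.value;
}

const rule: TriggerRule = { op: "lt", field: "price", value: 999 };
const fires = [1099, 999, 949].map((price) => ruleFires(rule, { price }));
console.log(fires); // false, false, true
```

Under these assumptions a price of exactly 999 does not notify, which matches the "goes below $999" wording of the monitor's intent.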
## Errors

```typescript
import {
  ApiError,
  InsufficientCreditsError,
  RateLimitedError,
  UnauthorizedError,
} from "@web-agent/sdk";

try {
  await client.sessions.create({ instructions: "…" });
} catch (err) {
  if (err instanceof InsufficientCreditsError) {
    console.log("top up:", err.detail, err.extra);
  } else if (err instanceof ApiError) {
    console.log(err.code, err.statusCode, err.detail);
  } else {
    throw err;
  }
}
```

Every error class subclasses `ApiError` and exposes `code` / `statusCode` / `detail` / `extra` matching the [API error envelope](/en/web-agent/reference/#errors).

## Types

DoAnything request / response types are top-level exports:

```typescript
import type {
  Session,
  Task,
  Event,
  EventType,
  CreateSessionRequest,
  CreateTaskRequest,
  InterveneRequest,
  TaskStatus,
  SessionStatus,
  TerminalReason,
} from "@web-agent/sdk";
```

DR / DS / WS responses come back as plain `Record` objects (the OpenAPI envelope verbatim). Index by key (`task.task_id` / `task.status`) and cast the values, as the examples on this page do. Each resource also exports its own option types (`DRRunOptions` / `DSRunOptions` / `WSRunOptions`).

## Next steps

- [Python SDK](/en/web-agent/sdk/python) — same surface in Python.
- [Errors & retries](/en/web-agent/concepts/errors-and-retries) — recommended retry policy, idempotency keys.
- [Sessions & Tasks](/en/web-agent/concepts/sessions-and-tasks) — lifecycle, profiles, workspaces.

---

# cURL & raw HTTP

This page documents the common request patterns for WebAgent over raw HTTP — usable from any language that can send HTTPS / JSON, no official SDK required.
## Create a session

```bash
curl https://api.web-agent.asix.inc/v1/projects/proj_demo_0001/do_anything/sessions \
  -H "Authorization: Bearer wa_demo_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "instructions": "Find the top 5 stories on Hacker News right now.",
    "model": "claude-sonnet-4.6",
    "max_cost_usd": "0.50"
  }'
```

## Stream events (SSE)

```bash
curl -N \
  -H "Authorization: Bearer wa_demo_xxxxxxxxxxxxxxxx" \
  "https://api.web-agent.asix.inc/v1/projects/proj_demo_0001/do_anything/sessions/sess_demo_0001/tasks/task_demo_0001/events"
```

To resume after a drop, add the last id you received:

```bash
curl -N \
  -H "Authorization: Bearer wa_demo_xxxxxxxxxxxxxxxx" \
  -H "Last-Event-ID: 142" \
  "…/events"
```

## Send a follow-up message

```bash
curl https://api.web-agent.asix.inc/v1/projects/proj_demo_0001/do_anything/sessions/sess_demo_0001/tasks/task_demo_0001/messages \
  -H "Authorization: Bearer wa_demo_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{ "content": "Also include the comment count for each." }'
```

## Answer an input request

```bash
curl https://api.web-agent.asix.inc/v1/projects/proj_demo_0001/do_anything/sessions/sess_demo_0001/tasks/task_demo_0001/intervene \
  -H "Authorization: Bearer wa_demo_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "answer_input_request",
    "input_request_id": "ir_01HXX",
    "response": { "solved": true }
  }'
```

The `kind` discriminator selects the variant — same endpoint also handles `take_control` / `release_control` (see [Take Control](/en/web-agent/concepts/sessions-and-tasks#input-request)).
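For callers not using curl, the same request can be assembled in code. The sketch below builds the URL, headers, and JSON body of the `answer_input_request` call shown above without sending anything. `prepareIntervene` and `PreparedRequest` are hypothetical names for this illustration, and only the one variant documented here is typed; the bodies of `take_control` / `release_control` are not shown on this page, so they are left out.

```typescript
// Body of the one intervene variant documented above.
interface AnswerInputRequest {
  kind: "answer_input_request";
  input_request_id: string;
  response: Record<string, unknown>;
}

// Plain description of an HTTP call; deliberately not tied to any HTTP client.
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: assembles what the curl command sends, without sending it.
function prepareIntervene(
  apiKey: string,
  projectId: string,
  sessionId: string,
  taskId: string,
  payload: AnswerInputRequest,
): PreparedRequest {
  const path = [
    "v1", "projects", projectId, "do_anything",
    "sessions", sessionId, "tasks", taskId, "intervene",
  ].map((segment) => encodeURIComponent(segment)).join("/");
  return {
    url: `https://api.web-agent.asix.inc/${path}`,
    method: "POST", // curl switches to POST when -d is given
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

const prepared = prepareIntervene(
  "wa_demo_xxxxxxxxxxxxxxxx",
  "proj_demo_0001",
  "sess_demo_0001",
  "task_demo_0001",
  { kind: "answer_input_request", input_request_id: "ir_01HXX", response: { solved: true } },
);
console.log(prepared.method, prepared.url);
```

The returned object can be handed to whichever HTTP client the application already uses, for example `fetch(prepared.url, { method: prepared.method, headers: prepared.headers, body: prepared.body })`.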
## Cancel a task

```bash
curl -X POST https://api.web-agent.asix.inc/v1/projects/proj_demo_0001/do_anything/sessions/sess_demo_0001/tasks/task_demo_0001/cancel \
  -H "Authorization: Bearer wa_demo_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{ "reason": "user_cancelled" }'
```

## List sessions

```bash
curl "https://api.web-agent.asix.inc/v1/projects/proj_demo_0001/do_anything/sessions?status=running&limit=20" \
  -H "Authorization: Bearer wa_demo_xxxxxxxxxxxxxxxx"
```

## Errors

Any non-2xx response is JSON with a stable `code`:

```json
{
  "code": "insufficient_credits",
  "detail": "Project balance below the minimum required ($0.50).",
  "extra": { "balance_usd": "0.12", "required_usd": "0.50" }
}
```

See [API Overview → Errors](/en/web-agent/reference/#errors) for the full list.

## Idempotency

```bash
curl … -H "Idempotency-Key: $(uuidgen)"
```

Replaying the same key returns the same response — safe to retry on network errors.

---

# API Overview

This page covers the conventions every WebAgent endpoint shares. For per-endpoint field details, see the [OpenAPI 3.1 spec](/openapi/v1.json) — every SDK and the Console's *Get Code* dialog generate from it.

## Base URL

```
https://api.web-agent.asix.inc
```

Resource paths are scoped to a project:

```
/v1/projects/{project_id}/...
```

The exception is a Standalone API such as DeepResearch, which lives at `/v1/deep_research` and resolves the project from the Bearer token.

## Authentication

Bearer token in the `Authorization` header. See [Authentication & API keys](/en/web-agent/getting-started/authentication).
```http
Authorization: Bearer wa_xxxxxxxxxxxxxxxxxxxxxxxx
```

## Errors

JSON body, HTTP status, and a stable `code` you can switch on:

```json
{
  "code": "session_not_found",
  "detail": "Session sess_demo_0001 not found.",
  "extra": { "session_id": "sess_demo_0001" }
}
```

| Status | Common codes |
|---|---|
| 400 | `bad_request` |
| 401 | `unauthorized` |
| 402 | `insufficient_credits`, `budget_exceeded` |
| 403 | `forbidden`, `safety_boundary_violated` |
| 404 | `session_not_found`, `task_not_found`, `profile_not_found`, … |
| 409 | `conflict` |
| 422 | `validation_error` |
| 429 | `rate_limit_exceeded`, `too_many_concurrent_sessions` |
| 5xx | `internal_error` |

## Rate limits

- Per-key concurrent sessions — set by your plan.
- Per-key request rate — sliding-window; `429` with `Retry-After` on breach.
- Per-project monthly credit budget — soft warning at 80%, hard stop at 100%.

## Pagination

List endpoints are cursor-paginated:

```http
GET /v1/projects/{pid}/do_anything/sessions?limit=50&cursor=eyJ…
```

Response includes `next_cursor` (or `null` at the end). Limits cap at 100 per page.

## Idempotency

Mutating endpoints (`POST /sessions`, `POST /messages`, …) accept an optional `Idempotency-Key` header. Send the same UUID and you'll get the same response back; safe to retry.

## SSE conventions

Streaming endpoints (e.g. `…/events`) emit JSON-encoded SSE events with a numeric `id`. To reconnect cleanly, send `Last-Event-ID: <id>` with the last id you received, and the server replays everything after that id.

## Next steps

- [OpenAPI spec](/openapi/v1.json) — every endpoint, every field, machine-readable.
- [Sessions & Tasks](/en/web-agent/concepts/sessions-and-tasks) — the resource model.
- [Errors & retries](/en/web-agent/concepts/errors-and-retries) — full error-code matrix.

---

# Vibecoding with WebAgent

If you're writing code with an LLM in your IDE — Cursor, Claude Code, Aider, Continue, anything — you can hand WebAgent to it as one URL. It'll write the integration for you.
## Three URLs to know

| URL | What it is | Feed it to |
|---|---|---|
| [`/en/web-agent/llms.txt`](/en/web-agent/llms.txt) | Short index of every page, with one-line descriptions | Any LLM agent — fast to scan |
| [`/en/web-agent/llms-full.txt`](/en/web-agent/llms-full.txt) | The entire documentation site as one plain markdown file (~200 KB) | Long-context models — paste as system prompt |
| [`/openapi/v1.json`](/openapi/v1.json) | OpenAPI 3.1 spec for every endpoint | Code-generating agents — they'll write typed clients from this |

These URLs are stable. They will never move, even if we rename internal pages.

## Drop-in prompt

Copy this into your IDE's system prompt, rules file, or first message:

```
You are integrating WebAgent.

Reference docs:
- https://docs.web-agent.asix.inc/llms-full.txt (full docs as one file)
- https://docs.web-agent.asix.inc/openapi/v1.json (OpenAPI 3.1 spec)

API conventions:
- Base URL: https://api.web-agent.asix.inc
- Bearer auth: header `Authorization: Bearer wa_…`
- Path-scoped to project: /v1/projects/{project_id}/...
- Wire fields are snake_case. Decimals are JSON strings ("10.00", not 10.00).
- Most mutations accept Idempotency-Key.

SDK packages:
- Python: `pip install web-agent-sdk`
- TypeScript: `npm install @web-agent/sdk`

Both SDKs auto-reconnect SSE streams via Last-Event-ID. Prefer them over hand-rolled HTTP unless asked otherwise. Always read the OpenAPI spec before guessing field names.
```

## Cursor

`.cursor/rules/webagent.md`:

```markdown
---
description: Conventions for integrating with WebAgent
globs: ["**/*.{ts,tsx,py}"]
---

When writing WebAgent code, follow https://docs.web-agent.asix.inc/llms-full.txt.
Field names are authoritative in https://docs.web-agent.asix.inc/openapi/v1.json — never guess.
Use the official SDKs (`web-agent-sdk` for Python, `@web-agent/sdk` for TypeScript) unless the user asks for raw HTTP.
```

## Claude Code

Add to your project's `CLAUDE.md`:

```markdown
## WebAgent integration

Reference: https://docs.web-agent.asix.inc/llms-full.txt
OpenAPI: https://docs.web-agent.asix.inc/openapi/v1.json
SDK: `web-agent-sdk` (Python), `@web-agent/sdk` (TypeScript). Wire fields are snake_case.

Stream task events via the SDK's `.stream()` helper, which handles `Last-Event-ID` reconnection.
```

## Console "Get Code" dialog

The fastest way to get a working snippet: open the [Console](https://console.web-agent.asix.inc/new), fill the form, click **Get Code**. You get four tabs (Prompt for an LLM agent / Python / TypeScript / cURL), each pre-filled with your real key and current configuration. Paste into your editor.

## Why this works

- We control the URLs, so they don't 404 across major versions.
- The full-docs file is markdown, not HTML — every LLM ingests it cleanly.
- The OpenAPI spec is the same single source of truth our SDKs and Console are generated from. No drift.

## Next steps

- [Quickstart](/en/web-agent/getting-started/quickstart) — first task in five minutes.
- [API Reference](/en/web-agent/reference/) — interactive, with Try-it.