Shapes
Shapes are typed record schemas that define the contract between skills and the engine. A shape declares what a record looks like: field names, types, relations to other records, and display rules.
Shapes live in shapes/*.yaml in source directories. The engine loads them at boot. Use agentos test <skill> to validate that your skill’s output matches the declared shapes (see Testing).
Format
```yaml
product:
  also: [other_shape]     # "a product is also a ..." (optional)
  fields:
    price: string
    price_amount: number
    currency: string      # companion to price_amount (ISO 4217 code)
    prime: boolean
  relations:
    contains: item[]      # array relation
    brand: organization   # single relation
  display:
    title: name
    subtitle: author
    image: image
    date: datePublished
    columns:
      - name: Name
      - price: Price
```
also (tag implication)
Declares that this shape is also another shape. An email is also a message. A book is also a product. When the engine tags a record with email, it transitively applies message too. Both shapes’ fields contribute to the record’s type context.
also is transitive: if A is also B and B is also C, then A is also B and C.
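The transitive rule is a closure over also edges. The sketch below computes the full tag set for a shape, assuming shapes have already been loaded from YAML into a dict keyed by shape name; tags_for is a hypothetical helper, not the engine's loader.

```python
def tags_for(shape: str, shapes: dict) -> set:
    """Return `shape` plus every shape it is transitively `also` (hypothetical helper)."""
    seen = set()
    stack = [shape]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already expanded; also stops cycles such as A also B, B also A
        seen.add(current)
        # Follow this shape's `also` list; shapes without one contribute nothing.
        stack.extend(shapes.get(current, {}).get("also", []))
    return seen


shapes = {
    "email": {"also": ["message"]},
    "message": {},
    "book": {"also": ["product"]},
    "product": {},
}
print(sorted(tags_for("email", shapes)))  # ['email', 'message']
```

The visited set keeps the walk finite even if two shapes name each other; whether the engine permits such a cycle at all is not stated here.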
Field types
| Type | Stored as | Notes |
|---|---|---|
| string | text | Short text |
| text | text | Long text, FTS eligible |
| integer | digits | Parsed from strings, floats truncated |
| number | decimal | Parsed from strings |
| boolean | true/false | Coerced from 1/0, “yes”/“no”, “true”/“false” |
| datetime | ISO 8601 | Unix timestamps auto-converted, human dates parsed |
| url | text | Stored as-is, rendered as clickable link |
| string[] | JSON array | Each element coerced to string |
| integer[] | JSON array | Each element coerced to integer |
| json | JSON string | Opaque blob, no coercion |
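A minimal sketch of these coercion rules, for intuition only: coerce is a hypothetical helper, not the engine's implementation, and it skips human-readable date parsing (only Unix timestamps and ISO 8601 strings are handled).

```python
from datetime import datetime, timezone


def coerce(value, field_type: str):
    """Coerce a raw source value to a declared field type (illustrative subset)."""
    if value is None:
        return None
    if field_type in ("string", "text", "url"):
        return str(value)
    if field_type == "integer":
        try:
            return int(value)
        except (TypeError, ValueError):
            return int(float(value))  # "3.9" -> 3: floats are truncated
    if field_type == "number":
        return float(value)
    if field_type == "boolean":
        if isinstance(value, bool):
            return value
        text = str(value).strip().lower()
        if text in ("1", "yes", "true"):
            return True
        if text in ("0", "no", "false"):
            return False
        raise ValueError(f"not a boolean: {value!r}")
    if field_type == "datetime":
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            # Unix timestamp in seconds -> ISO 8601 string in UTC
            return datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
        return datetime.fromisoformat(str(value)).isoformat()
    if field_type == "string[]":
        return [str(v) for v in value]
    if field_type == "integer[]":
        return [coerce(v, "integer") for v in value]
    if field_type == "json":
        return value  # opaque blob: passed through untouched
    raise ValueError(f"unknown field type: {field_type}")
```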
Standard fields
These are available on every record without declaring them in a shape:
| Field | Type | Purpose |
|---|---|---|
| id | string | Record identifier |
| name | string | Primary label |
| text | text | Short summary |
| url | url | Canonical link |
| image | url | Thumbnail |
| author | string | Creator |
| published | datetime | Temporal anchor |
| content | text | Long body text (FTS, stored separately) |
Relations
Relations declare connections to other records. Keys are edge labels; values are target shapes (shape for a single target, shape[] for an array).
Display
The display section tells renderers how to present this record:
- title: primary label field
- subtitle: secondary label
- description: preview text
- image: thumbnail
- date: temporal anchor for sort/display
- columns: ordered list for table views
Design Principles
These principles guide shape design. Use the review checklist below after writing or editing a shape.
1. Entities over fields
If a field value is itself a thing with identity, it should be a relation to another shape, not a string field.
Bad: shipping_address: string (an address is a thing)
Good: shipping_address: place (a relation to a place record)
Bad: email: string on a person (an email is an account)
Good: accounts: account[] relation on person
Ask: “Could this field value have its own page?” If yes, it’s a relation.
2. Separate identity from role
A person doesn’t have a job title. A person holds a role at an organization for a period of time. The role is the relationship, not a field on the person.
Bad: job_title: string on person
Good: roles: role[] relation where each role record carries title, organization, start_date, end_date
Same pattern applies to education, membership, authorship. If it has a time dimension or involves another entity, it’s a role/relationship, not a field.
3. Currency always accompanies price
Any field representing a monetary amount needs a companion currency field. Never assume USD.
Bad: price_amount: number alone
Good: price_amount: number + currency: string
4. URLs that reference other things are relations
The standard url field is the record’s own canonical link. But URLs that point to other things should be relations to the appropriate shape.
Bad: website: url on an organization (a website is its own entity)
Good: website: website relation
Bad: external_url: url on a post (the linked page is a thing)
Good: links_to: webpage relation
Ask: “Is this URL the record itself, or does it point to something else?”
- Record’s own link: keep as url (standard field)
- Points to another thing: make it a relation
5. Keep shapes domain-agnostic
A shape should describe the kind of thing, not the source it came from. Flight details don’t belong on an offer shape. Browser-specific fields don’t belong on a webpage shape.
Bad: total_duration: integer, flights: json, layovers: json on offer (that’s a flight, not an offer)
Good: offer has price + currency + offer_type. Flight is its own shape. Offer relates to flight.
6. Use also for genuine “is-a” relationships
also means tag implication: tagging a record with shape A also tags it with shape B. Use it when querying by B should include A.
Good uses:
- email also message (querying messages should include emails)
- video also post (querying posts should include videos)
- book also product (querying products should include books)
- review also post (querying posts should include reviews)
Bad uses:
- Don’t use also just because shapes share some fields
- Don’t create deep chains (A also B also C also D); keep it shallow
7. Author is a shape, not just a string
The standard author field is a string for convenience. But when the author is a real entity with their own identity (a book author, a blog writer, a video creator), use a relation to the author or account shape.
Quick attribution: author: "Paul Graham" (standard string field)
Rich attribution: written_by: author or posted_by: account (relation)
Both can coexist — the string is for display, the relation is for traversal.
8. Address/Place is structured, not a string
Physical locations should be a place shape with structured fields (name, street, city, region, postal_code, country, coordinates). Inspired by Mapbox’s geocoding model.
9. Playlists, shelves, and lists belong to accounts
Any collection (playlist, shelf, list, board) should have a belongs_to: account relation. Collections are owned.
10. Use ISO standards for standardized values
When a field represents something with an international standard, use the standard code:
- Human languages: ISO 639-1 codes, optionally with a region subtag in BCP 47 style (en, es, ja, pt-BR). Applies to transcript.language, webpage.language, and other content-language fields. NOT programming languages (those use conventional names like Python, Rust).
- Countries: ISO 3166-1 alpha-2 codes (US, GB, JP). Use the country_code field.
- Currencies: ISO 4217 codes (USD, EUR, JPY). Use the currency field.
- Timezones: IANA timezone names (America/New_York, Europe/London).
Don’t enforce via enum (too many values). Document the convention and let agentos test flag non-compliant values. See Testing & Validation for how to run shape validation.
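Because codes are not enforced by enum, a checker can only test their format. A sketch of what such a check might look like; iso_warnings and the timezone field name are hypothetical, and this is not how agentos test is implemented. zoneinfo.available_timezones() reads the system tz database (or the tzdata package), so it can return an empty set on machines without one.

```python
import re
from zoneinfo import available_timezones

# Format checks only: a match means "looks like a code", not "is an assigned code".
PATTERNS = {
    "language": re.compile(r"[a-z]{2}(-[A-Z]{2})?"),  # en, pt-BR
    "country_code": re.compile(r"[A-Z]{2}"),          # US, GB, JP
    "currency": re.compile(r"[A-Z]{3}"),              # USD, EUR, JPY
}


def iso_warnings(record: dict) -> list:
    """Return human-readable warnings for fields that break the ISO conventions."""
    warnings = []
    for field, pattern in PATTERNS.items():
        value = record.get(field)
        if value is not None and not pattern.fullmatch(str(value)):
            warnings.append(f"{field}={value!r} is not a standard code")
    tz = record.get("timezone")  # hypothetical field name
    if tz is not None and tz not in available_timezones():
        warnings.append(f"timezone={tz!r} is not an IANA name")
    return warnings
```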
11. Separate content from context (NEPOMUK principle)
A video is a file. The social engagement around it is a post. A transcript is text. The meeting it came from is the context. Don’t mix artifact properties with social properties on the same shape.
Bad: video has view_count, like_count, comment_count, posted_by (those are social context)
Good: video is a file with duration + resolution. A post contains the video and carries the engagement.
Ask: “If I downloaded this to my hard drive, which fields would still make sense?” Those are the artifact fields. Everything else is context that belongs on a wrapper entity.
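One way a skill could apply this split: build the artifact from the fields that would survive a download, then wrap it in a post that carries the engagement. The raw keys, the contains edge label, and to_post are assumptions for illustration; the nested dict uses the typed-reference form described under Typed references.

```python
def to_post(raw: dict) -> dict:
    """Split one mixed source record into a post (context) wrapping a video (artifact)."""
    video = {
        "id": raw["video_id"],
        "name": raw["title"],
        "duration": raw["duration_seconds"],  # artifact: survives a download
        "resolution": raw["resolution"],      # artifact: survives a download
    }
    return {
        "id": raw["post_id"],
        "name": raw["title"],
        "view_count": raw["views"],           # social context
        "like_count": raw["likes"],           # social context
        "contains": {"video": video},         # typed ref: post --contains--> video
    }
```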
12. Comments are nested posts, not a separate shape
A comment is a post that replies_to another post. A reply to a message is still a message. Don’t create separate shapes for nested versions of the same thing — use the replies_to relation to express the hierarchy.
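A comment therefore needs no shape of its own. A sketch of a skill returning one as a post, assuming the raw comment carries its parent's id; the raw keys and to_comment are hypothetical.

```python
def to_comment(raw_comment: dict) -> dict:
    """Return a comment as an ordinary post with a replies_to typed ref to its parent."""
    return {
        "id": raw_comment["id"],
        "text": raw_comment["body"],
        "author": raw_comment["author_name"],
        # typed ref: post --replies_to--> post (the parent post or comment)
        "replies_to": {"post": {"id": raw_comment["parent_id"]}},
    }
```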
13. Booleans describe state, relations describe lineage
is_fork: boolean tells you nothing. forked_from: repository tells you the lineage. If a boolean implies a relationship to another entity, model the relationship instead.
Bad: is_fork: boolean (from what?)
Good: forked_from: repository (the source is traversable)
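A sketch of the same idea at the skill boundary, assuming a GitHub-style payload in which a fork carries a parent object with full_name, name, and html_url; to_repository is hypothetical. The source's fork flag is dropped because the edge replaces it.

```python
def to_repository(raw: dict) -> dict:
    """Model a fork as a forked_from edge instead of an is_fork boolean."""
    repo = {"id": raw["full_name"], "name": raw["name"], "url": raw["html_url"]}
    parent = raw.get("parent")
    if parent:
        # typed ref: repository --forked_from--> repository (the source is traversable)
        repo["forked_from"] = {
            "repository": {
                "id": parent["full_name"],
                "name": parent["name"],
                "url": parent["html_url"],
            }
        }
    return repo
```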
14. Booleans that encode direction are really relationships
is_outgoing: boolean on a message means “I sent this.” But that information already lives in the from: account relation — if the from account is the user, it’s outgoing. Don’t duplicate relationship semantics as boolean flags.
Bad: is_outgoing: boolean on message
Good: from: account relation — direction is derived by comparing from to the current user
Same pattern: is_sent, is_received, is_mine — all derivable from a directional relation.
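Deriving direction at read time is a single comparison. In this sketch, is_outgoing is a hypothetical helper, the message uses the typed-ref form shown under Typed references, and an account is assumed to be identified by handle plus platform.

```python
def is_outgoing(message: dict, me: dict) -> bool:
    """True when the message's `from` account is the current user."""
    # A collapsed or missing ref (None) is treated as "not from me".
    sender = (message.get("from") or {}).get("account") or {}
    return (sender.get("handle"), sender.get("platform")) == (me["handle"], me["platform"])
```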
15. Booleans that encode cardinality are derivable
is_group: boolean on a conversation means “has more than two participants.” That’s not state — it’s a count. Don’t store what you can derive from the structure.
Bad: is_group: boolean on conversation
Good: participant: account[] relation; is_group is derived as len(participant) > 2
Same pattern: has_attachments (derive from attachment: file[]), has_unread (derive from messages), is_empty (derive from children).
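Both derivations reduce to counting. A sketch, assuming the caller has the relations resolved into lists keyed by edge label; is_group and has_attachments are hypothetical helpers.

```python
def is_group(conversation: dict) -> bool:
    """More than two participants makes a group conversation."""
    return len(conversation.get("participant") or []) > 2


def has_attachments(message: dict) -> bool:
    """Any attachment edge at all means the message has attachments."""
    return len(message.get("attachment") or []) > 0
```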
16. Source data doesn’t dictate shape
A skill’s source (API, database, scrape) returns whatever it returns. That doesn’t constrain the shape. The Python function is the transformation boundary — it takes raw source data and returns shape-native dicts.
Apple Contacts gives flat strings: Organization: Anthropic, Title: Engineer. That doesn’t mean person gets organization: string. It means the skill transforms those strings into a roles: role[] typed ref.
Bad: “The API returns platform: string, so the shape needs a platform field”
Good: “What kind of thing is this? Model it correctly. The skill transforms source data to fit.”
Design shapes for the domain, not for the source. Every skill file is a template — other agents copy the patterns they see.
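A sketch of the Apple Contacts transformation described above, assuming a raw contact dict with id, Name, Organization, and Title keys; contact_to_person is hypothetical, and whether the engine accepts a typed ref nested inside another typed ref is an assumption. The role is given a name so it keeps an identity field and is not collapsed to null (see Typed references).

```python
def contact_to_person(contact: dict) -> dict:
    """Turn flat contact strings into a person with a roles: role[] typed ref."""
    person = {"id": contact["id"], "name": contact.get("Name")}
    org, title = contact.get("Organization"), contact.get("Title")
    if org:
        role = {
            # A readable name doubles as the role's identity field.
            "name": f"{title} at {org}" if title else org,
            "title": title,
            # Nested typed ref: role --organization--> organization (assumed supported)
            "organization": {"organization": {"name": org}},
        }
        person["roles"] = {"role[]": [role]}
    return person
```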
17. Model life like LinkedIn, not like a spreadsheet
People have roles at organizations. Roles have titles, departments, start dates, end dates. Education is a role at a school. Membership is a role in a community. Authorship is a role on a publication.
The LinkedIn mental model: a person has a timeline of positions, each connecting them to an organization with a title and time range. This is principle #2 made concrete.
```
person --roles--> role[] --organization--> organization
                  --title: "Engineer"
                  --department: "Research"
                  --start_date: 2024-01-15
                  --end_date: null (current)
```
This applies broadly: board membership, team membership, project assignment, course enrollment. If a relationship has a time dimension or a title, it’s a role.
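With roles modeled this way, “current job” becomes a query over time ranges rather than a stored field. A sketch, assuming start_date and end_date arrive as ISO 8601 date strings and a null end_date means the role is ongoing; current_roles is hypothetical.

```python
from datetime import date
from typing import Optional


def current_roles(roles: list, today: Optional[date] = None) -> list:
    """Roles whose time range includes `today`; a null end_date means ongoing."""
    today = today or date.today()
    active = []
    for role in roles:
        start = date.fromisoformat(role["start_date"]) if role.get("start_date") else None
        end = date.fromisoformat(role["end_date"]) if role.get("end_date") else None
        if (start is None or start <= today) and (end is None or today <= end):
            active.append(role)
    return active
```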
Review Checklist
After writing or editing a shape, ask yourself:
- Fields or relations? For each string field, ask: “Is this value itself an entity?” If yes, make it a relation.
- Currency with price? Every monetary amount has a currency companion.
- URLs audited? Is each URL the record’s own link, or does it point to another entity?
- Domain-agnostic? Would this shape make sense for a different source providing the same kind of thing?
- also justified? Does the also chain represent genuine “is-a” relationships that aid cross-type queries?
- Author modeled correctly? Is the author a string (quick attribution) or a relation (traversable entity)?
- Addresses structured? Are locations/addresses relations to place, not inline strings?
- Collections owned? Do lists/playlists/shelves have a belongs_to relation?
- Roles, not fields? Are time-bounded relationships (jobs, education, membership) modeled as role relations, not person fields?
- Display makes sense? Are the right fields in title/subtitle/columns for this shape?
- Content vs context? If this is a media artifact, are social metrics on a wrapper post instead?
- Nesting via replies_to? Is a “sub-type” really just this shape with a parent relation?
- ISO standards? Are languages (ISO 639-1), countries (ISO 3166-1), currencies (ISO 4217), and timezones (IANA) using standard codes?
- Booleans or relations? Does any boolean imply a relationship? (is_fork → forked_from)
- Direction booleans? Is is_outgoing/is_sent derivable from a from relation?
- Cardinality booleans? Is is_group/has_attachments derivable from counting a relation?
- Source-independent? Did you design for the domain, or did the API shape leak into the schema?
- Roles modeled as LinkedIn? Are jobs/education/memberships role[] relations with title + org + time range?
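A few checklist items are mechanical enough to script. This sketch flags is_/has_ booleans, declared url fields, and price numbers without a currency; review_shape is a hypothetical helper operating on a shape already parsed from YAML, and everything it reports still needs human judgment.

```python
import re


def review_shape(name: str, shape: dict) -> list:
    """Flag checklist items that can be spotted mechanically (advisory only)."""
    fields = shape.get("fields") or {}
    notes = []
    for field, ftype in fields.items():
        if ftype == "boolean" and re.match(r"(is|has)_", field):
            notes.append(f"{name}.{field}: boolean flag; derivable from a relation?")
        if ftype == "url":
            # The standard url field is implicit, so any declared url field is suspect.
            notes.append(f"{name}.{field}: url field; own link or a relation?")
    money = [f for f, ftype in fields.items() if f.startswith("price") and ftype == "number"]
    if money and "currency" not in fields:
        notes.append(f"{name}: {', '.join(money)} without a currency field")
    return notes
```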
Returning shape-native data from operations
When an operation declares returns: email (or email[] for a list), the Python function returns dicts whose keys match the shape. The shape is the contract: no separate mapping layer sits between the Python code and the engine.
```yaml
# shapes/email.yaml
email:
  also: [message]
  fields:
    from_email: string
    to: string[]
    cc: string[]
    labels: string[]
    thread_id: string
  relations:
    from: account
    conversation: conversation
  display:
    title: name
    subtitle: from_email
    date: datePublished
```
```yaml
# skill.yaml
operations:
  get_email:
    returns: email        # points to the email shape
    python:
      module: ./gmail.py
      function: get_email
```
```python
# gmail.py: returns email-shaped dicts directly
def get_email(id: str, _call=None) -> dict:
    # Fetching and parsing the Gmail message is elided here; msg_id, subject,
    # snippet, web_url, date, body_text, sender, recipients, and label_ids
    # are assumed to be extracted from that raw response.
    return {
        "id": msg_id,
        "name": subject,          # standard field
        "text": snippet,          # standard field
        "url": web_url,           # standard field
        "published": date,        # standard field
        "content": body_text,     # standard field (FTS)
        "from_email": sender,     # shape-specific field
        "to": recipients,         # shape-specific field
        "labels": label_ids,      # shape-specific field
    }
```
The Python code does the field mapping — it transforms raw API responses into shape-native dicts. Standard fields (id, name, text, url, image, author, published, content) are available on every shape without declaring them.
Canonical fields
The renderer resolves entity display from standard fields. Every Python return should populate as many of these as the source data supports — they drive consistent previews, detail views, and search results across all skills.
| Field | Purpose |
|---|---|
| name | Primary label / title |
| text | Short summary or snippet for preview rows |
| url | Clickable link |
| image | Thumbnail / hero image |
| author | Creator / brand / owner |
| published | Temporal anchor |
| content | Long body text (stored separately, FTS-indexed) |
Not every entity has all of these — a product may have no published, an order may have no image. Map what the source provides; skip what doesn’t apply.
Typed references (entity relationships)
To create linked entities and graph edges, return nested dicts keyed by entity type:
```python
def get_email(id: str, _call=None) -> dict:
    # Fetching the message is elided; msg_id, subject, sender_email, sender_name,
    # and recipients (a list of (address, display name) pairs) come from it.
    return {
        "id": msg_id,
        "name": subject,
        # Single typed ref: creates email --from--> account
        "from": {
            "account": {
                "handle": sender_email,
                "platform": "email",
                "display_name": sender_name,
            }
        },
        # Array typed ref: creates email --to--> account (one per recipient)
        "to": {
            "account[]": [
                {"handle": addr, "platform": "email", "display_name": name}
                for addr, name in recipients
            ]
        },
    }
```
The outer key (from, to) becomes the edge label. The inner key (account, account[]) is the entity tag. The engine auto-creates/deduplicates the linked entity and adds the edge.
A typed ref is collapsed to null if none of its identity fields (id or name) survive — so partial data doesn’t create ghost entities.
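A sketch of the collapse rule, assuming the caller passes the shape's declared relation labels and that array refs are filtered element by element. collapse_typed_refs is hypothetical, and which fields count as identity for a given target shape is also an assumption here, so identity_fields is a parameter.

```python
def collapse_typed_refs(record: dict, relations: set, identity_fields=("id", "name")) -> dict:
    """Null out typed refs whose target carries none of the identity fields."""
    def has_identity(entity) -> bool:
        return isinstance(entity, dict) and any(entity.get(f) for f in identity_fields)

    out = dict(record)
    for edge in relations:
        ref = record.get(edge)
        if not isinstance(ref, dict) or len(ref) != 1:
            continue  # absent, or not in {tag: target} typed-ref form
        tag, target = next(iter(ref.items()))
        if tag.endswith("[]"):
            kept = [entity for entity in target if has_identity(entity)]
            out[edge] = {tag: kept} if kept else None
        else:
            out[edge] = ref if has_identity(target) else None
    return out
```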
Validation
Shape conformance is checked at two levels:
Pre-commit (static)
bin/audit-skills.py parses Python return dict literals via AST and warns if keys don’t match the declared shape. Runs automatically on every commit. Catches dict-literal returns but misses dynamic construction, helper functions, and _call composition.
Runtime
The engine validates every entity-returning skill call after execution. If the returned data contains keys not declared in the shape (fields, relations, or standard fields), a warning is logged to engine.log. Missing identity fields (id and name) also trigger warnings.
Runtime validation catches everything the static check misses — it sees the actual data. Check ~/.agentos/logs/engine.log for Shape conformance warnings after running a skill.
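A runtime check works on the returned data itself. A minimal sketch under stated assumptions: the logger name is made up, check_conformance is hypothetical, and a warning is emitted per missing identity field. It only returns and logs warnings, matching the advisory behavior described here.

```python
import logging

log = logging.getLogger("shape_conformance")  # hypothetical logger name

STANDARD_FIELDS = {"id", "name", "text", "url", "image", "author", "published", "content"}


def check_conformance(record: dict, shape_name: str, shape: dict) -> list:
    """Return (and log) advisory warnings for one returned record; never raises."""
    declared = STANDARD_FIELDS | set(shape.get("fields") or {}) | set(shape.get("relations") or {})
    problems = [f"undeclared key {key!r}" for key in record if key not in declared]
    for field in ("id", "name"):
        if not record.get(field):
            problems.append(f"missing identity field {field!r}")
    for problem in problems:
        log.warning("Shape conformance: %s: %s", shape_name, problem)
    return problems
```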
Both checks are advisory (warnings, not errors). They exist to surface non-conformant skills, not to block execution.
Prior Research
Extensive entity modeling research lives in /Users/joe/dev/entity-experiments/. These are not authoritative — many are outdated — but contain valuable principles and platform analysis worth consulting when designing new shapes.
Entity & Ontology Research
- schema-entities.md: Core entity type definitions, OGP foundation, Joe’s hypotheses on note vs article
- schema-relationships.md: Relationship type catalog and design patterns
- research/entities/open-graph-protocol.md: OGP types, why flat beats hierarchical
- research/entities/google-structured-data.md: Schema.org structured data patterns
Platform Research
- research/platforms/google-takeout.md: 72 Google products analyzed for entity types (Contacts, Calendar, Drive, Gmail, Photos, YouTube, Maps, Chrome, Pay, Play)
- research/platforms/facebook-graph.md: Facebook Graph API entity model
- research/platforms/familysearch.md: GEDCOM X genealogical data model (two relationship types + qualifiers, computed derivations, source citations)
Relationship Research
- research/relationships/genealogical-relationships.md: Family relationship modeling patterns
- research/relationships/relationship-modeling.md: General relationship design
- research/relationships/schema-org-relationships.md: Schema.org relationship types
- research/relationships/ogp-relationships.md: OGP relationship patterns
- research/relationships/no-orphans-constraint.md: Why every entity needs at least one connection
Systems Research
- research/systems/outcome-entity.md: Outcome/goal entity modeling
- research/context/pkm-community.md: Personal knowledge management patterns
- research/context/semantic-file-systems.md: NEPOMUK and semantic desktop research