Reverse Engineering — Social Network Patterns
How to model people, relationships, and social data across platforms like Goodreads, Twitter/X, MySpace, LinkedIn, Instagram, etc.
This is Layer 5 of the reverse-engineering docs:
- Layer 1: Transport — 1-transport — getting a response at all
- Layer 2: Discovery — 2-discovery — finding structured data in bundles
- Layer 3: Auth & Runtime — 3-auth — credentials, sessions, rotating config
- Layer 4: Content — 4-content — extracting data from HTML when there is no API
- Layer 5: Social Networks (this file) — modeling people, relationships, and social graphs
- Layer 6: Desktop Apps — 6-desktop-apps — macOS, Electron, local state, unofficial APIs
Core Principle: People First, Accounts Second
Every social platform has users. But the same person exists across many platforms. The graph should model this in two layers:
| Entity | What it represents | Cross-platform? |
|---|---|---|
| person | A real human being | Yes — mergeable across platforms |
| account | Their profile on one platform | No — platform-specific |
A person has accounts. An account belongs to a person.
```yaml
adapters:
  person:
    id: .user_id
    name: .name
    image: .photo_url
    location: .location
    data.gender: .gender
    data.age: .age
    data.birthday: .birthday
    data.website: .website
    has_account:
      account:
        id: .user_id
        name: .name
        handle: .handle
        url: .profile_url
        image: .photo_url
```
Why this matters: when you later build a Twitter skill and find the same person (by name, website, or explicit cross-link), you can merge the person entities while keeping both accounts distinct. The person is the anchor.
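A minimal sketch of the two-layer model follows. The `Person` and `Account` dataclasses and the `merge_people` helper are hypothetical and not part of any existing skill. Merging keeps the surviving person and fills its empty fields from the other person. It then attaches every account to the survivor, so the Goodreads and Twitter accounts stay separate records.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Account:
    # One profile on one platform; (platform, id) identifies it uniquely.
    platform: str
    id: str
    handle: Optional[str] = None
    url: Optional[str] = None


@dataclass
class Person:
    # The real human being; may own accounts on many platforms.
    id: str
    name: Optional[str] = None
    website: Optional[str] = None
    accounts: list[Account] = field(default_factory=list)


def merge_people(keep: Person, other: Person) -> Person:
    """Fold `other` into `keep`: fill missing fields, keep every account distinct."""
    if keep.name is None:
        keep.name = other.name
    if keep.website is None:
        keep.website = other.website
    known = {(a.platform, a.id) for a in keep.accounts}
    for acct in other.accounts:
        if (acct.platform, acct.id) not in known:
            keep.accounts.append(acct)
            known.add((acct.platform, acct.id))
    return keep


gr = Person("p1", name="Alex Reader", accounts=[Account("goodreads", "10000001")])
tw = Person("p2", website="https://example.com", accounts=[Account("twitter", "alexreader")])
merged = merge_people(gr, tw)  # one person, two accounts
```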
Social Relationship Types
Every social network has some subset of these relationship patterns:
Symmetric (mutual)
Both parties agree. The relationship is bidirectional.
| Relationship | Examples |
|---|---|
| friends | Facebook, Goodreads, MySpace |
Operation pattern: list_friends(user_id) → person[]
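Friendship is mutual, so A→B and B→A describe the same edge. Scraping both users' friends lists would otherwise store that edge twice. One way to avoid this is to key each friendship by the sorted pair of IDs. The `friend_edge` and `add_friends` helpers below are hypothetical:

```python
def friend_edge(a: str, b: str) -> tuple[str, str]:
    """Canonical key for an undirected friendship: the same for (a, b) and (b, a)."""
    return (a, b) if a <= b else (b, a)


def add_friends(edges: set[tuple[str, str]], user_id: str, friend_ids: list[str]) -> None:
    # Each pair shows up once per side when both lists are scraped; the set dedupes it.
    for fid in friend_ids:
        if fid != user_id:
            edges.add(friend_edge(user_id, fid))


edges: set[tuple[str, str]] = set()
add_friends(edges, "10000001", ["20000002", "30000003"])
add_friends(edges, "20000002", ["10000001"])
print(len(edges))  # 2
```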
Asymmetric (directed)
One party follows, the other may or may not follow back.
| Relationship | Examples |
|---|---|
| following | Twitter, Instagram, Goodreads |
| followers | Twitter, Instagram, Goodreads |
Operation pattern: two separate operations with different directions.
```yaml
list_following:
  description: People this user follows
  returns: person[]
list_followers:
  description: People following this user
  returns: person[]
```
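Directed edges keep their orientation. Assume the two operations return ID lists for one user (the inputs below are hypothetical). Mutual follows and one-way relationships then fall out of simple set operations. `classify_follows` is an illustrative helper, not an existing API:

```python
def classify_follows(following: list[str], followers: list[str]) -> dict[str, set[str]]:
    """Split one user's follow graph into mutual and one-way relationships."""
    out_edges = set(following)  # user -> them (list_following)
    in_edges = set(followers)   # them -> user (list_followers)
    return {
        "mutual": out_edges & in_edges,
        "not_following_back": out_edges - in_edges,  # user follows them only
        "fans": in_edges - out_edges,                # they follow the user only
    }


result = classify_follows(["a", "b", "c"], ["b", "c", "d"])
print(sorted(result["mutual"]))  # ['b', 'c']
```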
Group membership
User belongs to a group/community.
| Relationship | Examples |
|---|---|
| member_of | Goodreads groups, Facebook groups, Reddit subreddits, Discord servers |
```yaml
list_groups:
  returns: group[]
```
Profile Depth: Light vs Rich
Social operations return people at two levels of depth:
Light (from list operations)
When you scrape a friends list or followers page, you get limited data per person:
```json
{
  "user_id": "10000001",
  "name": "Alex Reader",
  "photo_url": "https://...",
  "location": "Berlin",
  "books_count": 414,
  "friends_count": 138
}
```
This is what list_friends, list_following, and list_followers return: enough to create the person entity and the relationship edge.
Rich (from profile scrape)
When you scrape an individual profile page, you get the full picture:
```json
{
  "user_id": "10000001",
  "name": "Alex Reader",
  "handle": "alexreader",
  "photo_url": "https://...",
  "gender": "...",
  "age": 32,
  "birthday": "...",
  "location": "Berlin, Germany",
  "website": "https://example.com",
  "about": "...",
  "interests": "...",
  "joined_date": "January 2015",
  "ratings_count": 159,
  "avg_rating": "3.82",
  "friends_count": 138,
  "favorite_books": [...],
  "currently_reading": [...],
  "favorite_genres": [...]
}
```
This is what get_person(user_id) returns.
Pattern: Always provide both. The light operations populate the graph with
stubs. The rich operation fills them in when you need the detail. The adapter
handles both — missing fields are just null.
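A minimal sketch of filling in a stub follows, assuming both payloads are plain dicts shaped like the light and rich examples above. The rich profile wins wherever it has a value. The stub's value survives wherever the rich page returned null. `merge_profiles` is a hypothetical helper:

```python
def merge_profiles(stub: dict, rich: dict) -> dict:
    """Overlay a rich profile onto a light stub; None never overwrites a real value."""
    merged = dict(stub)
    for key, value in rich.items():
        if value is not None:
            merged[key] = value
    return merged


stub = {"user_id": "10000001", "name": "Alex Reader", "location": "Berlin", "books_count": 414}
rich = {"user_id": "10000001", "location": "Berlin, Germany", "website": None, "age": 32}
person = merge_profiles(stub, rich)
print(person["location"], person["books_count"])  # Berlin, Germany 414
```

Running the merge in this direction means a later light scrape cannot erase detail that an earlier rich scrape already captured.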
Authors Are People Too
On platforms with content creators (Goodreads authors, Twitter blue-checks, YouTube channels), the creators are people with special roles. Model them as:
- person entity (they’re a human being)
- author/creator entity (their creative identity)
- account entity (their platform presence)
On Goodreads, an author appears in multiple contexts:
| Context | How we encounter them |
|---|---|
| Book’s written_by relationship | author entity with ID and URL |
| list_following results | person entity (users can follow authors) |
| Quote attribution | author entity |
| Author profile page | full author entity with books |
The key insight: extract real author IDs everywhere, not just name strings.
When a book list shows “Christie, Agatha” as a link to /author/show/123715,
capture the ID so the graph can connect the book → author → their other books.
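```python
import re

# `row` is one BeautifulSoup table row; `_abs_url` resolves a relative
# href against the site's base URL (helper not shown here).
author_el = row.select_one("td.field.author a")
if author_el:
    href = author_el.get("href", "")
    m = re.search(r"/author/show/(\d+)", href)
    if m:
        author_id = m.group(1)  # numeric author ID, e.g. "123715"
        author_url = _abs_url(href)
```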
Also fix name ordering — many platforms store names as “LastName, FirstName” in table views:
```python
def _flip_name(name: str) -> str:
    """Turn "Christie, Agatha" into "Agatha Christie"; leave other names alone."""
    if "," in name:
        parts = [p.strip() for p in name.split(",", 1)]
        if len(parts) == 2 and parts[1]:
            return f"{parts[1]} {parts[0]}"
    return name
```
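Name cleanup usually involves more than comma flipping. The hypothetical `normalize_name` below also applies Unicode NFC normalization, so a decomposed "i" plus combining accent compares equal to a precomposed "í". It also collapses stray whitespace from HTML before flipping. `_flip_name` is repeated so the block runs on its own.

```python
import re
import unicodedata


def _flip_name(name: str) -> str:
    # Same logic as above: "LastName, FirstName" -> "FirstName LastName".
    if "," in name:
        parts = [p.strip() for p in name.split(",", 1)]
        if len(parts) == 2 and parts[1]:
            return f"{parts[1]} {parts[0]}"
    return name


def normalize_name(raw: str) -> str:
    """NFC-normalize, collapse whitespace, then flip "Last, First" ordering."""
    text = unicodedata.normalize("NFC", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return _flip_name(text)


print(normalize_name("  Christie,\n  Agatha "))  # Agatha Christie
print(normalize_name("Garci\u0301a Ma\u0301rquez, Gabriel") == "Gabriel García Márquez")  # True
```

The single-comma split is deliberately naive. A name with a suffix, such as "Smith, John, Jr.", comes out as "John, Jr. Smith", so check your platform's data before relying on it.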
Content People Create
Social platforms aren’t just about connections — people create content. Each platform has its own content types that should map to entities:
| Platform | Content types | Entity mapping |
|---|---|---|
| Goodreads | Books read, reviews, ratings, quotes | book, review, quote |
| Twitter/X | Tweets, retweets, likes | post, engagement |
| MySpace | Music, blog posts, comments | track, post, comment |
| Instagram | Photos, stories, reels | media, story |
| LinkedIn | Posts, articles, endorsements | post, article |
The person’s relationship to content matters:
```text
# Things a person created
person → wrote → review
person → posted → post

# Things a person engaged with
person → rated → book (with rating value)
person → liked → quote
person → saved → book (to shelf)

# Things attributed to a person
quote → attributed_to → author
book → written_by → author
```
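One way to carry these relationships in code is a typed triple with an optional property bag for values such as the rating or the shelf name. This assumes a hypothetical `Edge` record rather than any particular graph store. The node IDs below are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Edge:
    source: str    # e.g. "person:10000001"
    relation: str  # e.g. "rated", "wrote", "written_by"
    target: str    # e.g. "book:42"
    props: dict[str, Any] = field(default_factory=dict)  # edge data, e.g. rating value


edges = [
    Edge("person:10000001", "wrote", "review:9001"),
    Edge("person:10000001", "rated", "book:42", {"rating": 4}),
    Edge("person:10000001", "saved", "book:42", {"shelf": "to-read"}),
    Edge("book:42", "written_by", "author:123715"),
]


def outgoing(edges: list[Edge], source: str, relation: str) -> list[Edge]:
    """All edges of one relation type leaving a node."""
    return [e for e in edges if e.source == source and e.relation == relation]


print([e.props["rating"] for e in outgoing(edges, "person:10000001", "rated")])  # [4]
```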
Profile Page Parsing Patterns
Social profile pages follow remarkably similar structures across platforms. Common patterns:
Info box / details section
Most profiles have a key-value info section:
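```python
# `soup` is the parsed profile page (BeautifulSoup); `clean` collapses whitespace.
titles = soup.select(".infoBoxRowTitle")
items = soup.select(".infoBoxRowItem")
info = {}
for t, v in zip(titles, items):  # titles and values appear in matching order
    label = clean(t.get_text()).lower()
    value = clean(v.get_text())
    info[label] = value
```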
Stats bar
Ratings, posts, followers — usually near the top:
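```python
import re

# `stats_el` is the element holding a stats line shaped like "159 ratings (3.82 avg)".
stats_text = clean(stats_el.get_text())
ratings = re.search(r"([\d,]+)\s+ratings?", stats_text)
avg = re.search(r"\(([\d.]+)\s+avg\)", stats_text)
```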
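Wrapping those regexes in a function makes the conversion explicit. Counts arrive with thousands separators, and a missing stat should come back as None rather than raise. Below is a stdlib-only sketch. `parse_stats` and the sample text are hypothetical, and `clean` is assumed to collapse whitespace.

```python
import re


def clean(text: str) -> str:
    # Assumed behavior of the `clean` helper: collapse whitespace runs and trim.
    return re.sub(r"\s+", " ", text).strip()


def parse_stats(raw: str) -> dict:
    """Extract the rating count and average rating from a profile stats string."""
    stats_text = clean(raw)
    ratings = re.search(r"([\d,]+)\s+ratings?", stats_text)
    avg = re.search(r"\(([\d.]+)\s+avg\)", stats_text)
    return {
        "ratings_count": int(ratings.group(1).replace(",", "")) if ratings else None,
        "avg_rating": float(avg.group(1)) if avg else None,
    }


print(parse_stats("1,159 ratings\n (3.82 avg)"))
# {'ratings_count': 1159, 'avg_rating': 3.82}
```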
Section headers → content blocks
Profile pages have named sections (favorite books, currently reading, groups). The header-to-content relationship varies by platform:
```python
# Pattern 1: header sits inside a container; content is a sibling div in that container
for hdr in soup.select("h2.brownBackground"):
    parent_box = hdr.find_parent("div", class_="bigBox")
    body = parent_box.select_one(".bigBoxBody") if parent_box else None

# Pattern 2: header stands alone; content is its next sibling element
for hdr in soup.select(".sectionHeader"):
    body = hdr.find_next_sibling()

# Pattern 3: header and content share a parent section
for section in soup.select(".profileSection"):
    title = section.select_one("h3")
    content = section.select_one(".sectionContent")
```
Always check the actual DOM structure — don’t assume.
Pagination for Social Lists
Social lists (friends, followers, following) almost always paginate. Key patterns from Goodreads that will apply elsewhere:
Auto-pagination with page=0
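```python
def list_friends(user_id, page=0, **options):  # other options omitted
    """page=0 means fetch all pages automatically; page > 0 fetches only that page."""
    if page > 0:
        items, _html = _fetch_single_page(user_id, page)
        return items
    all_items = []
    seen = set()
    for p in range(1, MAX_PAGES + 1):
        # The page fetcher returns parsed people plus the raw HTML,
        # so the next-page check has something to inspect.
        items, html_text = _fetch_single_page(user_id, p)
        new = [i for i in items if i["user_id"] not in seen]
        all_items.extend(new)
        seen.update(i["user_id"] for i in new)
        if not _has_next(html_text):
            break
    return all_items
```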
Next-page detection
```python
def _has_next(html_text: str) -> bool:
    return 'class="next_page"' in html_text or 'rel="next"' in html_text
```
Safety limits
Always cap pagination to prevent infinite loops:
```python
MAX_PAGES = 50
```
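The same loop can be factored into a reusable helper and exercised offline with a fake fetcher. In this sketch `paginate` and the `pages` data are hypothetical. The fetch callback is assumed to return `(items, has_next)` for a 1-based page number. Beyond the next-link check and the cap, it also stops when a page adds no unseen people. That guard covers sites that keep returning the last page for out-of-range page numbers.

```python
from typing import Callable

MAX_PAGES = 50

Item = dict[str, str]
Fetch = Callable[[int], tuple[list[Item], bool]]


def paginate(fetch_page: Fetch, max_pages: int = MAX_PAGES) -> list[Item]:
    """Fetch pages 1..max_pages, de-duplicating people by user_id."""
    all_items: list[Item] = []
    seen: set[str] = set()
    for p in range(1, max_pages + 1):
        items, has_next = fetch_page(p)
        new = [i for i in items if i["user_id"] not in seen]
        all_items.extend(new)
        seen.update(i["user_id"] for i in new)
        # Stop on the last page, or when a page only repeats people already seen.
        if not new or not has_next:
            break
    return all_items


# Fake two-page friends list; page 2 repeats one person from page 1.
pages = {
    1: ([{"user_id": "1"}, {"user_id": "2"}], True),
    2: ([{"user_id": "2"}, {"user_id": "3"}], False),
}
friends = paginate(lambda p: pages.get(p, ([], False)))
print([f["user_id"] for f in friends])  # ['1', '2', '3']
```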
Cross-Platform Identity Signals
When building skills for multiple social networks, look for identity signals that help merge person entities across platforms:
| Signal | Reliability | Example |
|---|---|---|
| Explicit cross-link | High | Website URL in bio pointing to another profile |
| Same handle | Medium | @jcontini on both Twitter and Goodreads |
| Same name + location | Low | “Joe Contini, Austin TX” |
| Same profile photo | Medium | Image similarity matching |
| Email (if available) | High | Unique identifier |
For now, just capture everything. The website field on a person’s profile
is particularly valuable — it often links to a personal site that aggregates
all their social profiles.
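When it is time to merge, the signals can be combined into a score. This sketch is illustrative only. The weights, the threshold, and the `match_score` and `normalize_url` helpers are assumptions chosen to mirror the reliability column above, not tuned values. Photo similarity is left out because it needs an image library.

```python
from urllib.parse import urlsplit

# Assumed weights mirroring the table: High = 1.0, Medium = 0.5, Low = 0.2.
WEIGHTS = {"cross_link": 1.0, "email": 1.0, "handle": 0.5, "name_location": 0.2}
MERGE_THRESHOLD = 1.0  # assumption: one high signal, or several weaker ones


def normalize_url(url: str) -> str:
    """Loose URL comparison: ignore scheme, "www.", trailing slash, and case."""
    parts = urlsplit(url if "//" in url else "//" + url)
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/").lower()


def _norm(value) -> str:
    # Case-insensitive compare; a leading "@" on handles is ignored.
    return (value or "").strip().lower().lstrip("@")


def match_score(a: dict, b: dict) -> float:
    """Sum the weights of the identity signals two person records share."""
    score = 0.0
    # Explicit cross-link: either profile's website points at the other's profile URL.
    if any(
        x.get("website") and y.get("url")
        and normalize_url(x["website"]) == normalize_url(y["url"])
        for x, y in ((a, b), (b, a))
    ):
        score += WEIGHTS["cross_link"]
    if _norm(a.get("email")) and _norm(a.get("email")) == _norm(b.get("email")):
        score += WEIGHTS["email"]
    if _norm(a.get("handle")) and _norm(a.get("handle")) == _norm(b.get("handle")):
        score += WEIGHTS["handle"]
    same_name = _norm(a.get("name")) and _norm(a.get("name")) == _norm(b.get("name"))
    same_loc = _norm(a.get("location")) and _norm(a.get("location")) == _norm(b.get("location"))
    if same_name and same_loc:
        score += WEIGHTS["name_location"]
    return score


goodreads = {"name": "Alex Reader", "handle": "alexreader", "website": "https://x.com/alexreader/"}
twitter = {"name": "Alex Reader", "handle": "@AlexReader", "url": "https://x.com/alexreader"}
score = match_score(goodreads, twitter)
print(score, score >= MERGE_THRESHOLD)  # 1.5 True
```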
Checklist for a New Social Network Skill
When building a skill for a new social platform:
- Identify the entity types — what do people create, consume, and engage with?
- Map relationships — friends? followers? groups? what content do they produce?
- Model as person → account — not just accounts
- Light + rich profiles — list operations for stubs, get_person for detail
- Extract real IDs everywhere — not just name strings; follow links for IDs
- Capture cross-platform signals — website, handle, email
- Auto-paginate social lists — friends, followers, etc. are almost always paginated
- Handle name formatting — “LastName, FirstName” flipping, Unicode, etc.
- Look for section-based profile data — favorite X, currently Y, groups, etc.
- Test with a real profile — verify data richness against what you see in the browser
Real-World Examples
| Skill | Social patterns used | Reference |
|---|---|---|
| skills/goodreads/ | person → account, friends, following/followers, groups, quotes, authors as people, favorite books, currently reading, profile scraping | web_scraper.py |
| Future: skills/myspace/ | person → account, friends, followers, music, blog posts | — |
| Future: skills/twitter/ | person → account, following/followers, tweets, likes, retweets | — |