A reference for the curious: the stack, the schema, and the pipeline that turns a stream of voice notes and journal entries into a searchable memory.
Overview
This journal is built from three boring pieces and one interesting one.
The boring pieces:
- Ghost 5.x — the content management system. Runs on a DigitalOcean droplet, renders every published post at danialdaud.com, hosts images and audio at /content/. Ghost provides the public surface and the editor.
- Supabase (Postgres + Storage) — the database and file store for everything the bot writes. Posts, tags, images, audio files, and a memory layer on top of all of it. The source of truth for anything generated by the bot.
- A Node.js Telegram bot — a small long-running process on the same droplet, managed by PM2. Receives messages, routes them, calls tools, talks to Supabase and Ghost.
The interesting piece:
- A memory layer — a set of schemas, indexes, and background pipelines that turn every post into (a) an embedding, (b) a set of extracted entities, and (c) a web of relationships between them. Built on pgvector inside the same Supabase Postgres.
The two memory systems
When you talk to the bot, two distinct kinds of memory are running side by side.
Short-term: conversation memory
Lives in the Node process's RAM — a Map<string, Session> keyed by (channel, userId). Each session holds the last 20 messages, trimmed at turn boundaries so tool-use/tool-result pairs are never orphaned. Timeout after 30 minutes of inactivity. Wiped on any restart.
This is what makes the bot feel like a conversation. It has nothing to do with the journal's long-term memory.
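A minimal sketch of what such a session store could look like (the bot's code is private, so the function names, message shape, and trim details here are assumptions, not the real module):

```javascript
// In-memory session store: a Map keyed by "channel:userId".
// Hypothetical sketch; only the 20-message cap, the 30-minute
// timeout, and the tool-pair trimming rule come from the article.
const MAX_MESSAGES = 20;
const TIMEOUT_MS = 30 * 60 * 1000;

const sessions = new Map();

function getSession(channel, userId, now = Date.now()) {
  const key = `${channel}:${userId}`;
  const existing = sessions.get(key);
  // Expire sessions idle for more than 30 minutes (or missing).
  if (!existing || now - existing.lastActive > TIMEOUT_MS) {
    const fresh = { messages: [], lastActive: now };
    sessions.set(key, fresh);
    return fresh;
  }
  existing.lastActive = now;
  return existing;
}

function pushMessage(session, msg) {
  session.messages.push(msg);
  // Trim from the front, but never strand a tool_result without its
  // preceding tool_use: drop whole turns, not single messages.
  while (session.messages.length > MAX_MESSAGES) {
    const dropped = session.messages.shift();
    if (dropped.role === 'assistant' && dropped.toolUse &&
        session.messages[0]?.role === 'tool_result') {
      session.messages.shift(); // keep the pair together
    }
  }
}
```

Because the Map lives in process RAM, a PM2 restart wipes every session, which matches the "wiped on any restart" behaviour above.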
Long-term: journal memory
Lives in Supabase Postgres. Persistent, searchable, growing. Described in detail below.
The schema
Seven tables, one jsonb field, two vector columns, two RPCs.
posts
id                  uuid
ghost_id            text          Ghost post id, unique
slug                text          URL-safe, unique
title               text
content_md          text          user's words, canonical
content_html        text          rendered HTML, what Ghost serves
excerpt             text          auto-filled from metadata.summary on publish
feature_image_url   text          Ghost-hosted /content/images/...
status              text          draft | published
reading_time        int
published_at        timestamptz
synced_at           timestamptz
embedding           vector(1536)  OpenAI text-embedding-3-small
metadata            jsonb         summary, mood, topics, people, places, works, ...
The embedding column has an HNSW index (vector_cosine_ops) for fast approximate nearest-neighbour search. The metadata column is a GIN-indexed jsonb blob generated by Claude Haiku at write time.
tags, post_tags, media
tags and post_tags are standard many-to-many. media holds every image and audio file linked to a post:
id                     uuid
kind                   text         image | audio | other
source                 text         ghost | telegram | bot-upload
source_ref             text         dedup key
bucket_path            text         Supabase Storage path
signed_url             text         long-lived signed URL
signed_url_expires_at  timestamptz
mime                   text
size_bytes             bigint
duration_seconds       real         audio only
transcript             text         audio only
post_id                uuid         FK to posts
Every photo and voice note gets a row. Dedup key (source, source_ref) prevents double-uploads from the same Telegram file id.
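In supabase-js terms, that dedup key could be enforced with an upsert against a unique index on (source, source_ref). A sketch, with the client injected and the function name invented for illustration:

```javascript
// Hypothetical sketch of the media-row write. `db` stands in for a
// supabase-js client; the column names come from the schema above,
// the function name and input shape are assumptions.
async function recordMedia(db, file) {
  const row = {
    kind: file.kind,                  // 'image' | 'audio' | 'other'
    source: 'telegram',
    source_ref: file.telegramFileId,  // Telegram file id as dedup key
    bucket_path: file.bucketPath,
    mime: file.mime,
    size_bytes: file.sizeBytes,
    post_id: file.postId,
  };
  // A unique index on (source, source_ref) turns a re-sent file
  // into an update instead of a duplicate row.
  return db.from('media').upsert(row, { onConflict: 'source,source_ref' });
}
```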
entities, post_entities, entity_edges
The knowledge graph. entities holds every person, place, topic, work, project, org, or event the system has ever found. post_entities links them to posts that mention them. entity_edges stores the relationships (LIVED_AT, INSPIRED_BY, WROTE, VISITED, ...) extracted by Claude Haiku from the post body at write time.
RPCs
Two Postgres functions exposed to the bot:
- match_posts(query_embedding, match_count, min_similarity) — cosine-similarity nearest neighbours over posts.embedding
- entity_neighborhood(entity_id) — one-hop walk over entity_edges in both directions
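From the bot's side, each RPC is one supabase-js call. A sketch with the client injected; the parameter names follow the signatures above, while the wrapper names and the default count/threshold values are assumptions:

```javascript
// Hypothetical wrappers around the two Postgres functions.
async function matchPosts(db, queryEmbedding) {
  const { data, error } = await db.rpc('match_posts', {
    query_embedding: queryEmbedding, // 1536-dim vector from the embed step
    match_count: 5,                  // assumed top-k default
    min_similarity: 0.3,             // assumed similarity floor
  });
  if (error) throw error;
  return data;
}

async function entityNeighborhood(db, entityId) {
  const { data, error } = await db.rpc('entity_neighborhood', {
    entity_id: entityId,             // one-hop walk, both directions
  });
  if (error) throw error;
  return data;
}
```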
The pipeline: what happens when you write a post
Telegram message
│
▼
lib/adapters/telegram.js download file, authorise, normalise
│
▼
lib/core/intake.js voice → Whisper → transcript → media row
photo → feature image → media row
│
▼
lib/posts.js : createPost dual-write: Supabase + Ghost Admin API
│
├──▶ Supabase posts, tags, post_tags, media
│
├──▶ Ghost posts/?source=html
│ images/upload/
│ media/upload/
│
▼
lib/posts.js : runMemoryAsync fire-and-forget
│
├──▶ lib/embed.js OpenAI text-embedding-3-small
│ │
│ ▼
│ posts.embedding (vector 1536)
│
├──▶ lib/memory.js Claude Haiku structured extraction
│ │
│ ▼
│ posts.metadata (jsonb: summary, mood, topics, ...)
│
└──▶ lib/entities.js Claude Haiku entity + edge extraction
│
▼
entities · post_entities · entity_edges
The post is returned to the bot in under a second. The memory pipeline finishes in the background within a few seconds, and each layer becomes searchable as soon as its write lands.
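The split between the awaited dual-write and the fire-and-forget memory stage can be sketched like this (function and dependency names are assumptions; only the shape of the flow comes from the diagram above):

```javascript
// Hypothetical sketch of createPost / runMemoryAsync.
// The caller gets the post back as soon as Supabase + Ghost are
// written; the memory steps run unawaited in the background.
async function createPost(input, deps) {
  const post = await deps.dualWrite(input); // Supabase + Ghost, awaited
  runMemoryAsync(post, deps);               // NOT awaited: returns fast
  return post;
}

function runMemoryAsync(post, deps) {
  Promise.allSettled([
    deps.embedPost(post),        // -> posts.embedding
    deps.extractMetadata(post),  // -> posts.metadata
    deps.extractEntities(post),  // -> entities / post_entities / entity_edges
  ]).then((results) => {
    // Background failures are logged, never propagated to the caller.
    for (const r of results) {
      if (r.status === 'rejected') console.error('memory step failed:', r.reason);
    }
  });
}
```

allSettled (rather than all) means one failed extraction does not stop the other layers from landing.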
Retrieval
The bot exposes four read paths into the memory layer, as Claude tools:
- search_posts_semantic(query) — embeds the query, calls the match_posts RPC, returns top-k with url, audio_url, image_urls, and similarity score
- search_posts_fts(query) — Postgres textSearch on content_md and title, for exact keyword matching
- search_by_metadata({ mood, topic, date_from, date_to }) — jsonb path filter on metadata->>mood and metadata->topics
- get_entity(name) + get_entity_neighborhood(name) — walk the graph
All results are enriched by lib/search.js:enrichPosts() which batch-fetches linked media and post URLs so the bot can link them cleanly in replies.
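The batch-fetch shape of that enrichment might look like the following (a sketch against supabase-js with an injected client; the real enrichPosts signature is private, so names here are assumptions):

```javascript
// Hypothetical sketch of enrichPosts(): one query fetches the media
// rows for every result post, then URLs are attached per post, so
// enrichment costs one round trip regardless of result count.
async function enrichPosts(db, posts) {
  const ids = posts.map((p) => p.id);
  const { data: media } = await db
    .from('media')
    .select('post_id, kind, signed_url')
    .in('post_id', ids);
  return posts.map((p) => ({
    ...p,
    image_urls: media
      .filter((m) => m.post_id === p.id && m.kind === 'image')
      .map((m) => m.signed_url),
    audio_url:
      media.find((m) => m.post_id === p.id && m.kind === 'audio')?.signed_url ?? null,
  }));
}
```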
Bot architecture
Telegram ─┐
├─▶ lib/adapters/{telegram, http}
HTTP POST ─┘ │
▼
lib/core/{intake, session, chat, tools, tool-handlers,
system-prompt}
│
▼
lib/{posts, search, memory, entities, media-mirror,
ghost-sync, embed, transcribe}
│
▼
Supabase Postgres + Storage + Ghost Admin API
Adapters normalise incoming messages to a common shape and hand off to the core. The core runs the Claude tool-use loop. Tools call into lib functions. Lib functions talk to Supabase and Ghost.
Adding a new input channel (email, iOS Shortcut, web form) is one new adapter file — the rest of the stack does not change.
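The "common shape" an adapter hands to the core could look like this sketch (the field names are assumptions; the Telegram update structure follows the public Bot API, where a photo arrives as an array of size variants, largest last):

```javascript
// Hypothetical sketch of the Telegram adapter's normalisation step.
// Core code only ever sees this shape, never Telegram specifics,
// which is why a new channel is just one new adapter file.
function normalizeTelegram(update) {
  const msg = update.message;
  return {
    channel: 'telegram',
    userId: String(msg.from.id),
    text: msg.text ?? msg.caption ?? '',
    attachments: [
      // Pick the largest photo variant; keep the file_id as source_ref.
      ...(msg.photo ? [{ kind: 'image', ref: msg.photo.at(-1).file_id }] : []),
      ...(msg.voice ? [{ kind: 'audio', ref: msg.voice.file_id }] : []),
    ],
  };
}
```

An email or web-form adapter would produce the same object from a completely different payload.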
Ghost integration
Ghost is not just a front-end. The bot integrates with Ghost's Admin API for:
- Posts dual-write — every post create/update/delete from the bot hits Ghost so danialdaud.com stays in sync
- Images upload (/ghost/api/admin/images/upload/) — feature images for photo posts, permanent /content/images/... URLs
- Media upload (/ghost/api/admin/media/upload/ with fallback to /files/upload/) — audio files for voice notes, permanent /content/media/... or /content/files/... URLs
- Pages (/ghost/api/admin/pages/) — static pages like this one and /memory
- Webhooks (inbound) — Ghost pings a local endpoint on every post.added, post.edited, post.published, post.unpublished, and post.deleted. The bot re-syncs the post from Ghost into Supabase and re-runs the memory pipeline. This closes the drift loop when a post is edited directly in Ghost admin.
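That webhook re-sync path can be sketched as follows (handler and dependency names are assumptions, and the payload shape is a simplification of Ghost's current/previous webhook envelope):

```javascript
// Hypothetical sketch of the inbound Ghost webhook handler: every
// event funnels into the same "pull from Ghost, write to Supabase,
// re-run memory" path, so Ghost-side edits cannot drift.
async function onGhostWebhook(event, payload, deps) {
  const ghostId = payload?.post?.current?.id ?? payload?.post?.previous?.id;
  if (!ghostId) return;
  if (event === 'post.deleted') {
    await deps.deleteLocalPost(ghostId);           // drop the Supabase row
    return;
  }
  const post = await deps.fetchFromGhost(ghostId); // Ghost is the source here
  const row = await deps.upsertLocalPost(post);    // re-sync into Supabase
  deps.runMemoryAsync(row);                        // fire-and-forget re-extraction
}
```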
What's open-source and what's not
The bot's code lives in a private git repo. The journal content lives in Supabase and Ghost. Neither is public.
The stack itself is made of open ingredients:
- Ghost — MIT
- Supabase — Apache 2.0 (self-hostable)
- pgvector — PostgreSQL license
- Node.js, PM2, nginx, certbot — all open
- OpenAI text-embedding-3-small — proprietary, API-accessed
- Anthropic Claude — proprietary, API-accessed
Total monthly cost for a personal journal with this architecture: roughly the cost of the DigitalOcean droplet (around $6–$12) plus whatever you spend on LLM and embedding calls. For the volume of a personal journal, LLM costs are pennies per month.
Want the human version of this, with less jargon and more about why it matters? → How this journal remembers