A reference for the curious: the stack, the schema, and the pipeline that turns a stream of voice notes and journal entries into a searchable memory.

Overview

This journal is built from three boring pieces and one interesting one.

The boring pieces:

  • Ghost 5.x — the content management system. Runs on a DigitalOcean droplet, renders every published post at danialdaud.com, hosts images and audio at /content/. Ghost provides the public surface and the editor.
  • Supabase (Postgres + Storage) — the database and file store for everything the bot writes. Posts, tags, images, audio files, and a memory layer on top of all of it. The source of truth for anything generated by the bot.
  • A Node.js Telegram bot — a small long-running process on the same droplet, managed by PM2. Receives messages, routes them, calls tools, talks to Supabase and Ghost.

The interesting piece:

  • A memory layer — a set of schemas, indexes, and background pipelines that turn every post into (a) an embedding, (b) a set of extracted entities, and (c) a web of relationships between them. Built on pgvector inside the same Supabase Postgres.

The two memory systems

When you talk to the bot, two distinct kinds of memory are running side by side.

Short-term: conversation memory

Lives in the Node process's RAM — a Map<string, Session> keyed by (channel, userId). Each session holds the last 20 messages, trimmed at turn boundaries so tool-use/tool-result pairs are never orphaned. Timeout after 30 minutes of inactivity. Wiped on any restart.

This is what makes the bot feel like a conversation. It has nothing to do with the journal's long-term memory.
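The session bookkeeping above can be sketched in a few lines. Names here are hypothetical (the real module isn't shown); the behaviour is what matters: key by channel and user, cap at 20 messages, never trim a tool-use turn away from its tool-result, expire after 30 minutes.

```javascript
// Sketch of the in-memory session store (hypothetical names).
const MAX_MESSAGES = 20;
const TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes of inactivity

const sessions = new Map(); // `${channel}:${userId}` -> { messages, lastSeen }

function getSession(channel, userId, now = Date.now()) {
  const key = `${channel}:${userId}`;
  let s = sessions.get(key);
  if (!s || now - s.lastSeen > TIMEOUT_MS) {
    s = { messages: [], lastSeen: now }; // new or expired: start fresh
    sessions.set(key, s);
  }
  s.lastSeen = now;
  return s;
}

function pushMessage(session, message) {
  session.messages.push(message);
  // Trim from the front, but never leave an orphaned tool_result:
  // if the oldest remaining message answers a dropped tool call, drop it too.
  while (session.messages.length > MAX_MESSAGES) {
    session.messages.shift();
    if (session.messages[0]?.type === 'tool_result') {
      session.messages.shift();
    }
  }
}
```

A restart clears `sessions` entirely, which is exactly the "wiped on any restart" behaviour: nothing here is persisted.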

Long-term: journal memory

Lives in Supabase Postgres. Persistent, searchable, growing. Described in detail below.

The schema

Seven tables, one jsonb field, one vector column, two RPCs.

posts

id                 uuid
ghost_id           text         Ghost post id, unique
slug               text         URL-safe, unique
title              text
content_md         text         user's words, canonical
content_html       text         rendered HTML, what Ghost serves
excerpt            text         auto-filled from metadata.summary on publish
feature_image_url  text         Ghost-hosted /content/images/...
status             text         draft | published
reading_time       int
published_at       timestamptz
synced_at          timestamptz
embedding          vector(1536) OpenAI text-embedding-3-small
metadata           jsonb        summary, mood, topics, people, places, works, ...

The embedding column has an HNSW index (vector_cosine_ops) for fast approximate nearest-neighbour search. The metadata column is a GIN-indexed jsonb blob generated by Claude Haiku at write time.
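For intuition, here is an illustrative shape for that metadata blob. The keys come from the schema above; the values are invented, since the real blob is whatever Claude Haiku extracts from a given post.

```javascript
// Illustrative posts.metadata value (keys from the schema; values invented).
const metadata = {
  summary: 'A short walk along the river, thinking about the novel.',
  mood: 'reflective',
  topics: ['writing', 'walking'],
  people: ['Ayesha'],
  places: ['Lahore'],
  works: ['the novel'],
};

// Filters over this blob map to Postgres jsonb operators, e.g.:
//   metadata->>'mood' = 'reflective'
//   metadata->'topics' ? 'writing'   -- the GIN index makes this fast
```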

tags, post_tags, media

tags and post_tags are standard many-to-many. media holds every image and audio file linked to a post:

id                      uuid
kind                    text        image | audio | other
source                  text        ghost | telegram | bot-upload
source_ref              text        dedup key
bucket_path             text        Supabase Storage path
signed_url              text        long-lived signed URL
signed_url_expires_at   timestamptz
mime                    text
size_bytes              bigint
duration_seconds        real        audio only
transcript              text        audio only
post_id                 uuid        FK to posts

Every photo and voice note gets a row. Dedup key (source, source_ref) prevents double-uploads from the same Telegram file id.
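The dedup rule is simple enough to sketch. In Postgres it would be a unique index on (source, source_ref) with ON CONFLICT DO NOTHING; the in-memory version below (hypothetical helper, for illustration only) shows the same semantics.

```javascript
// Sketch of (source, source_ref) dedup for media rows.
const seen = new Map(); // `${source}:${source_ref}` -> media row

function upsertMedia(row) {
  const key = `${row.source}:${row.source_ref}`;
  // Same source + same ref (e.g. the same Telegram file id) is a no-op.
  if (seen.has(key)) return { inserted: false, row: seen.get(key) };
  seen.set(key, row);
  return { inserted: true, row };
}
```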

entities, post_entities, entity_edges

The knowledge graph. entities holds every person, place, topic, work, project, org, or event the system has ever found. post_entities links them to posts that mention them. entity_edges stores the relationships (LIVED_AT, INSPIRED_BY, WROTE, VISITED, ...) extracted by Claude Haiku from the post body at write time.

RPCs

Two Postgres functions exposed to the bot:

  • match_posts(query_embedding, match_count, min_similarity) — cosine-similarity nearest neighbours over posts.embedding
  • entity_neighborhood(entity_id) — one-hop walk over entity_edges in both directions
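What match_posts computes is easy to state in plain JS: cosine similarity between the query embedding and each stored embedding, filter by min_similarity, take the top match_count. The brute-force sketch below is for intuition only; in production pgvector's HNSW index does the neighbour search approximately, without scanning every row.

```javascript
// Pure-JS model of the match_posts RPC's semantics.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function matchPosts(queryEmbedding, posts, matchCount, minSimilarity) {
  return posts
    .map(p => ({ ...p, similarity: cosineSimilarity(queryEmbedding, p.embedding) }))
    .filter(p => p.similarity >= minSimilarity)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, matchCount);
}
```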

The pipeline: what happens when you write a post

Telegram message
      │
      ▼
lib/adapters/telegram.js     download file, authorise, normalise
      │
      ▼
lib/core/intake.js            voice → Whisper → transcript → media row
                              photo → feature image → media row
      │
      ▼
lib/posts.js : createPost     dual-write: Supabase + Ghost Admin API
      │
      ├──▶ Supabase   posts, tags, post_tags, media
      │
      ├──▶ Ghost      posts/?source=html
      │               images/upload/
      │               media/upload/
      │
      ▼
lib/posts.js : runMemoryAsync    fire-and-forget
      │
      ├──▶ lib/embed.js            OpenAI text-embedding-3-small
      │         │
      │         ▼
      │    posts.embedding (vector 1536)
      │
      ├──▶ lib/memory.js           Claude Haiku structured extraction
      │         │
      │         ▼
      │    posts.metadata (jsonb: summary, mood, topics, ...)
      │
      └──▶ lib/entities.js         Claude Haiku entity + edge extraction
                │
                ▼
           entities · post_entities · entity_edges

The post is returned to the bot in under a second; the memory pipeline finishes in the background within a few seconds. Each layer becomes searchable as soon as its stage completes.
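The fire-and-forget step can be sketched like this (names hypothetical, matching the diagram above). The key detail is that the caller never awaits it, and one failed stage never takes down the others.

```javascript
// Sketch of the fire-and-forget memory step.
async function runMemoryAsync(post, stages) {
  // Promise.allSettled: a rejected stage is recorded, not thrown,
  // so embedding, metadata, and entity extraction fail independently.
  const results = await Promise.allSettled(stages.map(stage => stage(post)));
  for (const r of results) {
    if (r.status === 'rejected') console.error('memory stage failed:', r.reason);
  }
  return results;
}

// Caller side: note the absence of `await` -- the post has already
// been returned to the bot before any of this runs.
//   runMemoryAsync(post, [embed, extractMetadata, extractEntities]);
```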

Retrieval

The bot exposes four read paths into the memory layer, as Claude tools:

  • search_posts_semantic(query) — embeds the query, calls match_posts RPC, returns top-k with url, audio_url, image_urls, and similarity score
  • search_posts_fts(query) — Postgres textSearch on content_md and title, for exact keyword matching
  • search_by_metadata({ mood, topic, date_from, date_to }) — jsonb path filter on metadata->>mood and metadata->topics
  • get_entity(name) + get_entity_neighborhood(name) — walks the graph

All results are enriched by lib/search.js:enrichPosts(), which batch-fetches linked media and post URLs so the bot can link them cleanly in replies.
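A sketch of that enrichment step, with hypothetical shapes: the point is one media query for the whole result set, grouped by post_id, rather than one query per post.

```javascript
// Sketch of enrichPosts-style batching (row shapes are assumptions).
function enrichPosts(posts, mediaRows) {
  // Group all fetched media rows by post in one pass.
  const byPost = new Map();
  for (const m of mediaRows) {
    if (!byPost.has(m.post_id)) byPost.set(m.post_id, []);
    byPost.get(m.post_id).push(m);
  }
  return posts.map(p => {
    const media = byPost.get(p.id) ?? [];
    return {
      ...p,
      url: `https://danialdaud.com/${p.slug}/`,
      audio_url: media.find(m => m.kind === 'audio')?.signed_url ?? null,
      image_urls: media.filter(m => m.kind === 'image').map(m => m.signed_url),
    };
  });
}
```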

Bot architecture

Telegram ─┐
          ├─▶ lib/adapters/{telegram, http}
HTTP POST─┘               │
                          ▼
              lib/core/{intake, session, chat, tools, tool-handlers,
                         system-prompt}
                          │
                          ▼
              lib/{posts, search, memory, entities, media-mirror,
                    ghost-sync, embed, transcribe}
                          │
                          ▼
              Supabase Postgres + Storage + Ghost Admin API

Adapters normalise incoming messages to a common shape and hand off to the core. The core runs the Claude tool-use loop. Tools call into lib functions. Lib functions talk to Supabase and Ghost.

Adding a new input channel (email, iOS Shortcut, web form) is one new adapter file — the rest of the stack does not change.
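A sketch of what "normalise to a common shape" means in practice. The exact shape is an assumption; the invariant is that core code never sees channel-specific fields like a Telegram file_id wrapper or an HTTP body.

```javascript
// Hypothetical common message shape produced by adapters.
function fromTelegram(update) {
  const msg = update.message;
  return {
    channel: 'telegram',
    userId: String(msg.from.id),
    text: msg.text ?? msg.caption ?? '',
    attachments: msg.voice ? [{ kind: 'audio', ref: msg.voice.file_id }] : [],
  };
}

function fromHttp(body) {
  return {
    channel: 'http',
    userId: body.user_id,
    text: body.text ?? '',
    attachments: body.attachments ?? [],
  };
}
```

A new channel (email, iOS Shortcut, web form) only has to produce this shape; nothing downstream changes.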

Ghost integration

Ghost is not just a front-end. The bot integrates with Ghost's Admin API for:

  • Posts dual-write — every post create/update/delete from the bot hits Ghost so danialdaud.com stays in sync
  • Images upload (/ghost/api/admin/images/upload/) — feature images for photo posts, permanent /content/images/... URLs
  • Media upload (/ghost/api/admin/media/upload/ with fallback to /files/upload/) — audio files for voice notes, permanent /content/media/... or /content/files/... URLs
  • Pages (/ghost/api/admin/pages/) — static pages like this one and /memory
  • Webhooks (inbound) — Ghost pings a local endpoint on every post.added, post.edited, post.published, post.unpublished, post.deleted. The bot re-syncs the post from Ghost into Supabase and re-runs the memory pipeline. This closes the drift loop when a post is edited directly in Ghost admin.
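The webhook handler reduces to a small dispatch: deletes are removed, everything else re-syncs. Event names are the Ghost ones listed above; the handler wiring is a hypothetical sketch.

```javascript
// Sketch of the inbound Ghost webhook dispatch.
const RESYNC_EVENTS = new Set([
  'post.added', 'post.edited', 'post.published', 'post.unpublished',
]);

function handleGhostWebhook(event, postId, { resync, remove }) {
  if (event === 'post.deleted') return remove(postId);
  // All other tracked events resolve to the same recovery action:
  // re-sync the post from Ghost into Supabase, re-run the memory pipeline.
  if (RESYNC_EVENTS.has(event)) return resync(postId);
  return null; // ignore events we don't track
}
```

This is what closes the drift loop: however a post changes in Ghost admin, Supabase and the memory layer converge back to it.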

What's open-source and what's not

The bot's code lives in a private git repo. The journal content lives in Supabase and Ghost. Neither is public.

The stack itself is made of open ingredients:

  • Ghost — MIT
  • Supabase — Apache 2.0 (self-hostable)
  • pgvector — PostgreSQL license
  • Node.js, PM2, nginx, certbot — all open
  • OpenAI text-embedding-3-small — proprietary, API-accessed
  • Anthropic Claude — proprietary, API-accessed

Total monthly cost for a personal journal with this architecture: roughly the cost of the DigitalOcean droplet (around $6–$12) plus whatever you spend on LLM and embedding calls. For the volume of a personal journal, LLM costs are pennies per month.


Want the human version of this, with less jargon and more about why it matters? → How this journal remembers