Architecture

How Personify stores, normalizes, and links your data — items, embeddings, the knowledge graph, and the on-disk vault.

Personify is a local-first personal data vault. The exports you give it, the normalized rows it derives, the embeddings, and the knowledge graph all live on your machine unless you deliberately put another service in front of them.

The shape of the system

MCP / FastAPI / CLI / UI
        |
Knowledge graph
        |
Items + text + embeddings
        |
Raw exports on disk

Raw exports are the source of truth. Everything above them is reproducible: if a parser improves, you can re-ingest from the stored raw export.

Docker and vault boundaries

The default local setup uses one Docker container named personify-db. Inside that container, each vault gets its own Postgres database:

| Vault name | Postgres database | Filesystem root |
| --- | --- | --- |
| personal | personify | ./vault |
| code-corpus | personify_code_corpus | ./vaults/code-corpus |
| work | personify_work | ./vaults/work |

The built-in postgres database is used only briefly, as an admin database, when Personify checks for or creates another vault database. Your vault data lives in the vault's own database plus its matching filesystem root.
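The naming pattern in the table above can be sketched as a small mapping. This is an illustration only: the function names are hypothetical, and the rule (prefix with personify_ and turn hyphens into underscores) is inferred from the three rows shown, not taken from Personify's code.

```python
from pathlib import Path

DEFAULT_VAULT = "personal"  # the default vault, per the table above


def vault_database(name: str) -> str:
    """Return the Postgres database name for a vault (inferred rule)."""
    if name == DEFAULT_VAULT:
        return "personify"
    # Assumption: hyphens become underscores, as in code-corpus -> personify_code_corpus.
    return "personify_" + name.replace("-", "_")


def vault_root(name: str, base: Path = Path(".")) -> Path:
    """Return the filesystem root for a vault, relative to `base`."""
    if name == DEFAULT_VAULT:
        return base / "vault"
    return base / "vaults" / name


for vault in ("personal", "code-corpus", "work"):
    print(vault, vault_database(vault), vault_root(vault).as_posix())
```

Names containing characters other than letters, digits, and hyphens would need extra handling that this sketch does not attempt.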

In the UI, New vault... creates the Postgres database, creates the filesystem folders, initializes the schema, seeds parser sources, and optionally switches the running app to the new vault. No manual createdb or vault --vault NAME init is needed once npm start is running.

On-disk vault layout

The default personal vault uses ./vault/. Named vaults use ./vaults/<name>/.

vault/
  raw/         # immutable original exports
  staging/     # extracted/working copies during parsing
  normalized/  # canonical JSON per item
  manifests/   # per-export metadata
  logs/        # ingestion logs

When you register an export in the UI or with vault add-export, Personify copies the file or folder into the active vault, records its SHA-256 hash, and creates a raw_exports row. It never moves or edits your original file.
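The copy-and-hash step can be sketched as follows, for the single-file case only. The helper names, the stored-file naming scheme, and the shape of the returned record are assumptions made for illustration; they are not Personify's actual implementation, and a real raw_exports row carries more fields.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large exports are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def register_export(original: Path, vault_root: Path) -> dict:
    """Copy one export file into <vault>/raw and return a raw_exports-like record."""
    raw_dir = vault_root / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)
    sha = sha256_of(original)
    # Hypothetical naming scheme: prefix with part of the hash to avoid name clashes.
    stored = raw_dir / f"{sha[:12]}_{original.name}"
    shutil.copy2(original, stored)  # copy only; the original is never moved or edited
    return {
        "original_path": str(original),
        "stored_path": str(stored),
        "size": stored.stat().st_size,
        "sha256": sha,
    }
```

Because the original is copied rather than moved, the caller's file stays where it was and the vault copy under raw/ becomes the immutable source for later re-ingests.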

Core tables

Personify uses Postgres 17 with the pgvector extension. The schema has 15 application tables.

Items

items is the central table. One row is one ingested thing: a chat message, an email, a file, a tweet, a repo file, and so on. Important fields include source_slug, account_handle, kind, native_id, content_hash, ts, title, and structured metadata.

Two direct companion tables hang off items:

  • item_text stores the textual body used for full-text search and later embedding.
  • item_media stores media/attachment paths inside the active vault. HTTP and CLI retrieval go through a resolver that rejects paths outside the vault root.
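A resolver of the kind described for item_media might look like the sketch below. The function and exception names are hypothetical; the technique shown (resolve the candidate path, then check that it is still under the resolved vault root) is a common way to block path traversal and is not necessarily how Personify implements it. It relies on Path.is_relative_to, available in Python 3.9 and later.

```python
from pathlib import Path


class OutsideVaultError(ValueError):
    """Raised when a requested path would escape the vault root."""


def resolve_media_path(vault_root: Path, stored_path: str) -> Path:
    """Resolve a stored media path and reject anything outside the vault root."""
    root = vault_root.resolve()
    # Joining with an absolute path discards `root`, so absolute inputs
    # are caught by the containment check below as well.
    candidate = (root / stored_path).resolve()
    if not candidate.is_relative_to(root):
        raise OutsideVaultError(f"{stored_path!r} is outside the vault root")
    return candidate
```

Resolving before checking matters: a string such as raw/../../secrets.txt starts with an in-vault prefix but points outside the vault once the .. segments are collapsed.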

Embeddings

embeddings stores embedded chunks with item_id, model, chunk_idx, chunk_text, and vector. The default vector dimension is 384 for sentence-transformers/all-MiniLM-L6-v2.

Embeddings are optional. They are populated after ingest, either with vault embed --limit N or from the UI's embedding controls.
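As a rough picture of one embeddings row, the sketch below mirrors the listed columns in a dataclass and checks the vector length against the default dimension. The class itself, the integer type for item_id, and the validation are assumptions for illustration; the real table is a Postgres table with a pgvector column.

```python
from dataclasses import dataclass
from typing import Tuple

DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
DEFAULT_DIM = 384  # output dimension of the default model


@dataclass(frozen=True)
class EmbeddingRow:
    item_id: int          # assumption: the real key type may differ
    model: str
    chunk_idx: int        # position of this chunk within the item's text
    chunk_text: str
    vector: Tuple[float, ...]

    def __post_init__(self) -> None:
        # A fixed-dimension vector column rejects vectors of the wrong length.
        if len(self.vector) != DEFAULT_DIM:
            raise ValueError(
                f"expected {DEFAULT_DIM} dimensions, got {len(self.vector)}"
            )


row = EmbeddingRow(
    item_id=1,
    model=DEFAULT_MODEL,
    chunk_idx=0,
    chunk_text="first chunk of the item text",
    vector=tuple(0.0 for _ in range(DEFAULT_DIM)),
)
```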

Raw exports and runs

  • raw_exports stores source slug, account handle, original path, stored vault path, size, hash, received time, and notes.
  • ingestion_runs records parser execution status and counts: items_seen, items_inserted, items_skipped, errors, start, and finish time.
  • pipeline_stages tracks optional per-export stages such as ingest, embed, and graph extraction.

Tags

tags is a free-form labeling table keyed directly to item_id. Parsers use it for source-specific labels such as channel names, languages, folders, and other metadata worth filtering on.

Knowledge graph

The graph is derived from items:

  • entities are nodes such as Person, Project, Company, Repository, Tool, or Topic.
  • entity_aliases store alternate names and handles.
  • relationships are typed directed edges.
  • entity_evidence and relationship_evidence ground graph facts back to source items and quotes.

The graph is useful because it stays evidence-backed: each extracted entity and relationship can be traced to the items and quotes that support it. Manual graph edits exist, but extractor-created evidence remains tied to items.
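The relationship between nodes, edges, and evidence can be pictured with a few in-memory types. This is a simplified sketch, not the database schema: the class layout, the integer item_id, and the example values (including the relationship type name) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    item_id: int  # the source item that supports the fact (assumed integer key)
    quote: str    # the passage in that item


@dataclass
class Entity:
    type: str                 # e.g. Person, Project, Company, Repository, Tool, Topic
    canonical_name: str
    aliases: List[str] = field(default_factory=list)
    evidence: List[Evidence] = field(default_factory=list)


@dataclass
class Relationship:
    source: Entity            # edges are directed: source -> target
    target: Entity
    relationship_type: str
    evidence: List[Evidence] = field(default_factory=list)


# Hypothetical example data.
alice = Entity("Person", "Alice Example", aliases=["@alice"])
repo = Entity("Repository", "example/repo")
edge = Relationship(
    source=alice,
    target=repo,
    relationship_type="contributes_to",
    evidence=[Evidence(item_id=42, quote="Alice pushed the parser fix to example/repo")],
)
```

Keeping evidence on both entities and relationships is what lets a reader go from any graph fact back to the item it came from.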

Dedup invariants

Re-ingest is designed to be safe:

| Table | Dedup key |
| --- | --- |
| items | (source_slug, account_handle, native_id) when native_id is present, else (source_slug, account_handle, content_hash) |
| raw_exports | (source_slug, account_handle, sha256) |
| entities | (database_id, type, canonical_name) |
| relationships | (source_entity_id, target_entity_id, relationship_type) |

Re-registering the same bytes is rejected. Re-running ingest is idempotent unless you explicitly replace an export.
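The items rule, and why it makes a second ingest a no-op, can be sketched with an in-memory set standing in for the database's uniqueness check. The function names, the row shape, the use of SHA-256 for content_hash, and the extra tag that keeps the two key kinds apart are assumptions for illustration; the counter names follow the ingestion_runs fields.

```python
import hashlib
from typing import Iterable, Optional, Set, Tuple


def content_hash(body: str) -> str:
    # Assumption: SHA-256 over the UTF-8 body; the real hash input may differ.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def item_dedup_key(
    source_slug: str, account_handle: str, native_id: Optional[str], body: str
) -> Tuple[str, str, str, str]:
    """Use native_id when present, otherwise fall back to content_hash."""
    if native_id is not None:
        return (source_slug, account_handle, "native_id", native_id)
    return (source_slug, account_handle, "content_hash", content_hash(body))


def ingest(rows: Iterable[dict], seen: Set[tuple]) -> dict:
    """Insert unseen rows and count the rest as skipped."""
    counts = {"items_seen": 0, "items_inserted": 0, "items_skipped": 0}
    for row in rows:
        counts["items_seen"] += 1
        key = item_dedup_key(
            row["source_slug"], row["account_handle"], row.get("native_id"), row["body"]
        )
        if key in seen:
            counts["items_skipped"] += 1
        else:
            seen.add(key)
            counts["items_inserted"] += 1
    return counts


rows = [
    {"source_slug": "chat", "account_handle": "me", "native_id": "m-1", "body": "hello"},
    {"source_slug": "chat", "account_handle": "me", "native_id": None, "body": "no id here"},
]
seen: Set[tuple] = set()
first = ingest(rows, seen)   # both rows are new
second = ingest(rows, seen)  # same rows again: everything is skipped
```

In this sketch the second run inserts nothing and reports every row as skipped, which is the behavior the dedup keys are meant to guarantee.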

Read paths

Four surfaces read the same service layer:

  1. UI — browse, search, register exports, run pipeline stages, and manage vaults.
  2. CLI — full local control with vault ... commands.
  3. FastAPI HTTP API — JSON endpoints for search, items, exports, graph, vaults, and MCP status.
  4. MCP server — read-only tools and resources for agents.

The stdio MCP server is launched with vault mcp. The UI can also toggle a gated streamable HTTP MCP transport mounted at /mcp on the FastAPI server.
