Architecture
How Personify stores, normalizes, and links your data — items, embeddings, the knowledge graph, and the on-disk vault.
Personify is a local-first personal data vault. The exports you give it, the normalized rows it derives, the embeddings, and the knowledge graph all live on your machine unless you deliberately put another service in front of them.
The shape of the system
MCP / FastAPI / CLI / UI
|
Knowledge graph
|
Items + text + embeddings
|
Raw exports on disk
Raw exports are the source of truth. Everything above them is reproducible: if a parser improves, you can re-ingest from the stored raw export.
Docker and vault boundaries
The default local setup uses one Docker container named personify-db. Inside
that container, each vault gets its own Postgres database:
| Vault name | Postgres database | Filesystem root |
|---|---|---|
| personal | personify | ./vault |
| code-corpus | personify_code_corpus | ./vaults/code-corpus |
| work | personify_work | ./vaults/work |
The postgres database is used briefly as an admin database when Personify
checks for or creates another vault database. Your vault data lives in the vault
database plus its matching filesystem root.
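The naming pattern in the table can be sketched as two small helpers. This is a minimal illustration inferred from the three rows shown, not Personify's actual code: the function names `vault_database` and `vault_root` are hypothetical, and the hyphen-to-underscore rule is an assumption based on the code-corpus row.

```python
from pathlib import Path

# Assumption: "personal" is the name of the default vault, as in the table.
DEFAULT_VAULT = "personal"


def vault_database(name: str) -> str:
    """Return the Postgres database name for a vault (rule inferred from the table)."""
    if name == DEFAULT_VAULT:
        return "personify"
    # The code-corpus row suggests hyphens become underscores in the database name.
    return "personify_" + name.replace("-", "_")


def vault_root(name: str) -> Path:
    """Return the filesystem root for a vault, relative to the project directory."""
    if name == DEFAULT_VAULT:
        return Path("vault")
    return Path("vaults") / name


if __name__ == "__main__":
    for vault in ("personal", "code-corpus", "work"):
        print(vault, vault_database(vault), vault_root(vault))
```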
In the UI, New vault... creates the Postgres database, creates the
filesystem folders, initializes the schema, seeds parser sources, and optionally
switches the running app to the new vault. No manual createdb or
vault --vault NAME init is needed once npm start is running.
On-disk vault layout
The default personal vault uses ./vault/. Named vaults use
./vaults/<name>/.
vault/
  raw/         # immutable original exports
  staging/     # extracted/working copies during parsing
  normalized/  # canonical JSON per item
  manifests/   # per-export metadata
  logs/        # ingestion logs
When you register an export in the UI or with vault add-export, Personify
copies the file or folder into the active vault, records its SHA-256 hash, and
creates a raw_exports row. It never moves or edits your original file.
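The copy-and-hash step described above might look roughly like the following for a single file. It is a sketch under stated assumptions: `register_export`, the returned dictionary shape, and the hash-prefixed file name under raw/ are all hypothetical, and folder exports are not handled here.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large exports are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def register_export(original: Path, vault_root: Path) -> dict:
    """Copy one export file into raw/ and return a raw_exports-like record."""
    raw_dir = vault_root / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)
    sha256 = sha256_of_file(original)
    # Hypothetical naming: prefix with part of the hash so same-named files do not collide.
    stored = raw_dir / f"{sha256[:12]}_{original.name}"
    # Copy rather than move, so the original file is left untouched.
    shutil.copy2(original, stored)
    return {
        "original_path": str(original),
        "stored_path": str(stored),
        "size": stored.stat().st_size,
        "sha256": sha256,
    }
```

Hashing the source before copying means the recorded digest describes the bytes the user supplied, which is what a duplicate check would need to compare against.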
Core tables
Personify uses Postgres 17 with the pgvector extension. The schema has 15
application tables.
Items
items is the central table. One row is one ingested thing: a chat message, an
email, a file, a tweet, a repo file, and so on. Important fields include:
source_slug, account_handle, kind, native_id, content_hash, ts,
title, and structured metadata.
Two direct companion tables hang off items:
- item_text stores the textual body used for full-text search and later embedding.
- item_media stores media/attachment paths inside the active vault. HTTP and CLI retrieval go through a resolver that rejects paths outside the vault root.
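A resolver of the kind mentioned for item_media can be sketched with pathlib. This is an illustration of the general containment check, not Personify's resolver: `resolve_in_vault` and `OutsideVaultError` are hypothetical names, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


class OutsideVaultError(ValueError):
    """Raised when a requested path would escape the vault root."""


def resolve_in_vault(vault_root: Path, relative_path: str) -> Path:
    """Resolve a stored media path and reject anything outside the vault root."""
    root = vault_root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    # An absolute relative_path replaces root entirely and is rejected below.
    candidate = (root / relative_path).resolve()
    if not candidate.is_relative_to(root):
        raise OutsideVaultError(f"{relative_path!r} is outside the vault root")
    return candidate
```

Checking the resolved path, rather than scanning the input string for "..", also covers symlinks that point outside the vault.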
Embeddings
embeddings stores embedded chunks with item_id, model, chunk_idx,
chunk_text, and vector. The default vector dimension is 384 for
sentence-transformers/all-MiniLM-L6-v2.
Embeddings are optional and populated after ingest with vault embed --limit N
or the UI's embedding controls.
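The row shape described above can be illustrated with a small dataclass and a builder. This is only a sketch: the word-window chunking, the `build_rows` helper, the integer `item_id`, and the injected `embed` callable are assumptions for illustration, and the real chunking strategy is not described here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

EMBEDDING_DIM = 384  # default dimension stated above
MODEL = "sentence-transformers/all-MiniLM-L6-v2"


@dataclass(frozen=True)
class EmbeddingRow:
    item_id: int  # assumption: integer key; the real column type is not documented here
    model: str
    chunk_idx: int
    chunk_text: str
    vector: Sequence[float]


def chunk_words(text: str, max_words: int = 200) -> List[str]:
    """Hypothetical chunker: split on whitespace into windows of max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_rows(
    item_id: int,
    text: str,
    embed: Callable[[str], Sequence[float]],
    max_words: int = 200,
) -> List[EmbeddingRow]:
    """Build one row per chunk, checking that each vector has the expected dimension."""
    rows = []
    for idx, chunk in enumerate(chunk_words(text, max_words)):
        vector = embed(chunk)
        if len(vector) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
        rows.append(EmbeddingRow(item_id, MODEL, idx, chunk, vector))
    return rows
```

Passing `embed` in as a callable keeps the sketch independent of any particular model library.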
Raw exports and runs
- raw_exports stores source slug, account handle, original path, stored vault path, size, hash, received time, and notes.
- ingestion_runs records parser execution status and counts: items_seen, items_inserted, items_skipped, errors, start, and finish time.
- pipeline_stages tracks optional per-export stages such as ingest, embed, and graph extraction.
Tags
tags is a free-form labeling table keyed directly to item_id. Parsers use it
for source-specific labels such as channel names, languages, folders, and other
metadata worth filtering on.
Knowledge graph
The graph is derived from items:
- entities are nodes such as Person, Project, Company, Repository, Tool, or Topic.
- entity_aliases store alternate names and handles.
- relationships are typed directed edges.
- entity_evidence and relationship_evidence ground graph facts back to source items and quotes.
The graph is useful because it stays evidence-backed. Manual graph edits exist, but extractor-created evidence remains tied to items.
Dedup invariants
Re-ingest is designed to be safe:
| Table | Dedup key |
|---|---|
| items | (source_slug, account_handle, native_id) when native_id is present, else (source_slug, account_handle, content_hash) |
| raw_exports | (source_slug, account_handle, sha256) |
| entities | (database_id, type, canonical_name) |
| relationships | (source_entity_id, target_entity_id, relationship_type) |
Re-registering the same bytes is rejected. Re-running ingest is idempotent unless you explicitly replace an export.
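The items dedup rule, and why it makes a second ingest a no-op, can be sketched in memory. This is an illustration rather than the real implementation, which presumably relies on database constraints: `item_dedup_key`, `ingest`, the dictionary-backed store, and the extra tag inside the key are all hypothetical. The counter names mirror the ingestion_runs fields.

```python
from typing import Dict, Iterable, Optional, Tuple

DedupKey = Tuple[str, str, str, str]


def item_dedup_key(
    source_slug: str,
    account_handle: str,
    native_id: Optional[str],
    content_hash: str,
) -> DedupKey:
    """Prefer native_id when present; otherwise fall back to content_hash."""
    if native_id is not None:
        return (source_slug, account_handle, "native_id", native_id)
    # The tag in position 3 is an addition of this sketch so that a native id
    # can never collide with a content hash of the same value.
    return (source_slug, account_handle, "content_hash", content_hash)


def ingest(items: Iterable[dict], store: Dict[DedupKey, dict]) -> Dict[str, int]:
    """Insert unseen items into store and return ingestion_runs-style counts."""
    counts = {"items_seen": 0, "items_inserted": 0, "items_skipped": 0}
    for item in items:
        counts["items_seen"] += 1
        key = item_dedup_key(
            item["source_slug"],
            item["account_handle"],
            item.get("native_id"),
            item["content_hash"],
        )
        if key in store:
            counts["items_skipped"] += 1
        else:
            store[key] = item
            counts["items_inserted"] += 1
    return counts
```

Running `ingest` twice over the same items inserts everything on the first pass and skips everything on the second, which is the idempotence property described above.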
Read paths
Four surfaces read the same service layer:
- UI: browse, search, register exports, run pipeline stages, and manage vaults.
- CLI: full local control with vault ... commands.
- FastAPI HTTP API: JSON endpoints for search, items, exports, graph, vaults, and MCP status.
- MCP server: read-only tools and resources for agents.
The stdio MCP server is launched with vault mcp. The UI can also toggle a
gated streamable HTTP MCP transport mounted at /mcp on the FastAPI server.
Where to go next
- Quickstart — run setup and ingest your first export.
- CLI reference — every command and option.
- API reference — the HTTP surface.
- MCP server — stdio and UI-gated HTTP MCP usage.