
GitHub

Ingest local git repositories or repo archives — your code as searchable, embeddable items.

The github source ingests git repositories — either a folder on disk or a .tar.gz / .tgz archive. It's specialized for code corpora: each tracked file becomes an item, with language tagged and history preserved as commit metadata.

For batch-ingesting many repos at once (e.g. your entire ~/code/ folder), use the dedicated vault add-repos command.

Get the export

There's nothing to download. Use either of the following:

A folder on disk: point at the repo's existing working tree.
A .tar.gz / .tgz archive of a repo.

Both work. The folder path doesn't have to be inside Personify's directory.
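
If you go the archive route, one way to produce the file is with tar. This is only a sketch, and it rests on an assumption this page doesn't state: that the archive should include the repo's .git directory so tracked-file and commit information survives extraction. make_repo_archive is a hypothetical helper name, not a vault command.

```shell
# Hypothetical helper: pack a repo folder into a .tar.gz archive.
# Assumption: the whole folder, including .git, belongs in the archive.
make_repo_archive() {
  repo_dir=$1   # repo folder, e.g. ~/code/some-repo
  out=$2        # archive to write, e.g. ~/Downloads/some-repo.tar.gz
  parent=$(dirname "$repo_dir")
  name=$(basename "$repo_dir")
  # -C switches into the parent so the archive holds one top-level folder
  tar -czf "$out" -C "$parent" "$name"
}
```

Calling make_repo_archive ~/code/some-repo ~/Downloads/some-repo.tar.gz would write an archive at the path used in the archive example under Register the export.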

Register the export

Single repo from a folder:

bash

vault add-export \
  --source github \
  --path ~/code/some-repo \
  --account my-org

Single repo from an archive:

bash

vault add-export \
  --source github \
  --path ~/Downloads/some-repo.tar.gz \
  --account my-org

Bulk-register every repo in a folder:

bash

vault add-repos --path ~/code --account my-org --ingest

--account is your GitHub org or handle. Personal repos and work repos can share a vault — give them different --account values to keep them attributable.

What gets ingested

Item	Notes
One repository item	Repo name, default branch, remote URL (if any), HEAD SHA.
One file item per tracked file	`kind=code`, body is the file contents, language tagged, path in metadata.
Commit history	Latest commit per file recorded as `ts`; full author/message stored on the repository item.
README / docs	First-class — searchable like any other file.
Untracked files	Skipped. Only files git knows about are ingested.
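
Since only tracked files are ingested, you can get a rough preview of the file-item count by asking git what it tracks. A minimal sketch, assuming git is installed and the path is a working tree; count_tracked_files is a hypothetical helper name, and the result may not match the ingest count exactly.

```shell
# Hypothetical helper: count the files git tracks in a working tree.
count_tracked_files() {
  repo_dir=$1   # repo folder, e.g. ~/code/some-repo
  # ls-files prints one tracked path per line; tr strips wc's padding
  git -C "$repo_dir" ls-files | wc -l | tr -d ' '
}
```

Files that exist on disk but were never added to git do not show up in this count, which mirrors the "Untracked files" row above.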

Ingest it

bash

vault ingest --source github

$ vault ingest --source github
Walking ~/code/some-repo (HEAD: 9c3a...) ...
Indexing 1,204 tracked files ...
  repositories: 1
  files:        1,204
Run 16 completed in 6.8s

Caveats

No remote API calls. Personify never talks to github.com. It reads from the local working tree (or extracts the archive). Issues, PRs, and Actions logs are not ingested by this source.
Binary files are skipped for body extraction, but their existence and path are recorded.
Submodules are not recursed automatically — register each submodule path separately if you want them ingested.
Large repos. A monorepo with 100k files is fine, but ingest is single-pass, so expect disk usage and Postgres write load to grow in proportion to the file count. Use vault embed --limit to run the embedding step in batches afterwards.
Run scan-repos first. Before pointing add-repos at a folder you haven't curated, run vault scan-repos --path ~/code to see what would be picked up. It's read-only.
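
Because submodules are not recursed, each one needs its own add-export call. The sketch below reads submodule paths from .gitmodules and prints one command per path, so you can review them before running anything. It assumes git is installed, the submodules are checked out on disk, and their paths contain no whitespace; print_submodule_exports is a hypothetical helper name, and the printed commands use only the flags shown under Register the export.

```shell
# Hypothetical helper: print one add-export command per submodule path.
# Nothing is executed; the output is meant to be reviewed first.
print_submodule_exports() {
  repo=$1      # parent repo folder, e.g. ~/code/some-repo
  account=$2   # value for --account, e.g. my-org
  # .gitmodules entries are listed as: submodule.<name>.path <path>
  git -C "$repo" config --file .gitmodules --get-regexp '\.path$' |
    awk '{ print $2 }' |
    while read -r sub; do
      printf 'vault add-export --source github --path %s --account %s\n' \
        "$repo/$sub" "$account"
    done
}
```

If the repo has no .gitmodules file, git reports an error and the helper prints no commands.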