GitHub

Ingest local git repositories or repo archives — your code as searchable, embeddable items.

The github source ingests git repositories — either a folder on disk or a .tar.gz / .tgz archive. It's specialized for code corpora: each tracked file becomes an item, with language tagged and history preserved as commit metadata.

For batch-ingesting many repos at once (e.g. your entire ~/code/ folder), use the dedicated vault add-repos command.

Get the export

There's nothing to download. Either:

  • Point at a folder on disk: the existing working tree.
  • Or hand it a .tar.gz / .tgz archive of a repo.

Both work. The folder path doesn't have to be inside Personify's directory.
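
If you'd rather hand over an archive, a plain tar call is enough. This page doesn't say how the archive should be laid out, so the sketch below makes an assumption: it packs the whole repo folder, .git included, so that commit metadata travels with the files. archive_repo is a hypothetical helper name, not a vault command.

```bash
# Pack a repo folder (including its .git directory) into a .tar.gz.
# Assumption: the github source expects the repo folder at the archive root.
# Usage: archive_repo ~/code/some-repo ~/Downloads/some-repo.tar.gz
archive_repo() {
  local repo="$1" out="$2"
  local parent name
  parent=$(cd "$repo/.." && pwd)   # absolute path of the containing folder
  name=$(basename "$repo")         # folder name, e.g. some-repo
  # -C switches into the parent so archive paths start with the folder name
  tar -czf "$out" -C "$parent" "$name"
}
```

You can check the result with tar -tzf before registering it.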

Register the export

Single repo from a folder:

```bash
vault add-export \
  --source github \
  --path ~/code/some-repo \
  --account my-org
```

Single repo from an archive:

```bash
vault add-export \
  --source github \
  --path ~/Downloads/some-repo.tar.gz \
  --account my-org
```

Bulk-register every repo in a folder:

```bash
vault add-repos --path ~/code --account my-org --ingest
```

--account is your GitHub org or handle. Personal repos and work repos can share a vault — give them different --account values to keep them attributable.

What gets ingested

| Item | Notes |
| --- | --- |
| One repository item | Repo name, default branch, remote URL (if any), HEAD SHA. |
| One file item per tracked file | kind=code, body is the file contents, language tagged, path in metadata. |
| Commit history | Latest commit per file recorded as ts; full author/message stored on the repository item. |
| README / docs | First-class: searchable like any other file. |
| Untracked files | Skipped. Only files git knows about are ingested. |
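
To see ahead of time which files count as tracked, you can ask git directly. This is a sketch, assuming the source's notion of "tracked" matches what git ls-files reports; preview_repo is a hypothetical helper, not part of vault.

```bash
# Summarize what git tracks in a working tree.
# Usage: preview_repo ~/code/some-repo
preview_repo() {
  local repo="$1"
  local head tracked untracked
  # --verify -q prints nothing (instead of an error) when there are no commits yet
  head=$(git -C "$repo" rev-parse --verify -q HEAD || echo "none")
  # Files in the index: the set this page describes as ingested
  tracked=$(git -C "$repo" ls-files | wc -l | tr -d ' ')
  # Untracked files that are not covered by ignore rules: these would be skipped
  untracked=$(git -C "$repo" ls-files --others --exclude-standard | wc -l | tr -d ' ')
  echo "HEAD: $head"
  echo "tracked: $tracked"
  echo "untracked (not ignored): $untracked"
}
```

If a file you expected shows up in the untracked count, git add it (and commit) before ingesting.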

Ingest it

```bash
vault ingest --source github
```

```
$ vault ingest --source github
Walking ~/code/some-repo (HEAD: 9c3a...) ...
Indexing 1,204 tracked files ...
repositories: 1
files: 1,204
Run 16 completed in 6.8s
```

Caveats

  • No remote API calls. Personify never talks to github.com. It reads from the local working tree (or extracts the archive). Issues, PRs, and Actions logs are not ingested by this source.
  • Binary files are skipped for body extraction but their existence and path are recorded.
  • Submodules are not recursed automatically — register each submodule path separately if you want them ingested.
  • Large repos. A monorepo with 100k files is fine, but ingest runs as a single pass, so expect disk usage and Postgres write load to grow in proportion to repo size. Use vault embed --limit to run the embedding step in chunks afterwards.
  • scan-repos first. When pointing add-repos at a folder you haven't curated, run vault scan-repos --path ~/code first to see what would be picked up — it's read-only.
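
For the submodule caveat above, one way to avoid typing each path by hand is to read the paths out of .gitmodules and print a registration command for each. This is a dry-run sketch: print_submodule_exports is a hypothetical helper, it only prints the vault add-export form used earlier on this page, and it assumes each submodule is already checked out under the parent repo.

```bash
# Print one `vault add-export` command per submodule path in .gitmodules.
# Nothing is registered; review the output and run the lines you want.
# Usage: print_submodule_exports ~/code/some-repo my-org
print_submodule_exports() {
  local repo="$1" account="$2"
  local gitmodules="$repo/.gitmodules"
  # No .gitmodules file means no submodules: print nothing
  [ -f "$gitmodules" ] || return 0
  # Keep only the value of each "path = ..." line
  sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' "$gitmodules" |
    while IFS= read -r sub; do
      printf 'vault add-export --source github --path "%s" --account %s\n' \
        "$repo/$sub" "$account"
    done
}
```

Printing rather than executing keeps the step reviewable, in the same spirit as running scan-repos before add-repos.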