GitHub
Ingest local git repositories or repo archives — your code as searchable, embeddable items.
The github source ingests git repositories — either a folder on disk or a .tar.gz / .tgz archive. It's specialized for code corpora: each tracked file becomes an item, with language tagged and history preserved as commit metadata.
For batch-ingesting many repos at once (e.g. your entire ~/code/ folder), use the dedicated vault add-repos command.
Get the export
There's nothing to download. Either:
- Point at a folder on disk: the existing working tree.
- Or hand it a .tar.gz / .tgz archive of a repo.
Both work. The folder path doesn't have to be inside Personify's directory.
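If you need to produce an archive from a folder you already have, plain tar is enough. The sketch below is one way to do it, not something Personify requires. The pack_repo helper and its paths are hypothetical, and it assumes the archive should hold the repo's top-level folder (including its .git directory) so that history stays available.

```shell
# Sketch: pack a repository folder into a gzip-compressed tarball.
# Assumption (not confirmed by the docs): the archive should contain the
# repo's top-level folder, .git included.
pack_repo() {
  # $1 = repo folder, $2 = output archive (.tar.gz or .tgz)
  # -C switches to the parent directory first, so the archive contains a
  # single top-level folder named after the repo instead of a full path.
  tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Example (hypothetical paths):
# pack_repo ~/code/some-repo ~/Downloads/some-repo.tar.gz
```

Listing the result with tar -tzf is a quick way to confirm what went into the archive before registering it.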
Register the export
Single repo from a folder:
vault add-export \
--source github \
--path ~/code/some-repo \
--account my-org
Single repo from an archive:
vault add-export \
--source github \
--path ~/Downloads/some-repo.tar.gz \
--account my-org
Bulk-register every repo in a folder:
vault add-repos --path ~/code --account my-org --ingest
--account is your GitHub org or handle. Personal repos and work repos can share a vault; give them different --account values to keep them attributable.
What gets ingested
| Item | Notes |
|---|---|
| One repository item | Repo name, default branch, remote URL (if any), HEAD SHA. |
| One file item per tracked file | kind=code, body is the file contents, language tagged, path in metadata. |
| Commit history | Latest commit per file recorded as ts; full author/message stored on the repository item. |
| README / docs | First-class — searchable like any other file. |
| Untracked files | Skipped. Only files git knows about are ingested. |
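To get a rough idea of how many file items a folder will produce, you can ask git which files it tracks. This is a sketch under one assumption: that the source's notion of "tracked" matches the output of git ls-files. The count_tracked helper is hypothetical and not part of the vault CLI.

```shell
# Sketch: count the files git tracks in a working tree.
# Assumption: "tracked file" in the table above corresponds to what
# `git ls-files` reports; untracked and ignored files are not listed.
count_tracked() {
  # $1 = repo folder
  # tr strips the padding some wc implementations add around the number.
  git -C "$1" ls-files | wc -l | tr -d ' '
}

# Example (hypothetical path):
# count_tracked ~/code/some-repo
```

Running git -C ~/code/some-repo ls-files without the count shows the paths themselves, which helps when a file you expected is missing from the ingest.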
Ingest it
vault ingest --source github
$ vault ingest --source github
Walking ~/code/some-repo (HEAD: 9c3a...) ...
Indexing 1,204 tracked files ...
repositories: 1
files: 1,204
Run 16 completed in 6.8s
Caveats
- No remote API calls. Personify never talks to github.com. It reads from the local working tree (or extracts the archive). Issues, PRs, and Actions logs are not ingested by this source.
- Binary files are skipped for body extraction but their existence and path are recorded.
- Submodules are not recursed automatically — register each submodule path separately if you want them ingested.
- Large repos. A monorepo with 100k files is fine, but ingest is single-pass; expect proportional disk and Postgres write load. Use vault embed --limit to chunk the embedding step afterwards.
- scan-repos first. When pointing add-repos at a folder you haven't curated, run vault scan-repos --path ~/code first to see what would be picked up; it's read-only.
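For the submodule caveat above, the paths to register can be read from the repo's .gitmodules file. The sketch below only prints one vault add-export command per submodule so you can review them before running anything. The print_submodule_exports helper is hypothetical, the flags are the ones shown earlier on this page, and it assumes the submodules are already checked out and that their paths contain no spaces.

```shell
# Sketch: print an add-export command for each submodule of a repo.
# Assumptions: submodules are initialized on disk, and their paths
# contain no whitespace (awk splits on it).
print_submodule_exports() {
  # $1 = repo folder, $2 = value for --account
  # git config can read any config-style file via --file; each matching
  # line has the form "submodule.<name>.path <path>".
  git config --file "$1/.gitmodules" --get-regexp '^submodule\..*\.path$' |
    awk '{print $2}' |
    while IFS= read -r sub; do
      echo "vault add-export --source github --path $1/$sub --account $2"
    done
}

# Example (hypothetical path and account):
# print_submodule_exports ~/code/some-repo my-org
```

Printing instead of executing keeps the step read-only, in the same spirit as running scan-repos before add-repos.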