Gmail
Ingest your Gmail mailbox from a Google Takeout MBOX export.
The gmail source ingests an MBOX file from Google Takeout. Every email becomes one item with full headers, a body (plain text, or HTML rendered to text), and references to its attachments.
Get the export
- Go to takeout.google.com.
- Click Deselect all, then check Mail.
- Choose Include all of your mail (or pick specific labels).
- Pick Export once, format `.zip`, and a reasonable file size cap (Google splits large mailboxes).
- Click Create export. Google emails you when it's ready (can take hours for large mailboxes).
- Unzip the archive. You get one or more `.mbox` files (Mail-001.mbox, Mail-002.mbox, ...).
Register the export
Register each MBOX file separately (one add-export per file):
```bash
vault add-export \
  --source gmail \
  --path ~/Downloads/Takeout/Mail/Mail-001.mbox \
  --account myname@gmail.com
```

`--account` is your Gmail address. If you have multiple Google accounts, ingest each one with its own `--account` value to keep them attributable.
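
When Google splits a mailbox into several files, the per-file registration can be scripted. The following is a minimal sketch, not part of vault itself: `register_all` is a hypothetical helper name, the `VAULT` variable is an assumption introduced here so the commands can be previewed, and the only vault flags used are the ones shown in this section.

```bash
# Register every .mbox file in a directory, one add-export call per file.
# register_all and the VAULT override are hypothetical, for illustration only.
register_all() {
  local dir="$1" account="$2" f
  for f in "$dir"/*.mbox; do
    # Skip the unexpanded pattern when the directory has no .mbox files.
    [ -e "$f" ] || continue
    # VAULT defaults to the real CLI; set VAULT=echo to print instead of run.
    ${VAULT:-vault} add-export --source gmail --path "$f" --account "$account"
  done
}
```

For example, `register_all ~/Downloads/Takeout/Mail myname@gmail.com` would register each file in turn. Running it first with `VAULT=echo` prints the commands without executing them.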
What gets ingested
| Item | Notes |
|---|---|
| One email item per message | kind=email, body text (HTML stripped to plaintext), all standard headers in metadata. |
| Threading | In-Reply-To and References headers preserved — used to reconstruct conversation membership. |
| Labels | Gmail labels become tags. |
| Attachments | Filename and MIME type recorded as item_media; the bytes stay in the MBOX in staging/. |
| Sent / Drafts / Spam | All folders are ingested — filter by tag at query time if you only want Inbox. |
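
To see which labels an export contains before ingesting it, the headers can be inspected directly. Takeout MBOX files carry each message's labels in an `X-Gmail-Labels` header as a comma-separated list. The sketch below counts them with standard tools; `label_counts` is a hypothetical helper, and it is an assumption that the resulting names correspond to the tags vault creates.

```bash
# Print each Gmail label in an mbox file with the number of messages carrying it.
# label_counts is a hypothetical helper; results are approximate (see note below).
label_counts() {
  LC_ALL=C grep '^X-Gmail-Labels:' "$1" \
    | sed 's/^X-Gmail-Labels:[[:space:]]*//' \
    | tr -d '\r' \
    | tr ',' '\n' \
    | sort | uniq -c | sort -rn
}
```

Usage: `label_counts ~/Downloads/Takeout/Mail/Mail-001.mbox`. Labels with non-ASCII names appear MIME-encoded, and long label lists that wrap onto a continuation line are only partly counted, so treat the output as a rough overview.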
Ingest it
```bash
vault ingest --source gmail
```

Example output:

```
$ vault ingest --source gmail
Reading Mail-001.mbox (4.2 GB) ...
Parsing 82,140 messages ...
emails: 82,140
threaded: 41,002
Run 13 completed in 7m12s
```
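
As a sanity check, the message count reported by the ingest run can be compared with a count taken straight from the file. In the MBOX format every message begins with a separator line that starts with `From ` (note the trailing space), so counting those lines gives an estimate. This is a sketch with a hypothetical helper name, `count_messages`; it is not a vault command.

```bash
# Estimate the number of messages in an mbox file by counting "From " separators.
# count_messages is a hypothetical helper, not part of vault.
count_messages() {
  # LC_ALL=C makes grep treat the file as raw bytes, which avoids
  # locale decoding problems and is faster on multi-gigabyte files.
  LC_ALL=C grep -c '^From ' "$1"
}
```

Usage: `count_messages ~/Downloads/Takeout/Mail/Mail-001.mbox`. The number should be close to the parsed message count; a body line that begins with an unescaped `From ` would inflate it.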
Caveats
- Big mailboxes are slow. Parsing is CPU-bound on attachment extraction and HTML-to-text. A 10 GB mailbox can take 10+ minutes.
- Charset edge cases. Old emails in obscure encodings sometimes round-trip with replacement characters. The original bytes are still in the MBOX in `staging/`.
- No incremental sync. Gmail Takeout is a snapshot. Re-export and re-ingest periodically; dedup is by `Message-ID` header, so you can re-add overlapping exports safely.
- Calendar invites and Google Drive notifications show up as ordinary emails. Filter by sender or subject if you want to exclude them.
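
To get a feel for how much two exports overlap before re-adding one, the `Message-ID` headers can be compared directly. The sketch below is only an approximation of what a real parser does, and both helper names (`message_ids`, `overlap_count`) are hypothetical: it matches any line that starts with `Message-ID:`, so IDs quoted in a message body are included and IDs folded onto a continuation line are missed.

```bash
# List the Message-ID values found in an mbox file, sorted and de-duplicated.
# Both helpers are hypothetical and not part of vault.
message_ids() {
  LC_ALL=C grep -i '^Message-ID:' "$1" \
    | sed 's/^[^:]*:[[:space:]]*//' \
    | tr -d '\r' \
    | LC_ALL=C sort -u
}

# Count the Message-IDs that appear in both files.
overlap_count() {
  # Each list is already unique, so a line repeated in the combined
  # stream must come from both files; uniq -d keeps exactly those.
  { message_ids "$1"; message_ids "$2"; } | LC_ALL=C sort | uniq -d | wc -l | tr -d ' '
}
```

Usage: `overlap_count old/Mail-001.mbox new/Mail-001.mbox` prints the number of messages the two files share; the paths here are placeholders.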