Gmail

Ingest your Gmail mailbox from a Google Takeout MBOX export.

The gmail source ingests an MBOX file from Google Takeout. Each email becomes one item with its full headers, its body (plain text, or HTML rendered to text), and references to its attachments.

Get the export

  1. Go to takeout.google.com.
  2. Click Deselect all, then check Mail.
  3. Choose Include all of your mail (or pick specific labels).
  4. Pick Export once, format .zip, and a reasonable file size cap (Google splits exports larger than the cap into multiple archives).
  5. Click Create export. Google emails you when it's ready (can take hours for large mailboxes).
  6. Unzip — you get one or more .mbox files (Mail-001.mbox, Mail-002.mbox, ...).
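
Before registering anything, it can help to confirm that the unzipped files look sane. The sketch below counts lines that begin with "From ", which is how the mbox format separates messages. count_mbox_messages is a hypothetical helper name, not part of vault, and the count is only approximate, since an unescaped body line starting with "From " would be counted as well.

```shell
# Hypothetical helper (not part of vault): print an approximate message
# count for every .mbox file in a directory, one "name<TAB>count" per line.
count_mbox_messages() {
  dir="$1"
  for f in "$dir"/*.mbox; do
    # Skip the unexpanded glob when the directory has no .mbox files.
    [ -e "$f" ] || continue
    # Each message in an mbox file starts with a "From " separator line.
    # grep -c exits non-zero on zero matches, so tolerate that case.
    n=$(grep -c '^From ' "$f" || true)
    printf '%s\t%s\n' "$(basename "$f")" "$n"
  done
}

# Usage (path taken from the Takeout layout described above):
#   count_mbox_messages ~/Downloads/Takeout/Mail
```

If a file reports zero messages, it is probably worth re-downloading that archive before going further.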

Register the export

Register each MBOX file separately (one add-export per file):

```bash
vault add-export \
  --source gmail \
  --path ~/Downloads/Takeout/Mail/Mail-001.mbox \
  --account myname@gmail.com
```

--account is your Gmail address. If you have multiple Google accounts, ingest each one with its own --account value to keep them attributable.
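
When Takeout produces several files, the per-file registration can be wrapped in a loop. This is a minimal sketch, assuming all the .mbox files for one account sit in a single directory. register_gmail_exports is a hypothetical helper name, and it uses only the add-export flags shown above.

```shell
# Hypothetical helper: run one `vault add-export` per .mbox file in a directory.
#   $1 = directory containing the unzipped .mbox files
#   $2 = Gmail address to pass as --account
register_gmail_exports() {
  dir="$1"
  account="$2"
  for f in "$dir"/*.mbox; do
    # Skip the unexpanded glob when the directory has no .mbox files.
    [ -e "$f" ] || continue
    vault add-export --source gmail --path "$f" --account "$account"
  done
}

# Usage:
#   register_gmail_exports ~/Downloads/Takeout/Mail myname@gmail.com
```

For a second Google account, call the helper again with that account's directory and address so the exports stay attributable.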

What gets ingested

| Item | Notes |
| --- | --- |
| One email item per message | kind=email, body text (HTML stripped to plaintext), all standard headers in metadata. |
| Threading | In-Reply-To and References headers are preserved and used to reconstruct conversation membership. |
| Labels | Gmail labels become tags. |
| Attachments | Filename and MIME type recorded as item_media; the bytes stay in the MBOX in staging/. |
| Sent / Drafts / Spam | All folders are ingested; filter by tag at query time if you only want Inbox. |

Ingest it

```bash
vault ingest --source gmail
```

Example run:

```
$ vault ingest --source gmail
Reading Mail-001.mbox (4.2 GB) ...
Parsing 82,140 messages ...
emails: 82,140
threaded: 41,002
Run 13 completed in 7m12s
```

Caveats

  • Big mailboxes are slow. Parsing is CPU-bound, with most of the time spent on attachment extraction and HTML-to-text conversion. A 10 GB mailbox can take 10+ minutes.
  • Charset edge cases. Old emails in obscure encodings sometimes round-trip with replacement characters. The original bytes are still in the MBOX in staging/.
  • No incremental sync. A Google Takeout export is a snapshot. Re-export and re-ingest periodically; deduplication is by the Message-ID header, so you can re-add overlapping exports safely.
  • Calendar invites and Google Drive notifications show up as ordinary emails — filter by sender or subject if you want to exclude them.
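
Because re-ingestion relies on Message-ID, you can estimate how much a new export overlaps an older one before re-adding it. The following is a rough sketch with standard Unix tools and is not how vault itself deduplicates. message_ids and count_overlap are hypothetical helper names, and a plain grep will miss folded headers and may pick up Message-ID lines quoted inside message bodies.

```shell
# Hypothetical helper: print the unique Message-ID values found in an
# MBOX file, one per line, sorted.
message_ids() {
  # Header names are case-insensitive; tolerate files with no matches.
  { grep -i '^Message-ID:' "$1" || true; } \
    | sed 's/^[^:]*:[[:space:]]*//' \
    | tr -d '\r' \
    | sort -u
}

# Hypothetical helper: count the Message-IDs that appear in both files.
count_overlap() {
  a=$(mktemp)
  b=$(mktemp)
  message_ids "$1" > "$a"
  message_ids "$2" > "$b"
  # comm -12 keeps only the lines common to both sorted inputs.
  comm -12 "$a" "$b" | wc -l | tr -d ' '
  rm -f "$a" "$b"
}

# Usage (file names are placeholders for an older and a newer export):
#   count_overlap old/Mail-001.mbox new/Mail-001.mbox
```

A high overlap count is expected for a re-export of the same mailbox; only the messages outside the overlap should produce new items.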