Files

Catch-all parser for folders or archives of plain documents: Markdown, text, PDF, JSON, CSV.

The files source is the everything-else parser. Point it at a folder or archive of documents and it ingests one item per file, with the body extracted from the most common formats.

It's the right source for personal notes you don't keep in Notion, miscellaneous PDFs, downloaded reference docs, and any one-off corpora that don't fit a more specific source.

Get the export

There's nothing to download. Either:

Pass a folder path. The folder is walked recursively and the files in it are ingested.
Or pass an archive (.zip, .tar.gz, .tgz). It is extracted into staging/ first.
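You may want to see roughly what the parser will work with before you register an archive. You can do this by unpacking the archive yourself into a scratch directory. The sketch below assumes `unzip` and `tar` are installed. `extract_archive` is a hypothetical helper and is not part of `vault`. The source's own extraction into staging/ may behave differently.

```bash
# Hypothetical helper (not part of vault): unpack a .zip, .tar.gz or .tgz
# into a scratch directory so you can inspect its contents by hand.
extract_archive() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"
  case "$archive" in
    *.zip)          unzip -q "$archive" -d "$dest" ;;   # requires unzip
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;   # gzip-compressed tar
    *)
      echo "unsupported archive: $archive" >&2
      return 1
      ;;
  esac
}
```

For example, run `extract_archive ~/Downloads/research-papers.zip /tmp/preview` and then `find /tmp/preview -type f`. Together they list the files inside the archive.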

Register the export

Folder:

```bash
vault add-export \
  --source files \
  --path ~/Documents/notes \
  --account personal
```

Archive:

```bash
vault add-export \
  --source files \
  --path ~/Downloads/research-papers.zip \
  --account research
```

What gets ingested

| File type | How it's parsed |
|---|---|
| `.md`, `.markdown` | Body is the Markdown text. YAML front-matter, if present, is stored in metadata. |
| `.txt` | Body is the raw text. |
| `.pdf` | Body is the text extracted page by page. The page count is recorded in metadata. |
| `.json`, `.jsonc` | Body is the pretty-printed JSON. The structure is not normalized further. |
| `.csv` | Body is the CSV, preserved verbatim. Columns are inferred from the header row. |
| Other extensions | Skipped silently: no body is extracted and no warning is shown. The filename is still recorded as `item_media`. |
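The extension rules in the table can be written as a shell `case` statement. This is handy for checking how a given file will be treated. `parser_for` is a hypothetical helper written for illustration. It matches extensions case-sensitively, which the real source may not do.

```bash
# Hypothetical helper mirroring the table: print which parser a file
# would go to, based only on its extension (case-sensitive match).
parser_for() {
  case "$1" in
    *.md|*.markdown) echo markdown ;;
    *.txt)           echo text ;;
    *.pdf)           echo pdf ;;
    *.json|*.jsonc)  echo json ;;
    *.csv)           echo csv ;;
    *)               echo skip ;;   # other extensions: no body extracted
  esac
}
```

For example, `parser_for report.docx` prints `skip`.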

For each ingested file:

`kind` is set to `document`.
`ts` is the file's mtime, unless the body contains a better timestamp.
The directory hierarchy is preserved as tags: the path notes/journal/2025 becomes the three tags notes, journal and 2025.
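The tagging rule can be sketched as "one tag per directory component". `path_tags` is a hypothetical helper that expects a path relative to the export root. The docs do not say whether the name of the folder passed to `--path` becomes a tag itself. This sketch only tags the components of the path you give it.

```bash
# Hypothetical helper: turn each directory component of a relative path
# into one tag, printed space-separated. Files at the root get no tags.
path_tags() {
  local rel="${1#./}" dir     # drop a leading "./" if present
  dir=$(dirname "$rel")       # notes/journal/2025/entry.md -> notes/journal/2025
  if [ "$dir" = "." ]; then   # file sits directly in the root
    return 0
  fi
  printf '%s\n' "$dir" | tr '/' ' '   # one tag per path component
}
```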

Ingest it

```bash
vault ingest --source files
```

```console
$ vault ingest --source files
Walking ~/Documents/notes ...
  found 412 .md, 18 .pdf, 47 .txt, 6 .csv
  documents: 483
Run 18 completed in 4.2s
```
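To sanity-check the `found` line, you can count files by extension yourself. `count_by_ext` is a hypothetical helper. It mimics the documented walk by skipping hidden files and `.git/` directories and by not following symlinks. It also ignores files that have no extension, so treat its numbers as an approximation.

```bash
# Hypothetical helper: count regular files by extension, skipping .git/
# directories and hidden files. find does not follow symlinks by default,
# and -type f excludes the links themselves.
count_by_ext() {
  find "$1" -name .git -type d -prune -o -type f ! -name '.*' -print \
    | sed -n 's/.*\.\([^./]*\)$/\1/p' \
    | sort | uniq -c | sort -rn
}
```

`count_by_ext ~/Documents/notes` prints one line per extension, most common first, in `uniq -c` format (count, then extension).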

Caveats

PDF extraction is approximate. Scanned PDFs without an OCR layer come through as empty bodies. Run them through OCR first if you want them searchable.
Large files are ingested whole. A multi-MB JSON file becomes a multi-MB item body, so consider chunking very large files yourself before pointing the parser at them.
No deduplication across sources. A document ingested via the notion source is not detected as a duplicate of the same file ingested via files. The two items have different source slugs and therefore different dedup keys. Pick one source per document.
Hidden files and .git/ directories are skipped. Symlinks are not followed.
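The large-file caveat suggests chunking big files yourself. One way to do this for a big CSV is to split it into files of N data rows, repeating the header in each chunk. `split_csv` is a hypothetical helper. It splits on lines, so it will break CSV fields that contain embedded newlines. A large JSON file would need a JSON-aware tool instead.

```bash
# Hypothetical helper: split a CSV into chunks of N data rows each,
# writing <name>-001.csv, <name>-002.csv, ... with the header repeated.
# Line-based: quoted fields containing newlines are not handled.
split_csv() {
  local src="$1" rows="$2" outdir="$3" base
  base=$(basename "$src" .csv)
  mkdir -p "$outdir"
  awk -v rows="$rows" -v dir="$outdir" -v base="$base" '
    NR == 1 { header = $0; next }              # remember the header row
    (NR - 2) % rows == 0 {                     # first row of a new chunk
      if (out) close(out)
      out = sprintf("%s/%s-%03d.csv", dir, base, int((NR - 2) / rows) + 1)
      print header > out
    }
    { print > out }                            # data row into current chunk
  ' "$src"
}
```

Write the chunks to the folder you ingest in place of the original. The source ingests one item per file, so leaving the original next to its chunks would ingest the same rows twice.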