Files
Catch-all parser for folders or archives of plain documents — Markdown, text, PDF, JSON, CSV.
The `files` source is the everything-else parser. Point it at a folder or archive of documents and it ingests one item per file, extracting the body from the formats listed in the table below.
It's the right source for personal notes you don't keep in Notion, miscellaneous PDFs, downloaded reference docs, and any one-off corpora that don't fit a more specific source.
Get the export
There's nothing to download. Either:
- Pass the path of a folder containing the files you want ingested; subfolders are walked recursively.
- Or pass an archive (`.zip`, `.tar.gz`, `.tgz`); it is extracted into `staging/` first.
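If you want to see what an archive contains before registering it, the standard archive tools can list its members without extracting anything. A minimal sketch: `list_archive` is a hypothetical helper, not part of `vault`, and the `.zip` branch assumes Info-ZIP's `unzip` is installed.

```bash
# Hypothetical helper (not part of vault): list_archive ARCHIVE
# Prints the member paths of a .zip, .tar.gz, or .tgz archive without extracting it,
# so you can check what would be staged before running `vault add-export`.
list_archive() {
  case "$1" in
    *.zip)          unzip -Z1 "$1" ;;   # Info-ZIP zipinfo mode: one filename per line
    *.tar.gz|*.tgz) tar -tzf "$1" ;;    # -t lists, -z gunzips, -f names the archive
    *)
      echo "list_archive: unsupported archive type: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `list_archive ~/Downloads/research-papers.zip | head` shows the first few paths that would land in `staging/`.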
Register the export
Folder:

```bash
vault add-export \
  --source files \
  --path ~/Documents/notes \
  --account personal
```

Archive:

```bash
vault add-export \
  --source files \
  --path ~/Downloads/research-papers.zip \
  --account research
```

What gets ingested
| File type | How it's parsed |
|---|---|
| `.md`, `.markdown` | Body is the Markdown text. Front-matter (if YAML) lands in metadata. |
| `.txt` | Body is the raw text. |
| `.pdf` | Body is text extracted page-by-page. Page count recorded in metadata. |
| `.json`, `.jsonc` | Body is the pretty-printed JSON; structure not normalized further. |
| `.csv` | Body is the CSV (preserved verbatim); columns inferred from the header. |
| Other extensions | Skipped silently. Filenames are still recorded as `item_media`. |
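Before ingesting, you can get a rough count of what the parser would pick up by applying this page's selection rules yourself: the extensions in the table above, with hidden entries (including `.git/`) skipped and symlinks not followed, as described under Caveats. A sketch assuming those rules are the whole story; `preview_files` is a hypothetical helper, and since the page doesn't say whether extensions are matched case-insensitively, this version is case-sensitive.

```bash
# Hypothetical helper (not part of vault): preview_files DIR
# Approximates the files source's selection rules as documented on this page:
#   - hidden files and directories (including .git/) are pruned
#   - symlinks are not followed (find's default), and -type f skips the links themselves
#   - only the extensions from the table count as ingestible; the rest are "skipped"
# Extensions are matched case-sensitively, which is an assumption.
preview_files() {
  find "${1:-.}" -mindepth 1 -name '.*' -prune -o -type f -print |
    awk -F/ '
      {
        name = $NF                      # basename: last path component
        ext = ""
        if (match(name, /\.[^.]*$/)) ext = substr(name, RSTART)
        if (ext ~ /^\.(md|markdown|txt|pdf|json|jsonc|csv)$/) count[ext]++
        else skipped++
      }
      END {
        for (e in count) printf "%s %d\n", e, count[e] | "sort"
        close("sort")                   # flush sorted counts before the summary
        printf "skipped %d\n", skipped
      }'
}
```

Running `preview_files ~/Documents/notes` prints one `extension count` line per supported type plus a `skipped` total, which you can compare with the counts `vault ingest` reports.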
For each ingested file:
- `kind=document`
- `ts` is the file's mtime when no better timestamp is in the body.
- The directory hierarchy is preserved as tags (`notes/journal/2025` becomes three tags).
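To check which tags and fallback timestamp a given file would receive, the last two rules can be approximated in a few lines. A sketch under stated assumptions: `dir_tags` and `file_mtime` are hypothetical helpers; `dir_tags` assumes each directory segment of the path (relative to the export root) becomes one tag and the filename itself does not; `file_mtime` prints the mtime as Unix epoch seconds, trying GNU `stat` first and falling back to BSD/macOS `stat`.

```bash
# Hypothetical helpers (not part of vault).

# dir_tags REL_PATH: print one tag per directory segment of a path relative to the
# export root. Assumes segments map to tags one-to-one; the filename is not a tag.
dir_tags() {
  local dir
  dir=$(dirname -- "$1")
  if [ "$dir" != "." ]; then            # files at the root get no directory tags
    printf '%s\n' "$dir" | tr '/' '\n'
  fi
}

# file_mtime FILE: print the file's modification time as Unix epoch seconds,
# i.e. the fallback `ts` when the body carries no better timestamp.
file_mtime() {
  stat -c %Y -- "$1" 2>/dev/null || stat -f %m -- "$1"   # GNU, then BSD/macOS
}
```

`dir_tags notes/journal/2025/entry.md` prints `notes`, `journal`, and `2025` on separate lines.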
Ingest it
```bash
vault ingest --source files
```

Example run:

```
$ vault ingest --source files
Walking ~/Documents/notes ...
found 412 .md, 18 .pdf, 47 .txt, 6 .csv
documents: 483
Run 18 completed in 4.2s
```
Caveats
- PDF extraction is approximate. Scanned PDFs without an OCR layer come through as empty bodies. Run them through OCR first if you want them searchable.
- Large files are still ingested whole. A multi-MB JSON file will become a multi-MB item body — consider chunking very large files yourself before pointing the parser at them.
- No deduplication across sources. A document also ingested via `notion` won't be detected as a duplicate of the same file ingested via `files`; the two have different source slugs and therefore different dedup keys. Pick one source per document.
- Hidden files and `.git/` directories are skipped. Symlinks are not followed.
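To find the PDFs that would come through with empty bodies before you ingest, you can check each one for an extractable text layer. A sketch assuming poppler's `pdftotext` is installed; `find_textless_pdfs` is a hypothetical helper, and a PDF that `pdftotext` cannot read at all is also reported, so treat the output as candidates for OCR rather than a verdict.

```bash
# Hypothetical helper (not part of vault): find_textless_pdfs DIR
# Lists PDFs under DIR (skipping hidden entries, as the files source does) whose
# extracted text is empty or whitespace-only: likely scans without an OCR layer.
# Requires pdftotext from poppler-utils; unreadable PDFs are also listed.
find_textless_pdfs() {
  find "${1:-.}" -mindepth 1 -name '.*' -prune -o -type f -name '*.pdf' -print |
    while IFS= read -r pdf; do
      # "-" sends the extracted text to stdout; -q silences poppler's messages
      text=$(pdftotext -q "$pdf" - 2>/dev/null | tr -d '[:space:]')
      if [ -z "$text" ]; then
        printf '%s\n' "$pdf"
      fi
    done
}
```

Each listed file can then be run through an OCR tool, for example `ocrmypdf scanned.pdf scanned-ocr.pdf`, before the folder is ingested.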
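For CSV, one way to do the chunking the large-files caveat suggests is to split on row count and repeat the header in every piece, so each chunk still carries its column names. A minimal sketch: `chunk_csv` and its `.partNNN.csv` naming are hypothetical, and it assumes no quoted field contains an embedded newline (such records would be split incorrectly). It does not handle JSON, which needs a JSON-aware splitter.

```bash
# Hypothetical helper (not part of vault): chunk_csv FILE ROWS [PREFIX]
# Writes PREFIX001.csv, PREFIX002.csv, ... each holding the header line plus at
# most ROWS data rows (ROWS must be a positive integer). PREFIX defaults to FILE
# without ".csv", plus ".part". Assumes one record per line.
chunk_csv() {
  local file="$1" rows="$2"
  local prefix="${3:-${1%.csv}.part}"
  awk -v rows="$rows" -v prefix="$prefix" '
    NR == 1 { header = $0; next }          # remember the header, emit it per chunk
    (NR - 2) % rows == 0 {                 # first data row of a new chunk
      if (out != "") close(out)
      out = sprintf("%s%03d.csv", prefix, ++n)
      print header > out
    }
    { print > out }
  ' "$file"
}
```

For example, `chunk_csv big.csv 50000` writes `big.part001.csv`, `big.part002.csv`, and so on; point the source at a folder holding the chunks instead of the original file.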