Chunky is a local desktop app that turns a folder full of documents, slides, spreadsheets, emails, and images into a searchable knowledge graph, and exposes that graph to Claude through an MCP server. Everything runs on your machine — no cloud database, no server, no telemetry.

How does Chunky work with Claude?

On every launch, Chunky registers itself as an MCP server by writing an entry into the Claude Desktop and Claude Code CLI config files. Claude then has read-only access to nine tools: search_nodes, get_node, get_nodes, get_neighbors, list_assets_in_project, list_nodes_by_type, list_node_images, get_image, and summarise_artifacts.

What data leaves my machine?

The knowledge graph, the SQLite index, the embedding model, and all extracted text stay entirely on your machine. The only outbound traffic is (a) LLM API calls when you use the chat feature or when Chunky's OCR pass sends a raster image to Claude for text extraction, and (b) the one-time model download from HuggingFace on first launch of the source build. There is no telemetry.

What file formats does Chunky ingest?

PDF, DOCX and DOC (Word), PPTX and PPT (PowerPoint), XLSX and XLS (Excel), CSV, MSG (Outlook), Markdown, plain text, images (PNG, JPG, GIF, WEBP, BMP, SVG), and dozens of source-code extensions.

Which operating systems does Chunky support?

Windows 10 and 11, macOS 10.15 Catalina or newer (Apple silicon and Intel), and Linux (Ubuntu 22.04+, Fedora 40+, and derivatives). Installers ship as an NSIS setup (.exe), DMG, AppImage, .deb, and .rpm.

How does Chunky's retrieval work?

Chunky combines lexical search (SQLite FTS5 with BM25 scoring) and semantic search (sqlite-vec cosine similarity over 384-dimensional BGE-small-en-v1.5 embeddings), fusing the two into a single ranked list. Every node's summary and keyPoints are cached, so search results can often answer a question without a follow-up fetch.
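As a rough illustration of the semantic half, the TypeScript sketch below computes cosine similarity between two 384-dimensional vectors. It is not Chunky's code: inside Chunky the comparison runs in sqlite-vec, which reports cosine as a distance (1 minus the similarity, so lower is closer). The names cosineSimilarity, assertEmbedding, and EMBEDDING_DIM are hypothetical.

```typescript
// Dimensionality of BGE-small-en-v1.5 embeddings.
const EMBEDDING_DIM = 384;

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Guard for vectors that are supposed to come from the embedding model.
function assertEmbedding(v: readonly number[]): void {
  if (v.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${v.length}`);
  }
}
```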

Chunky · Local RAG for your files, wired to Claude via MCP

Local RAG for your files. Claude reads it via MCP.

Drop your PDFs, slide decks, spreadsheets, emails, screenshots, and code onto Chunky. Get a searchable knowledge graph — organised into projects and collections, indexed with hybrid FTS + semantic retrieval, and wired straight to Claude Desktop and Claude Code through an embedded MCP server.

Everything runs on your machine. No cloud database. No telemetry. No lock-in.

MIT-licensed

Windows · macOS · Linux

Built on Tauri 2 + Rust

What Chunky does

Universal ingest

Drag any supported file onto a project. Chunky extracts text from PDFs, DOCX / DOC, PPTX / PPT, XLSX / XLS, CSV, Outlook MSG, Markdown, TXT, images (PNG, JPG, GIF, WEBP, BMP, SVG), and dozens of source-code formats.

OCR for images

Screenshots, photos of whiteboards, diagram captures — Chunky pipes raster images through an LLM vision pass, so the extracted text becomes searchable alongside the source pixels.
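For a sense of what an LLM vision pass involves, here is a TypeScript sketch that builds an Anthropic Messages API request body carrying one base64-encoded image. The prompt wording, the max_tokens value, and the helper names are assumptions, and the model id is left as a parameter. The API accepts JPEG, PNG, GIF, and WebP images, so this sketch returns null for BMP (which would presumably need converting first) and SVG (a vector format, not raster); how Chunky itself handles those formats isn't stated here.

```typescript
type ContentBlock =
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } }
  | { type: "text"; text: string };

interface OcrRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: ContentBlock[] }[];
}

// Image media types accepted by the Messages API for base64 image blocks.
const VISION_MEDIA_TYPES: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  webp: "image/webp",
};

// Media type for a filename, or null if the format can't be sent as-is.
function visionMediaType(filename: string): string | null {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  return VISION_MEDIA_TYPES[ext] ?? null;
}

// Build the request body; sending it (HTTP, API key) is out of scope here.
function buildOcrRequest(filename: string, base64Data: string, model: string): OcrRequest | null {
  const mediaType = visionMediaType(filename);
  if (mediaType === null) return null;
  return {
    model,
    max_tokens: 1024, // hypothetical cap on the transcription length
    messages: [
      {
        role: "user",
        content: [
          { type: "image", source: { type: "base64", media_type: mediaType, data: base64Data } },
          { type: "text", text: "Transcribe all legible text in this image. Reply with the text only." },
        ],
      },
    ],
  };
}
```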

Projects and collections

Assets auto-bucket by type (Documents, Slides, PDFs, Spreadsheets, Emails, Images, Code, Links). Create custom named collections and drag assets between them — the drag targets accept both files from disk and existing assets.
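Auto-bucketing could be as simple as an extension lookup. The TypeScript sketch below is hypothetical: it assumes URLs land in Links and that Markdown and plain text count as Documents, and it lists only a handful of code extensions where Chunky's real list covers dozens.

```typescript
type Bucket = "Documents" | "Slides" | "PDFs" | "Spreadsheets" | "Emails" | "Images" | "Code" | "Links";

const BY_EXTENSION: Record<string, Bucket> = {
  pdf: "PDFs",
  docx: "Documents", doc: "Documents", md: "Documents", txt: "Documents",
  pptx: "Slides", ppt: "Slides",
  xlsx: "Spreadsheets", xls: "Spreadsheets", csv: "Spreadsheets",
  msg: "Emails",
  png: "Images", jpg: "Images", jpeg: "Images", gif: "Images", webp: "Images", bmp: "Images", svg: "Images",
  // Illustrative subset only.
  rs: "Code", ts: "Code", py: "Code", go: "Code", java: "Code",
};

// Pick a default bucket for a dropped file path or URL; null means "unsupported".
function bucketFor(input: string): Bucket | null {
  if (/^https?:\/\//i.test(input)) return "Links";
  const dot = input.lastIndexOf(".");
  if (dot < 0) return null;
  return BY_EXTENSION[input.slice(dot + 1).toLowerCase()] ?? null;
}
```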

Hybrid retrieval

Every search fuses FTS5 BM25 lexical scores with sqlite-vec cosine similarity over BGE-small-en-v1.5 embeddings, weighting them into a single ranked list. Cached summaries and keyPoints often answer a question without a follow-up fetch.
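The weighting formula isn't spelled out, so here is one common approach, weighted min-max fusion, as a TypeScript sketch under stated assumptions. It flips FTS5's bm25() values (which are more negative for better matches) and sqlite-vec's cosine distances so that higher means more relevant, normalises each list to [0, 1], then blends them. The fuse and normalise helpers and the 0.5 default weight are hypothetical, not Chunky's actual parameters.

```typescript
interface Hit {
  id: string;
  score: number;
}

// Min-max normalise scores into [0, 1]. If every score is equal, each maps to 1.
function normalise(scores: Map<string, number>): Map<string, number> {
  const values = [...scores.values()];
  const min = Math.min(...values);
  const max = Math.max(...values);
  const out = new Map<string, number>();
  for (const [id, s] of scores) {
    out.set(id, max === min ? 1 : (s - min) / (max - min));
  }
  return out;
}

function fuse(
  bm25: Map<string, number>, // FTS5 bm25(): more negative = better match
  cosineDistance: Map<string, number>, // sqlite-vec cosine distance: smaller = closer
  lexicalWeight = 0.5, // hypothetical default; 1 = lexical only, 0 = semantic only
): Hit[] {
  // Flip both signals so that a higher value means more relevant.
  const lexical = normalise(new Map([...bm25].map(([id, s]): [string, number] => [id, -s])));
  const semantic = normalise(new Map([...cosineDistance].map(([id, d]): [string, number] => [id, 1 - d])));
  const hits: Hit[] = [];
  for (const id of new Set([...lexical.keys(), ...semantic.keys()])) {
    // A node missing from one list contributes 0 for that signal.
    const score = lexicalWeight * (lexical.get(id) ?? 0) + (1 - lexicalWeight) * (semantic.get(id) ?? 0);
    hits.push({ id, score });
  }
  // Highest fused score first; ties broken by id for a stable order.
  return hits.sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}
```

Normalising first matters because raw BM25 and cosine values live on different scales. Reciprocal rank fusion, which combines ranks instead of scores, is a common alternative.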

MCP server, auto-configured

On every launch Chunky registers itself as an MCP server with Claude Desktop, Claude Desktop MSIX, and Claude Code CLI — writing to the right config path on each OS. Claude gets nine read-only tools for exploring your graph.
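A sketch of the per-OS path resolution, in TypeScript using Node's path module. The Windows and macOS locations match where Claude Desktop documents its claude_desktop_config.json; the Linux path is an assumption, and the MSIX sandbox path is left out because it depends on the Store package's install directory. Function names are hypothetical.

```typescript
import * as path from "node:path";

type Platform = "win32" | "darwin" | "linux";

// Location of Claude Desktop's config file for a given OS and home directory.
function claudeDesktopConfigPath(platform: Platform, home: string, appData?: string): string {
  switch (platform) {
    case "win32":
      // %APPDATA% normally points at <home>\AppData\Roaming.
      return path.win32.join(appData ?? path.win32.join(home, "AppData", "Roaming"), "Claude", "claude_desktop_config.json");
    case "darwin":
      return path.posix.join(home, "Library", "Application Support", "Claude", "claude_desktop_config.json");
    case "linux":
      // Assumed XDG-style location.
      return path.posix.join(home, ".config", "Claude", "claude_desktop_config.json");
  }
}

// Claude Code CLI keeps its user-level config in ~/.claude.json on every OS.
function claudeCodeConfigPath(platform: Platform, home: string): string {
  return platform === "win32" ? path.win32.join(home, ".claude.json") : path.posix.join(home, ".claude.json");
}
```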

Inline document rendering

Open a PowerPoint or PDF and Chunky renders the extracted text and images together in the order they appeared. Every image keeps its OCR text attached so agents can reason about screenshots as first-class content.
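One way to model order-preserving rendering is a list of text and image blocks tagged with their source position. The Block type and renderInline function below are hypothetical TypeScript; Chunky's internal schema isn't documented here.

```typescript
// Extracted content in source order; images carry their OCR text with them.
type Block =
  | { kind: "text"; order: number; text: string }
  | { kind: "image"; order: number; imageId: string; ocrText: string };

// Render blocks in the order they appeared, keeping OCR text next to each image.
function renderInline(blocks: readonly Block[]): string {
  return [...blocks]
    .sort((a, b) => a.order - b.order)
    .map((b) => (b.kind === "text" ? b.text : `[image ${b.imageId}]\n${b.ocrText}`))
    .join("\n\n");
}
```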

Per-project chat

Each project has its own chat session pre-scoped to that project's assets. Ask questions and the model calls Chunky's MCP tools with the right projectId already in context.

Local by default

Data lives in your OS app-data directory. No sync, no telemetry, no analytics beacon. The only outbound network traffic is LLM API calls that you initiate (chat, OCR) and the one-time embedding model download on the first launch of a source build.

Built for agent access

Chunky exposes its knowledge graph via Model Context Protocol (MCP) — the emerging standard for AI clients to discover and call external tools. Everything is read-only, so agents can freely explore without risk to your data.

Nine read-only tools

search_nodes — hybrid FTS + semantic search across the graph
get_node — read one node with byte paging for long bodies
get_nodes — bulk read up to 50 nodes in a single call
get_neighbors — walk the edge graph 1–2 hops out
list_assets_in_project — enumerate a project's assets
list_nodes_by_type — filter by node type, optionally by project
list_node_images — list images inside a node with OCR text
get_image — fetch image bytes plus OCR, inline for vision clients
summarise_artifacts — LLM-summarise a set of nodes
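The 1–2 hop walk behind get_neighbors is essentially a depth-capped breadth-first search. This TypeScript sketch assumes the edge graph is an adjacency map from node id to linked ids; the EdgeGraph type, the neighbors function, and clamping hops to 1..2 are illustrative assumptions rather than the tool's actual implementation.

```typescript
// Hypothetical adjacency list: node id -> ids it links to.
type EdgeGraph = ReadonlyMap<string, readonly string[]>;

// Breadth-first walk from `start`, clamped to 1-2 hops.
// Returns each reachable node with the hop count at which it was first seen.
function neighbors(graph: EdgeGraph, start: string, hops: number): Map<string, number> {
  const maxHops = Math.min(Math.max(Math.floor(hops), 1), 2);
  const seen = new Map<string, number>([[start, 0]]);
  let frontier = [start];
  for (let depth = 1; depth <= maxHops; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const n of graph.get(id) ?? []) {
        if (!seen.has(n)) {
          seen.set(n, depth);
          next.push(n);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start); // the start node is not its own neighbour
  return seen;
}
```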

Zero-config integration

Chunky writes the MCP entry into whichever of these it finds on launch:

Claude Desktop (Windows standard install)
Claude Desktop MSIX sandbox (Windows Store install)
Claude Desktop (macOS)
Claude Desktop (Linux)
Claude Code CLI (any OS with ~/.claude.json)
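The config write amounts to upserting one key under mcpServers, the object Claude Desktop uses to list servers as a command plus optional args. The TypeScript sketch below skips file I/O and shows only the merge; the server name, command path, and args in the example are hypothetical.

```typescript
// Minimal slice of the client config shape this sketch touches.
interface McpServerEntry {
  command: string;
  args?: string[];
}

interface ClientConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Return a copy of `config` with the named server added or refreshed.
// Other servers and unrelated keys are carried over unchanged.
function upsertMcpServer(config: ClientConfig, name: string, entry: McpServerEntry): ClientConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}
```

Because it only ever overwrites its own key, running it on every launch is idempotent and leaves other servers alone.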

Tool names are pre-authorised in Claude Code's permissions.allow, so Claude Code doesn't ask you to approve every call.
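Claude Code's permission rules refer to MCP tools as mcp__<server>__<tool>. Here is a hypothetical TypeScript sketch of adding the nine tools to permissions.allow without duplicating existing rules; the server name is passed in because the exact name Chunky registers under isn't stated here.

```typescript
const CHUNKY_TOOLS = [
  "search_nodes", "get_node", "get_nodes", "get_neighbors", "list_assets_in_project",
  "list_nodes_by_type", "list_node_images", "get_image", "summarise_artifacts",
] as const;

interface Settings {
  permissions?: { allow?: string[]; [key: string]: unknown };
  [key: string]: unknown;
}

// Append any missing mcp__<server>__<tool> rules, keeping existing rules first.
function withChunkyAllowed(settings: Settings, serverName: string): Settings {
  const existing = settings.permissions?.allow ?? [];
  const wanted = CHUNKY_TOOLS.map((tool) => `mcp__${serverName}__${tool}`);
  const allow = [...existing, ...wanted.filter((rule) => !existing.includes(rule))];
  return { ...settings, permissions: { ...settings.permissions, allow } };
}
```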

Example agent invocation

Ask Claude:

"What slides in my product-strategy project mention Azure? Give me the exact wording and which deck each came from."

Claude calls search_nodes({query: "Azure", types: ["slides"]}), gets back ranked slide hits with title + snippet + summary, and answers with citations — all without Chunky ever sending your files anywhere.
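Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 tools/call request carrying the tool name and its arguments. Claude builds these messages itself; the TypeScript sketch below only illustrates their shape for the search_nodes call above, plus a follow-up bulk read split into get_nodes batches of at most 50. The ids argument name and the helper functions are hypothetical.

```typescript
// JSON-RPC 2.0 request shape used by MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// get_nodes reads at most 50 nodes per call, so split larger id lists.
const GET_NODES_LIMIT = 50;

function getNodesRequests(ids: readonly string[]): ToolCallRequest[] {
  const requests: ToolCallRequest[] = [];
  for (let i = 0; i < ids.length; i += GET_NODES_LIMIT) {
    requests.push(toolCall("get_nodes", { ids: ids.slice(i, i + GET_NODES_LIMIT) }));
  }
  return requests;
}

// The slide search as it would appear on the wire.
const search = toolCall("search_nodes", { query: "Azure", types: ["slides"] });
```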

Example use cases

Product / research knowledge base

Drop a year of customer-research transcripts, competitor slide decks, spec docs, and Miro exports into one Chunky project. Ask Claude "what did customers say about pricing?" — hybrid search surfaces the exact quotes with source attribution.

Screenshot-heavy documentation

A shared drive full of Confluence exports and screenshots of legacy admin panels? OCR turns every image into indexed text. Search "click the export button in the admin UI" and Chunky finds the screenshots, not just prose that mentions them.

Sales enablement collateral

Import every case study, pitch deck, one-pager, and email template. Group them into collections by industry. Give Claude Code access to Chunky's MCP tools and let it draft account-specific proposals grounded in your actual collateral.

Legal / compliance discovery

Ingest thousands of NDA-covered PDFs and Outlook MSG files on an air-gapped laptop. Indexing and search run locally, and nothing leaves the machine unless you use chat or OCR, both of which call an LLM API. Perfect for regulated environments where cloud RAG is a non-starter.

Personal second brain

Web bookmarks, PDFs from arXiv, meeting notes, photos of book margins: all under one roof, all searchable, all yours. Chunky can replace the Notion / Obsidian / bespoke-script stack many knowledge workers cobble together.

Codebase companion

Ingest a project's docs, RFCs, ADRs, and source. Then let Claude Code query it through MCP while pair-programming: "before I write this middleware, what did we decide about error envelopes in the auth service?"

Download Chunky

Free, MIT-licensed. Pick the installer for your OS. The current release is v0.1.0-preview2, a prerelease build that is not yet code-signed or notarized.

Windows

Windows 10 or 11, x64

Windows installer (.exe)

NSIS installer, per-user install (no admin needed). WebView2 runtime auto-installed if missing. ~74 MB.

macOS

macOS 10.15 Catalina or newer

Apple silicon (.dmg) · Intel (.dmg) · Universal binary (.dmg)

Right-click → Open on first launch — build isn't notarized yet. ~95 MB (arch-specific) / 108 MB (universal).

Linux

Ubuntu 22.04+, Fedora 40+, or equivalent

AppImage · .deb · .rpm

AppImage needs no install (~169 MB). The .deb and .rpm packages pull in WebKit and GTK dependencies automatically (~91 MB).