Universal ingest
Drag any supported file onto a project. Chunky extracts text from PDFs, DOCX / DOC, PPTX / PPT, XLSX / XLS, CSV, Outlook MSG, Markdown, TXT, images (PNG, JPG, GIF, WEBP, BMP, SVG), and dozens of source-code formats.
Drop your PDFs, slide decks, spreadsheets, emails, screenshots, and code onto Chunky. Get a searchable knowledge graph — organised into projects and collections, indexed with hybrid FTS + semantic retrieval, and wired straight to Claude Desktop and Claude Code through an embedded MCP server.
Everything runs on your machine. No cloud database. No telemetry. No lock-in.
Screenshots, photos of whiteboards, diagram captures — Chunky pipes raster images through an LLM vision pass, so the extracted text becomes searchable alongside the source pixels.
Assets auto-bucket by type (Documents, Slides, PDFs, Spreadsheets, Emails, Images, Code, Links). Create custom named collections and drag assets between them — the drag targets accept both files from disk and existing assets.
Every search fuses FTS5 lexical BM25 with sqlite-vec cosine similarity over BGE-small-en-v1.5 embeddings, weighted into a single ranked list. Cached summaries and keyPoints answer most questions without a follow-up fetch.
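As a rough illustration of the fusion described above, the sketch below rescales a lexical score and a cosine score per node and blends them with a single weight. The names `fuse_scores` and `alpha`, and the min-max normalisation step, are assumptions made for illustration; the page does not say how Chunky actually weights the two signals.

```python
from typing import Dict, List, Tuple


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant set maps every key to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse_scores(
    lexical: Dict[str, float],
    semantic: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend two score maps (higher is better) into one ranked list.

    Note: SQLite FTS5's bm25() returns lower-is-better values, so a caller
    would negate them before passing them in as `lexical`.
    `alpha` is a hypothetical weight: 1.0 = lexical only, 0.0 = semantic only.
    """
    lex = _min_max(lexical)
    sem = _min_max(semantic)
    ids = set(lex) | set(sem)
    # A node missing from one result set contributes 0.0 for that signal.
    fused = {i: alpha * lex.get(i, 0.0) + (1 - alpha) * sem.get(i, 0.0) for i in ids}
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

Normalising before blending keeps one signal from dominating purely because of its numeric range. Other fusion schemes, such as reciprocal rank fusion, would work from ranks instead of raw scores.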
On every launch Chunky registers itself as an MCP server with Claude Desktop, Claude Desktop MSIX, and Claude Code CLI — writing to the right config path on each OS. Claude gets nine read-only tools for exploring your graph.
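A minimal sketch of what that registration step could look like, assuming the client config is a JSON file with a top-level `mcpServers` object (the layout Claude Desktop uses). The function name `register_mcp_server` and the example entry are hypothetical; the command Chunky actually writes is not given on this page.

```python
import json
from pathlib import Path
from typing import Any, Dict


def register_mcp_server(config_path: Path, name: str, entry: Dict[str, Any]) -> Dict[str, Any]:
    """Merge one server entry under "mcpServers", leaving other keys untouched."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file the same as a missing one.
        config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = entry  # overwrite any stale entry from a previous launch
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Hypothetical usage; the path and command are placeholders, not Chunky's real values:
# register_mcp_server(Path("claude_desktop_config.json"), "chunky",
#                     {"command": "chunky-mcp", "args": []})
```

Merging into the existing file, instead of rewriting it, is what lets the registration run on every launch without disturbing other MCP servers the user has configured.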
Open a PowerPoint or PDF and Chunky renders the extracted text and images together in the order they appeared. Every image keeps its OCR text attached so agents can reason about screenshots as first-class content.
Each project has its own chat session pre-scoped to that project's assets. Ask questions and the model calls Chunky's MCP tools with the right projectId already in context.
Data lives in your OS app-data directory. No sync, no telemetry, no analytics beacon. The only outbound network traffic is LLM API calls that you initiate (chat, OCR) and a one-time embedding model download on first source-build.
Chunky exposes its knowledge graph via Model Context Protocol (MCP) — the emerging standard for AI clients to discover and call external tools. Everything is read-only, so agents can freely explore without risk to your data.
search_nodes: hybrid FTS + semantic search across the graph
get_node: read one node with byte paging for long bodies
get_nodes: bulk read up to 50 nodes in a single call
get_neighbors: walk the edge graph 1–2 hops out
list_assets_in_project: enumerate a project's assets
list_nodes_by_type: filter by node type, optionally by project
list_node_images: list images inside a node with OCR text
get_image: fetch image bytes plus OCR, inline for vision clients
summarise_artifacts: LLM-summarise a set of nodes
Chunky writes the MCP entry into whichever of these it finds on launch:
~/.claude.json)
Tool names get pre-authorised in Claude Code's permissions.allow so the agent doesn't prompt for approval on every call.
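The pre-authorisation step might look roughly like the sketch below, which adds one allow rule per tool to a settings dictionary. It assumes Claude Code's `mcp__<server>__<tool>` naming for MCP permission rules and a server registered under the name `chunky`; both the server name and the helper `allow_tools` are assumptions, not taken from Chunky's source.

```python
from typing import Any, Dict, List

# The nine read-only tools listed on this page.
TOOLS: List[str] = [
    "search_nodes", "get_node", "get_nodes", "get_neighbors",
    "list_assets_in_project", "list_nodes_by_type", "list_node_images",
    "get_image", "summarise_artifacts",
]


def allow_tools(settings: Dict[str, Any], server: str, tools: List[str]) -> Dict[str, Any]:
    """Append an allow rule per tool under permissions.allow, skipping duplicates."""
    permissions = settings.setdefault("permissions", {})
    allow = permissions.setdefault("allow", [])
    for tool in tools:
        rule = f"mcp__{server}__{tool}"  # assumed rule format, see lead-in
        if rule not in allow:
            allow.append(rule)
    return settings


# Hypothetical usage: existing rules are kept, and re-running adds nothing new.
settings = allow_tools({"permissions": {"allow": []}}, "chunky", TOOLS)
```

Because the helper skips rules that already exist, it can run on every launch without growing the list.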
Ask Claude:
"What slides in my product-strategy project mention Azure? Give me the exact wording and which deck each came from."
Claude calls search_nodes({query: "Azure", types: ["slides"]}), gets back ranked slide hits with title + snippet + summary, and answers with citations, all without Chunky ever sending your files anywhere.
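On the wire, a call like the one above is an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below builds that message for the example; the helper name `build_tool_call` and the request id are illustrative, and the argument names `query` and `types` are taken from the example call, not from a published schema.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The Azure example from the text, expressed as a request payload.
payload = build_tool_call(1, "search_nodes", {"query": "Azure", "types": ["slides"]})
```

In practice the MCP client inside Claude Desktop or Claude Code builds and sends this message; users never write it by hand.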
Drop a year of customer-research transcripts, competitor slide decks, spec docs, and Miro exports into one Chunky project. Ask Claude "what did customers say about pricing?" — hybrid search surfaces the exact quotes with source attribution.
A shared drive full of Confluence exports and screenshots of legacy admin panels? OCR turns every image into indexed text. Search "click the export button in the admin UI" and Chunky finds the screenshots, not just prose that mentions them.
Import every case study, pitch deck, one-pager, and email template. Group them into collections by industry. Give Claude Code the MCP tool and let it draft account-specific proposals grounded in your actual collateral.
Ingest thousands of PDFs and Outlook MSG files under NDA on an air-gapped laptop. Nothing ever leaves the machine unless you explicitly send a snippet to an LLM. Perfect for regulated environments where cloud RAG is a non-starter.
Web bookmarks, PDFs from arXiv, meeting notes, photos of book margins — all under one roof, all searchable, all yours. Chunky replaces the Notion / Obsidian / bespoke-script stack most knowledge workers cobble together.
Ingest a project's docs, RFCs, ADRs, and source. Then let Claude Code query it through MCP while pair-programming: "before I write this middleware, what did we decide about error envelopes in the auth service?"
Free, MIT-licensed. Pick the installer for your OS. Currently v0.1.0-preview2 — a prerelease preview build. Not yet code-signed or notarized.
Windows 10 or 11, x64
Windows installer (.exe)
NSIS installer, per-user install (no admin needed). The WebView2 runtime is installed automatically if missing. ~74 MB.
macOS 10.15 Catalina or newer
Universal binary (.dmg)
Right-click → Open on first launch, since the build isn't notarized yet. ~95 MB (arch-specific) / 108 MB (universal).
Or build from source: detailed per-OS instructions on GitHub.