Productivity · Apr 1, 2026
How I Built My Knowledge Base
A deep dive into the process of creating a personal knowledge base to organize and access information efficiently.
A Self-Maintaining Personal Knowledge Base
Most people's experience with LLMs and documents looks like RAG: upload files, retrieve chunks at query time, generate an answer. It works, but nothing accumulates. Ask a subtle question that requires synthesising five documents and the LLM re-derives the answer from scratch every time.
I wanted something different: a wiki that gets smarter with every source I add, without me doing any of the maintenance work. Here's how I built it.
The Core Idea
The system has three components:
- raw/: a drop zone for unprocessed sources (articles, notes, documentation, anything)
- wiki/: a structured, interlinked Markdown wiki maintained entirely by an LLM
- A compile pipeline: when a new source lands in raw/, a script hands it to a Claude agent that reads it, extracts knowledge, and integrates it into the wiki
The wiki is the artefact. It's not a search index or an embedding store. It's an actual document collection with cross-references, synthesis, and structure. Every time you add a source, the agent doesn't just index it. It reads the existing wiki, figures out what's new, and either updates existing articles or creates new ones.
Directory Structure
knowledge/personal/
├── raw/ # Drop sources here — never modify these
├── wiki/
│ ├── concepts/ # Atomic knowledge articles (one concept per file)
│ ├── connections/ # Cross-cutting synthesis (links 2+ concepts)
│ ├── index.md # Auto-maintained index of all articles
│ └── log.md # Chronological record of every operation
├── outputs/ # Generated reports and analyses (lint, queries)
├── scripts/
│ ├── compile.py # Integrates raw sources into the wiki
│ ├── lint.py # Health checks on the wiki
│ ├── log_writer.py # Shared logging utility
│ └── state.json # Per-source compile state (sha256 + timestamp)
├── CLAUDE.md # Schema and instructions for Claude
└── pyproject.toml # Python project (claude-agent-sdk)
The raw/ folder is the input queue. The wiki/ folder is the compiled output. The LLM writes all wiki articles.
Two Types of Wiki Articles
The wiki has a deliberate two-level structure that separates what things are from how they relate.
Concepts (wiki/concepts/)
One article per idea. Each file covers exactly one concept, explained clearly enough to stand alone.
---
title: "Spaced Repetition"
tags: [learning, memory]
sources:
- "raw/make-it-stick.md"
created: 2026-04-01
updated: 2026-04-03
---
# Spaced Repetition
Spaced repetition is a learning technique that schedules reviews at
increasing intervals based on how well you remember each item. It
exploits the spacing effect: memories are stronger when study sessions
are distributed over time rather than massed together.
## Key Points
- Review intervals expand as recall improves (hours → days → weeks → months)
- Forgetting slightly before a review strengthens long-term retention
- Implemented in tools like Anki, which uses the SM-2 algorithm
## Details
The theoretical basis is Ebbinghaus's forgetting curve (1885)...
## Related Concepts
- [[concepts/desirable-difficulty]] - Spaced repetition is one instance of this broader principle
- [[concepts/interleaving]] - Often combined with spaced repetition for stronger retention
## Related Connections
- [[connections/spaced-repetition-and-desirable-difficulty]]
## Sources
- [[raw/make-it-stick.md]] - Core explanation and research citations
The frontmatter fields are fixed: title, tags, sources, created, updated. created is set once and never changes. updated is refreshed on every edit. New sources are appended to the sources list. Existing entries are never removed.
Connections (wiki/connections/)
Cross-cutting synthesis articles that link two or more existing concepts and articulate a non-obvious relationship between them. The bar is deliberately high: a connection page should only exist when linking the concepts reveals something that neither page says alone.
---
title: "Connection: Spaced Repetition and Desirable Difficulty"
connects:
- "concepts/spaced-repetition"
- "concepts/desirable-difficulty"
created: 2026-04-01
updated: 2026-04-01
---
# Connection: Spaced Repetition and Desirable Difficulty
## The Connection
Spaced repetition is the canonical implementation of desirable difficulty —
it works precisely _because_ retrieval is hard.
## Key Insight
Most people practise at the point where retrieval feels easy, which
produces fluency without durability. Spaced repetition forces you to
practise at the edge of forgetting, where retrieval is difficult and
therefore maximally consolidating. The mechanism of spaced repetition
_is_ the mechanism of desirable difficulty.
## Evidence
Studies by Roediger and Karpicke (2006) showed that retrieval practice
under conditions of high difficulty produced 50% better long-term
retention than re-reading...
## Related Concepts
- [[concepts/spaced-repetition]]
- [[concepts/desirable-difficulty]]
Connections don't carry their own sources field. They inherit context from the concept pages they bridge.
How the Compile Script Works
scripts/compile.py uses the Claude Agent SDK to run an agent with Read, Write, and Glob tools scoped to the wiki. The schema from CLAUDE.md is injected verbatim into the system prompt, so the prompt and the schema can never drift apart.
For each unprocessed file in raw/, the script kicks off one agent loop (up to 20 turns) with a single instruction: ingest this source into the wiki. From there, the agent does its own work:
- Reads wiki/index.md to see what already exists
- Reads the raw source
- Pulls in whichever concept and connection pages it judges relevant
- Edits those pages: appends the source, integrates new claims, reconciles contradictions
- Creates new concept pages for genuinely new topics
- Creates new connection pages only when a non-obvious cross-concept relationship exists
- Regenerates wiki/index.md
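The "unprocessed" check itself is just a hash comparison against scripts/state.json. A sketch of how that selection might look (the state-file shape here is my assumption, based on the description of state.json below):

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def unprocessed_sources(raw_dir: Path, state_path: Path) -> list[Path]:
    """Return raw files that are new, or edited since their last compile."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    pending = []
    for src in sorted(raw_dir.glob("*.md")):
        recorded = state.get(src.name, {}).get("sha256")
        if recorded != sha256_of(src):  # never compiled, or changed since
            pending.append(src)
    return pending
```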
After the agent finishes, the script diffs a sha256 snapshot of the wiki taken before and after the run to determine which articles were created vs. updated, records the result in wiki/log.md, and marks the source as processed in scripts/state.json (with its sha256, so editing the raw file later marks it stale on the next lint).
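The created-vs-updated classification reduces to a dict diff over per-file hashes. Roughly (a sketch; the real script's naming may differ):

```python
import hashlib
from pathlib import Path

def snapshot(wiki_dir: Path) -> dict[str, str]:
    """Map every wiki article path to the sha256 of its contents."""
    return {
        str(p.relative_to(wiki_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in wiki_dir.rglob("*.md")
    }

def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> tuple[list[str], list[str]]:
    """Classify articles as created (new path) or updated (same path, new hash)."""
    created = sorted(p for p in after if p not in before)
    updated = sorted(p for p in after if p in before and after[p] != before[p])
    return created, updated
```

Diffing snapshots rather than parsing agent transcripts keeps the log honest: it records what actually changed on disk, not what the agent claims it did.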
Tool calls are restricted to paths inside wiki/concepts/, wiki/connections/, and wiki/index.md. The agent can't touch raw/, scripts/, or outputs/.
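Enforcing that scope comes down to one predicate over each tool call's target path. A hypothetical guard (the function name and constants are illustrative, not the SDK's API):

```python
from pathlib import PurePosixPath

ALLOWED_PREFIXES = ("wiki/concepts/", "wiki/connections/")
ALLOWED_FILES = ("wiki/index.md",)

def is_writable(path: str) -> bool:
    """True only for paths the compile agent is allowed to touch."""
    if ".." in PurePosixPath(path).parts:
        return False  # no escaping the sandbox via relative segments
    norm = PurePosixPath(path).as_posix()
    return norm in ALLOWED_FILES or norm.startswith(ALLOWED_PREFIXES)
```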
No API Key Required
The Agent SDK calls the locally installed claude CLI. If you already use Claude Code, you're already authenticated. No ANTHROPIC_API_KEY setup required.
The Lint System
scripts/lint.py runs six health checks on the wiki and writes a Markdown report to outputs/lint-YYYY-MM-DD.md.
| Severity | Check | What It Catches |
|---|---|---|
| Error | Broken links | [[wikilinks]] pointing to non-existent articles |
| Warning | Orphan pages | Articles with zero inbound links from other articles |
| Warning | Orphan sources | Raw files that haven't been compiled yet |
| Warning | Stale articles | Raw sources whose sha256 has changed since they were last compiled |
| Suggestion | Missing backlinks | A links to B, but B doesn't link back to A |
| Suggestion | Sparse articles | Below 200 words — likely incomplete |
The report uses error/warning/suggestion severity levels so you can triage at a glance. The script exits with code 1 if there are errors, making it CI-friendly.
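The broken-link check, the only error-level one, amounts to comparing extracted [[wikilinks]] against the set of articles that exist on disk. A simplified version (raw/ targets are skipped because sources live outside wiki/):

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def broken_links(wiki_dir: Path) -> list[tuple[str, str]]:
    """Return (article, target) pairs whose link target doesn't exist."""
    articles = {p.relative_to(wiki_dir).with_suffix("").as_posix()
                for p in wiki_dir.rglob("*.md")}
    broken = []
    for page in wiki_dir.rglob("*.md"):
        for target in WIKILINK.findall(page.read_text()):
            t = target.strip().removesuffix(".md")
            if t not in articles and not t.startswith(("raw/", "outputs/")):
                broken.append((page.relative_to(wiki_dir).as_posix(), target))
    return broken
```

The CI-friendly exit code then falls out naturally: `sys.exit(1 if broken else 0)` after the report is written.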
Chronological Logging
Every compile, lint, and query operation appends an entry to wiki/log.md. This gives you a searchable record of how the wiki evolved.
## [2026-04-14T13:05:00] compile | how-does-claude-code-actually-work.md
- Source: raw/how-does-claude-code-actually-work.md
- Articles created: [[concepts/llm-tool-calling]], [[connections/harness-optimization-and-model-behavior]]
- Articles updated: [[concepts/ai-coding-harness]]
## [2026-04-14T14:42:44] lint | health check
- Errors: 0 | Warnings: 3 | Suggestions: 8
- Report: [[outputs/lint-2026-04-14]]
The log writer (scripts/log_writer.py) is a shared utility imported by both scripts. It also exposes append_query() for logging Q&A sessions, used by the query workflow described below.
Automation with Claude Code Hooks
Both scripts run automatically at the start of every Claude Code session. In .claude/settings.json:
{
"hooks": {
"SessionStart": [
{
"matcher": "startup",
"hooks": [
{
"type": "command",
"command": "cd /path/to/knowledge/personal && /opt/homebrew/bin/uv run python scripts/compile.py"
},
{
"type": "command",
"command": "cd /path/to/knowledge/personal && /opt/homebrew/bin/uv run python scripts/lint.py"
}
]
}
]
}
}
Hooks run with a minimal shell environment, so use the full path to uv (find it with which uv). Drop a file into raw/, open Claude Code, and the wiki is already updated and linted by the time the session loads.
CLAUDE.md: The Schema File
CLAUDE.md lives at the root of knowledge/personal/ and is the standing instruction for every compile call. It defines:
- The directory structure and what each folder is for
- The exact frontmatter and section format for concept and connection articles
- Rules for when to create vs. update articles
- Rules for when connection articles are warranted
- The query workflow (read index → synthesize → file good outputs back as new pages)
Ingesting Sources
I use Obsidian Web Clipper as the front door: a browser extension that saves any web page directly into the vault as a Markdown file. Since the vault root is knowledge/personal/, clipping a page lands a structured Markdown file in raw/.
The clipper template captures the page body alongside useful metadata:
{
"schemaVersion": "0.1.0",
"name": "Default",
"behavior": "create",
"noteContentFormat": "{{content}}",
"properties": [
{ "name": "title", "value": "{{title}}", "type": "text" },
{ "name": "source", "value": "{{url}}", "type": "text" },
{
"name": "author",
"value": "{{author|split:\", \"|wikilink|join}}",
"type": "multitext"
},
{ "name": "published", "value": "{{published}}", "type": "date" },
{ "name": "created", "value": "{{date}}", "type": "date" },
{ "name": "description", "value": "{{description}}", "type": "text" },
{ "name": "tags", "value": "clippings", "type": "multitext" }
],
"noteNameFormat": "{{title}}",
"path": "raw"
}
Setting path to raw puts clippings directly where the compile pipeline expects them. A clipped file ends up looking like:
---
title: "Spaced Repetition: How It Works"
source: "https://example.com/spaced-repetition"
author: [[Jane Smith]]
published: 2024-03-15
created: 2026-04-14
description: "An overview of spaced repetition systems and the science behind them."
tags:
- clippings
---
[Full article content in Markdown...]
The compile agent reads the entire file (frontmatter and body), so the source URL, author, and publication date are available when it writes the sources section of new concept articles. A clipped source becomes a proper citation rather than just a filename.
If you don't use Obsidian, anything that lands a .md or .txt file in raw/ works: drag-and-drop, curl, a different clipper, your own scripts.
The Day-to-Day Workflow
Adding knowledge. Clip or drop a file into raw/. Open Claude Code (or run uv run python scripts/compile.py manually). New concept and connection pages appear in wiki/, the index updates, and the log records what happened.
Querying. Open knowledge/personal/ as an Obsidian vault. The graph view shows the connection structure, backlinks show what references a concept, and full-text search finds specific claims. For deeper questions, ask Claude inside the vault. The schema instructs it to read wiki/index.md first, synthesize an answer with citations, and file the result back to the wiki as a new concept or connection page when the answer reveals something not already captured. That last step is what makes the wiki compound: every good question can leave behind a new article. Queries are logged via log_writer.append_query() alongside the pages consulted and any page filed.
Maintenance. The lint report flags what needs attention. Broken links are real errors. Orphan pages and missing backlinks are worth fixing when you have time. Sparse articles are candidates for adding more sources.
Try It Yourself
The design follows Andrej Karpathy's llm-wiki — his gist is the best starting point if you want to build your own version.
Install the tools first:
brew install claude # Claude Code — runs the compile agent
brew install uv # Python runner for the scripts
brew install --cask obsidian # Markdown editor for browsing your wiki
Install the Obsidian Web Clipper from your browser's extension store (Chrome · Firefox · Safari).
Then read the gist. It explains the folder structure to create, the CLAUDE.md schema file that tells the agent how to write articles, and the compile script that ties it together. The schema is the most important part — it's a plain text file you write once, and the agent follows it on every run.