Think with your second brain: a proper Claude Code harness for Obsidian
Graph-First Context Routing for LLM Agents: Replacing Vector Retrieval with Structured Knowledge Navigation
Current Situation Analysis
Personal knowledge bases and technical documentation vaults have largely converged on markdown-based systems with explicit cross-referencing (wikilinks, backlinks, or markdown references). When developers attempt to feed these vaults into LLM agents, the default architectural pattern remains vector Retrieval-Augmented Generation (RAG). This approach embeds documents into high-dimensional space, retrieves top-k chunks via cosine similarity, and injects them into the prompt.
The fundamental flaw is structural erosion. Vector RAG treats prose like an unstructured search index. It severs documents from their relational topology, discarding hierarchy, provenance, and explicit semantic pathways. For coding agents, this is unacceptable: a developer agent follows imports, dependencies, and type definitions. It navigates code as a graph. Personal knowledge bases are identical in structure, yet practitioners routinely flatten them into chunked embeddings, losing the very relationships that make the knowledge coherent.
This mismatch is overlooked because vector pipelines are commoditized. Frameworks abstract away embedding generation and chunking, making RAG the path of least resistance. However, the cost manifests in three measurable dimensions:
- Provenance Loss: Chunks arrive without parent-child relationships or cross-reference context, causing hallucination when the model infers connections that don't exist.
- Context Window Waste: Irrelevant but semantically similar fragments consume tokens that could carry structurally relevant nodes.
- Synthesis Deficit: Cross-domain queries require traversing multiple conceptual clusters. Vector retrieval returns isolated fragments rather than connected reasoning paths.
A controlled evaluation illustrates this gap. When benchmarking vector RAG against graph-native navigation on a 99-note synthetic vault, baseline retrieval scores 2.067 on faithfulness and 2.133 on grounding (0–3 scale). The model struggles to maintain factual alignment and contextual grounding because it lacks the relational scaffolding that wikilinks provide. The industry has optimized for semantic similarity while ignoring topological fidelity.
WOW Moment: Key Findings
Replacing chunked retrieval with progressive graph traversal, augmented by lightweight embedding filtering, yields measurable gains across every evaluation dimension. The following table compares three approaches on identical synthesis tasks:
| Approach | Faithfulness | Grounding | Insight Novelty | Answer Relevancy |
|---|---|---|---|---|
| Vector RAG (Baseline) | 2.067 | 2.133 | 1.533 | 2.067 |
| Pure Graph Traversal | 2.000 | 2.533 | 2.333 | 2.400 |
| Hybrid Graph (t=0.65 + Orphan k=5) | 2.333 | 2.933 | 2.533 | 2.467 |
The hybrid variant outperforms both the baseline and pure traversal. The +0.80 grounding delta and +1.00 novelty jump reveal a critical insight: structure preserves reasoning chains, while embeddings recover missing links. Pure traversal occasionally follows tangential paths (causing a slight faithfulness dip), but hybrid filtering corrects this by pruning low-similarity edges and surfacing topically relevant orphans. The result is a context window that contains only structurally valid and semantically aligned notes, delivered in ~8 seconds per query.
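The deltas quoted above can be recomputed directly from the table. The following is a small sketch; the `Scores` type and variable names are illustrative and not part of the original evaluation harness.

```typescript
// Scores copied from the comparison table (0-3 scale).
type Scores = { faithfulness: number; grounding: number; novelty: number; relevancy: number };

const baseline: Scores = { faithfulness: 2.067, grounding: 2.133, novelty: 1.533, relevancy: 2.067 };
const pureGraph: Scores = { faithfulness: 2.0, grounding: 2.533, novelty: 2.333, relevancy: 2.4 };
const hybrid: Scores = { faithfulness: 2.333, grounding: 2.933, novelty: 2.533, relevancy: 2.467 };

// Per-metric difference, rounded to three decimals to hide floating-point noise.
function delta(a: Scores, b: Scores): Scores {
  const round = (x: number) => Math.round(x * 1000) / 1000;
  return {
    faithfulness: round(a.faithfulness - b.faithfulness),
    grounding: round(a.grounding - b.grounding),
    novelty: round(a.novelty - b.novelty),
    relevancy: round(a.relevancy - b.relevancy),
  };
}

const hybridVsBaseline = delta(hybrid, baseline);
const pureVsBaseline = delta(pureGraph, baseline);
console.log(hybridVsBaseline, pureVsBaseline);
```

Here `hybridVsBaseline.grounding` is 0.8 and `hybridVsBaseline.novelty` is 1, while `pureVsBaseline.faithfulness` is -0.067, which is the slight faithfulness dip attributed to pure traversal.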
This pattern aligns with broader agent architecture shifts. Anthropic's Claude Code team has described dropping vector RAG in favor of agentic search, reporting that file-system navigation (Glob, Grep, Read) worked better than retrieval pipelines for code. Applying the same principle to prose vaults transforms static notes into a navigable knowledge graph.
Core Solution
The architecture replaces retrieval pipelines with a graph-native context engine. It operates in four coordinated phases: index generation, graph traversal, semantic filtering, and orphan recovery. All components are designed to run within an LLM agent's skill system without external ML dependencies.
Phase 1: Vault Index Generation
The system parses the markdown vault to extract wikilinks, count inbound mentions, and classify topical clusters. It generates a routing map (VAULT_INDEX.md) and per-section orientation files. Hub notes (high inbound mention count) become primary entry points.
```typescript
interface VaultNode {
  id: string;
  title: string;
  inboundMentions: number;
  outboundLinks: string[];
  section: string;
}

class VaultIndexer {
  // readMarkdownFiles, extractWikilinks, and detectSection are helper
  // methods of this class that are not shown in this excerpt.
  async generateIndex(vaultPath: string): Promise<Record<string, VaultNode>> {
    const nodes: Record<string, VaultNode> = {};
    const files = await this.readMarkdownFiles(vaultPath);

    // First pass: register every note with its outbound links.
    for (const file of files) {
      const links = this.extractWikilinks(file.content);
      nodes[file.id] = {
        id: file.id,
        title: file.title,
        inboundMentions: 0,
        outboundLinks: links,
        section: this.detectSection(file.path)
      };
    }

    // Second pass: count inbound mentions (links to unknown notes are ignored).
    for (const node of Object.values(nodes)) {
      for (const link of node.outboundLinks) {
        if (nodes[link]) nodes[link].inboundMentions++;
      }
    }
    return nodes;
  }
}
```
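The indexer relies on a wikilink extractor and on hub detection that the listing does not show. The following is a minimal sketch of both, assuming Obsidian-style `[[Note]]`, `[[Note|alias]]`, and `[[Note#Heading]]` links; `extractWikilinks` here is a standalone stand-in for the method of the same name, and `rankHubs` is a hypothetical helper.

```typescript
// Hypothetical standalone version of the extractor used by the indexer.
// Captures the note name and drops any "#Heading" or "|alias" suffix.
function extractWikilinks(content: string): string[] {
  const pattern = /\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    targets.push(match[1].trim());
  }
  return targets;
}

interface HubCandidate {
  id: string;
  inboundMentions: number;
}

// Rank notes by inbound mention count; ties are broken by id for stable output.
function rankHubs(nodes: Record<string, HubCandidate>, topN: number): string[] {
  return Object.values(nodes)
    .sort((a, b) => b.inboundMentions - a.inboundMentions || a.id.localeCompare(b.id))
    .slice(0, topN)
    .map(node => node.id);
}

const links = extractWikilinks("See [[Alpha]] and [[Beta|b]] plus [[Gamma#Intro]] and ![[Delta#x|y]].");
const hubs = rankHubs(
  {
    Alpha: { id: "Alpha", inboundMentions: 7 },
    Beta: { id: "Beta", inboundMentions: 2 },
    Gamma: { id: "Gamma", inboundMentions: 7 },
  },
  2
);
console.log(links, hubs);
```

Duplicate links are kept on purpose, so a note that mentions the same target twice contributes two inbound mentions, matching the counting loop above.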
Phase 2: Graph Traversal Engine
The navigator follows outbound links iteratively, respecting depth limits and token budgets. Two modes exist: Depth (1–2 hops for directed tasks) and Synthesis (multi-hub traversal for cross-domain queries).
```typescript
interface TraversalConfig {
  maxDepth: number;
  maxTokens: number;
  mode: 'depth' | 'synthesis';
}

class GraphNavigator {
  // loadNote and estimateTokens are helper methods not shown in this excerpt.
  async traverse(
    entryNodeId: string,
    index: Record<string, VaultNode>,
    config: TraversalConfig
  ): Promise<string[]> {
    const visited = new Set<string>();
    const queue: { id: string; depth: number }[] = [{ id: entryNodeId, depth: 0 }];
    const loadedNotes: string[] = [];
    let currentTokens = 0;

    // Breadth-first walk from the entry note.
    while (queue.length > 0) {
      const { id, depth } = queue.shift()!;
      if (visited.has(id) || depth > config.maxDepth) continue;
      const node = index[id];
      if (!node) continue; // link points at a note missing from the index
      visited.add(id);

      const noteContent = await this.loadNote(id);
      currentTokens += this.estimateTokens(noteContent);
      // Stop as soon as the next note would exceed the token budget.
      if (currentTokens > config.maxTokens) break;
      loadedNotes.push(noteContent);

      if (config.mode === 'synthesis' || depth < config.maxDepth) {
        for (const link of node.outboundLinks) {
          if (!visited.has(link)) {
            queue.push({ id: link, depth: depth + 1 });
          }
        }
      }
    }
    return loadedNotes;
  }
}
```
Phase 3: Semantic Anchor Filtering
Instead of embedding queries at runtime, the system uses the entry note's pre-computed embedding as an anchor. Outbound links are scored against this anchor. Edges below a tunable threshold are pruned, preventing cross-domain contamination.
```typescript
class EmbeddingFilter {
  async filterLinks(
    entryEmbedding: number[],
    candidates: string[],
    cache: Record<string, number[]>,
    threshold: number
  ): Promise<string[]> {
    const valid: string[] = [];
    for (const candidate of candidates) {
      const candidateEmbedding = cache[candidate];
      if (!candidateEmbedding) continue; // no cached embedding: edge is dropped
      const similarity = this.cosineSimilarity(entryEmbedding, candidateEmbedding);
      if (similarity >= threshold) {
        valid.push(candidate);
      }
    }
    return valid;
  }

  private cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) {
      throw new Error('Embedding dimensions do not match');
    }
    const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
    const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
    const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
    // Guard against zero vectors, which would otherwise yield NaN.
    if (magA === 0 || magB === 0) return 0;
    return dot / (magA * magB);
  }
}
```
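To make the effect of the threshold concrete, here is a toy sketch with two-dimensional vectors standing in for real embeddings. The note ids, vectors, and the `keepAbove` helper are invented for illustration only.

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom;
}

// Toy 2-D "embeddings"; real caches hold vectors with hundreds of dimensions.
const anchor = [1, 0];
const toyCache: Record<string, number[]> = {
  "close-topic": [1, 1], // cosine with anchor is about 0.707
  "same-topic": [2, 0],  // cosine with anchor is 1
  "unrelated": [0, 1],   // cosine with anchor is 0
};

// Keep only the candidates whose similarity to the anchor meets the threshold.
function keepAbove(threshold: number): string[] {
  return Object.keys(toyCache).filter(id => cosine(anchor, toyCache[id]) >= threshold);
}

const keptAt065 = keepAbove(0.65);
const keptAt075 = keepAbove(0.75);
console.log(keptAt065, keptAt075);
```

At t=0.65 both "close-topic" and "same-topic" survive; at t=0.75 only "same-topic" does, which shows how a small increase in the threshold can sever an adjacent but still relevant edge.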
Phase 4: Orphan Context Recovery
After graph traversal, the system scans vault-wide embeddings for notes not yet loaded but semantically similar to the entry node. Up to k orphans are surfaced, filling gaps where explicit links don't exist.
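```typescript
class OrphanSurfacer {
  async recoverOrphans(
    entryEmbedding: number[],
    loadedIds: Set<string>,
    cache: Record<string, number[]>,
    k: number
  ): Promise<string[]> {
    const scored = Object.entries(cache)
      .filter(([id]) => !loadedIds.has(id)) // skip notes already in context
      .map(([id, emb]) => ({
        id,
        score: this.cosineSimilarity(entryEmbedding, emb)
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
    return scored.map(item => item.id);
  }

  // Same similarity measure as EmbeddingFilter; zero vectors score 0.
  private cosineSimilarity(a: number[], b: number[]): number {
    const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
    const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
    const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
    return magA === 0 || magB === 0 ? 0 : dot / (magA * magB);
  }
}
```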
Architecture Rationale
- Graph-first navigation preserves provenance and hierarchical context. The model receives complete notes with intact structure, not fragmented chunks.
- Anchor-based filtering avoids runtime embedding generation. By using the entry note's embedding as a reference point, the system maintains semantic alignment without requiring a Python ML stack or external API calls.
- Orphan surfacing compensates for incomplete wikilink graphs. Real-world vaults rarely have perfect connectivity; embeddings act as a safety net for unlinked but relevant content.
- Token budgeting is enforced at the traversal layer. Hard caps prevent context overflow and keep latency predictable (~8 seconds per query in production tests).
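The way the phases fit together can be sketched end to end over in-memory data. This is an assumption-laden illustration rather than the article's implementation: `assembleContext`, `Graph`, `EmbeddingCache`, and `HybridOptions` are hypothetical names, notes are represented only by their ids, and token budgeting is omitted for brevity.

```typescript
type Graph = Record<string, string[]>;          // note id -> outbound link ids
type EmbeddingCache = Record<string, number[]>; // note id -> embedding

interface HybridOptions {
  maxDepth: number;
  threshold: number;
  orphanK: number;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom;
}

function assembleContext(entry: string, graph: Graph, cache: EmbeddingCache, opts: HybridOptions) {
  const anchor = cache[entry];
  if (!anchor) throw new Error(`No embedding cached for entry note ${entry}`);

  // Phases 2 and 3: breadth-first traversal that only follows edges whose
  // target is similar enough to the entry note's embedding.
  const linked: string[] = [];
  const visited = new Set<string>();
  const queue: { id: string; depth: number }[] = [{ id: entry, depth: 0 }];
  while (queue.length > 0) {
    const { id, depth } = queue.shift()!;
    if (visited.has(id)) continue;
    visited.add(id);
    linked.push(id);
    if (depth >= opts.maxDepth) continue;
    for (const next of graph[id] ?? []) {
      const emb = cache[next];
      if (emb && !visited.has(next) && cosine(anchor, emb) >= opts.threshold) {
        queue.push({ id: next, depth: depth + 1 });
      }
    }
  }

  // Phase 4: surface the k most similar notes that traversal did not reach.
  const orphans = Object.keys(cache)
    .filter(id => !visited.has(id))
    .map(id => ({ id, score: cosine(anchor, cache[id]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, opts.orphanK)
    .map(item => item.id);

  return { linked, orphans };
}

const demoGraph: Graph = { hub: ["a", "b"], a: ["c"], b: [], c: [], d: [], e: [] };
const demoCache: EmbeddingCache = {
  hub: [1, 0], a: [1, 0.2], b: [0, 1], c: [1, 0.5], d: [1, 0.1], e: [-1, 0],
};
const result = assembleContext("hub", demoGraph, demoCache, { maxDepth: 2, threshold: 0.65, orphanK: 1 });
console.log(result);
```

In this toy vault the edge to `b` is pruned as off-topic, traversal loads `hub`, `a`, and `c`, and the unlinked note `d` is recovered as the single orphan.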
Pitfall Guide
1. Query-Agnostic Traversal
Explanation: Following wikilinks without semantic filtering causes the agent to wander into unrelated domains. A query about coursework might trigger traversal into client notes if a hub links to both.
Fix: Implement anchor-based filtering or inject a lightweight query embedding at the entry point. Prune edges where cosine(entry, candidate) < threshold.
2. Token Budget Blowout
Explanation: Unbounded graph traversal quickly exceeds context windows, especially in synthesis mode. The model receives more tokens than it can process, degrading output quality.
Fix: Enforce MAX_NOTES_PER_QUERY and MAX_TOKENS_PER_QUERY at the navigator layer. Use progressive loading: fetch top-k nodes first, then conditionally expand based on relevance scoring.
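A minimal sketch of enforcing both caps over a set of already-scored candidates follows. The defaults of 12 notes and 8000 tokens come from the checklist in this article; the `Candidate` shape, the `selectWithinBudget` helper, and the four-characters-per-token estimate are assumptions made for illustration.

```typescript
const MAX_NOTES_PER_QUERY = 12;
const MAX_TOKENS_PER_QUERY = 8000;

// Rough heuristic (assumption): about 4 characters per token for English prose.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface Candidate {
  id: string;
  content: string;
  relevance: number; // higher is more relevant
}

function selectWithinBudget(
  candidates: Candidate[],
  maxNotes: number = MAX_NOTES_PER_QUERY,
  maxTokens: number = MAX_TOKENS_PER_QUERY
): { selected: string[]; tokensUsed: number } {
  // Progressive loading: consider the most relevant candidates first.
  const ranked = [...candidates].sort((a, b) => b.relevance - a.relevance);
  const selected: string[] = [];
  let tokensUsed = 0;
  for (const candidate of ranked) {
    if (selected.length >= maxNotes) break;
    const cost = estimateTokens(candidate.content);
    // Skip a note that would overflow the budget; a smaller one may still fit.
    if (tokensUsed + cost > maxTokens) continue;
    selected.push(candidate.id);
    tokensUsed += cost;
  }
  return { selected, tokensUsed };
}

const picked = selectWithinBudget(
  [
    { id: "a", content: "x".repeat(40), relevance: 0.9 },
    { id: "b", content: "x".repeat(400), relevance: 0.8 },
    { id: "c", content: "x".repeat(20), relevance: 0.7 },
  ],
  2,
  20
);
console.log(picked);
```

Skipping an oversized note instead of stopping is a design choice: it trades strict relevance order for better use of the remaining budget.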
3. Over-Filtering Links
Explanation: Setting the similarity threshold too high (e.g., t=0.75) severs valid cross-domain connections. The graph becomes fragmented, and the model loses synthesis capability.
Fix: Start with t=0.65. Compensate for pruned edges by enabling orphan surfacing (k=5). The combination of strict filtering + orphan recovery consistently outperforms permissive filtering.
4. Brittle Entry-Point Routing
Explanation: Initial routing heuristics rely on vault index metadata. On first contact with an unfamiliar vault, the system may select suboptimal entry nodes, causing poor initial context.
Fix: Implement multi-candidate routing. Score top-3 hub candidates against the query, load all three, and let the model synthesize across them. Fall back to vault-wide search if confidence drops below threshold.
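Multi-candidate routing could look roughly like the sketch below. It assumes a query embedding is available (the `query_injected` anchor strategy) and that each hub has a cached embedding; `routeQuery`, `Hub`, `Route`, and `minConfidence` are hypothetical names.

```typescript
interface Hub {
  id: string;
  embedding: number[];
}

type Route =
  | { kind: "hubs"; entryIds: string[] }
  | { kind: "fallback"; reason: string };

function cosine(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every hub against the query and keep the topN best as entry points.
// If even the best hub is below minConfidence, signal a vault-wide fallback.
function routeQuery(queryEmbedding: number[], hubs: Hub[], minConfidence: number, topN: number = 3): Route {
  const scored = hubs
    .map(hub => ({ id: hub.id, score: cosine(queryEmbedding, hub.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
  if (scored.length === 0 || scored[0].score < minConfidence) {
    return { kind: "fallback", reason: "no hub met the confidence threshold" };
  }
  return { kind: "hubs", entryIds: scored.map(item => item.id) };
}

const demoHubs: Hub[] = [
  { id: "h1", embedding: [1, 0] },
  { id: "h2", embedding: [1, 1] },
  { id: "h3", embedding: [0, 1] },
  { id: "h4", embedding: [-1, 0] },
];
const confident = routeQuery([1, 0], demoHubs, 0.5);
const uncertain = routeQuery([0, -1], demoHubs, 0.5);
console.log(confident, uncertain);
```

Returning a tagged `Route` keeps the fallback decision explicit, so the caller chooses what "vault-wide search" means in its environment.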
5. Embedding Cache Drift
Explanation: Pre-computed embeddings become stale when notes are edited, renamed, or deleted. The navigator may follow edges based on outdated semantic signals.
Fix: Validate cache freshness on vault mount. Implement a lightweight hash check (md5 or sha256) on modified files. Trigger incremental cache regeneration only for changed nodes, not the entire vault.
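One way to run the hash check is sketched below using Node's built-in `crypto` module. The `CacheEntry` shape with a `contentHash` field is an assumption for illustration and does not describe the on-disk format of any particular embedding plugin.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache record: the hash of the note text at embedding time.
interface CacheEntry {
  contentHash: string;
  embedding: number[];
}

function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Compare the current vault against the cache and report what needs work:
// "stale" notes are new or edited, "removed" ids no longer exist in the vault.
function findStaleNotes(
  notes: Record<string, string>,
  cache: Record<string, CacheEntry>
): { stale: string[]; removed: string[] } {
  const stale = Object.keys(notes).filter(id => {
    const entry = cache[id];
    return !entry || entry.contentHash !== hashContent(notes[id]);
  });
  const removed = Object.keys(cache).filter(id => !(id in notes));
  return { stale, removed };
}

const vaultNotes = { a: "hello", b: "edited text", c: "brand new note" };
const embeddingCache: Record<string, CacheEntry> = {
  a: { contentHash: hashContent("hello"), embedding: [0.1, 0.2] },
  b: { contentHash: hashContent("old text"), embedding: [0.3, 0.4] },
  d: { contentHash: hashContent("deleted note"), embedding: [0.5, 0.6] },
};
const report = findStaleNotes(vaultNotes, embeddingCache);
console.log(report);
```

Only the ids in `stale` need new embeddings, and the ids in `removed` can be evicted, so the rest of the cache is left untouched.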
6. Context Contamination
Explanation: Loading notes without provenance tagging causes the model to conflate sources. It may attribute a concept from Note A to Note B, breaking faithfulness.
Fix: Wrap each loaded note in explicit source markers: <!-- SOURCE: [[NoteID]] -->. Instruct the agent to cite source IDs in responses. This maintains traceability and reduces hallucination.
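A small sketch of the tagging step, using the marker format given above. The `LoadedNote` shape and the `tagSource` and `buildContext` helpers are illustrative names.

```typescript
interface LoadedNote {
  id: string;
  content: string;
}

// Prefix a note with its provenance marker so the model can cite it.
function tagSource(note: LoadedNote): string {
  return `<!-- SOURCE: [[${note.id}]] -->\n${note.content.trim()}`;
}

// Join tagged notes with a blank line so boundaries stay visible.
function buildContext(notes: LoadedNote[]): string {
  return notes.map(tagSource).join("\n\n");
}

const context = buildContext([
  { id: "A", content: "alpha\n" },
  { id: "B", content: "beta" },
]);
console.log(context);
```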
7. Ignoring Note Hierarchy
Explanation: Flattening markdown into raw text strips headings, lists, and metadata. The model loses structural cues that guide reasoning.
Fix: Preserve markdown formatting during load. Pass headings as structural anchors. Use a lightweight parser that extracts title, sections, and metadata before injection.
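A lightweight parser of this kind might look like the sketch below. It assumes flat `key: value` frontmatter between leading `---` lines and ATX-style `#` headings, and it does not try to skip headings inside fenced code; `parseNote`, `ParsedNote`, and `Section` are hypothetical names.

```typescript
interface Section {
  heading: string; // empty string for text before the first heading
  level: number;   // 1-6 for headings, 0 for the leading preamble
  body: string;    // markdown body kept verbatim
}

interface ParsedNote {
  title: string;
  metadata: Record<string, string>;
  sections: Section[];
}

function parseNote(fileName: string, raw: string): ParsedNote {
  const metadata: Record<string, string> = {};
  let body = raw;

  // Simple frontmatter: flat "key: value" lines between leading --- lines.
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(raw);
  if (frontmatter) {
    for (const line of frontmatter[1].split(/\r?\n/)) {
      const idx = line.indexOf(":");
      if (idx > 0) metadata[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
    body = raw.slice(frontmatter[0].length);
  }

  // Split the body into sections at each heading, preserving markdown as-is.
  const sections: Section[] = [];
  let current: Section = { heading: "", level: 0, body: "" };
  for (const line of body.split(/\r?\n/)) {
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (heading) {
      if (current.heading || current.body.trim()) sections.push(current);
      current = { heading: heading[2].trim(), level: heading[1].length, body: "" };
    } else {
      current.body += line + "\n";
    }
  }
  if (current.heading || current.body.trim()) sections.push(current);

  // Title preference: frontmatter, then first H1, then the file name.
  const h1 = sections.find(section => section.level === 1);
  const title = metadata["title"] || (h1 ? h1.heading : "") || fileName.replace(/\.md$/, "");
  return { title, metadata, sections };
}

const parsed = parseNote("demo.md", "---\ntitle: Demo\ntags: x\n---\n# Intro\nHello\n## Details\n- item\n");
console.log(parsed.title, parsed.sections.map(section => section.heading));
```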
Production Bundle
Action Checklist
- Generate vault index: Parse all markdown files, extract wikilinks, count inbound mentions, and output VAULT_INDEX.md
- Configure traversal limits: Set MAX_NOTES_PER_QUERY (default 12) and MAX_TOKENS_PER_QUERY (default 8000)
- Tune similarity threshold: Start at t=0.65; adjust based on cross-domain contamination reports
- Enable orphan surfacing: Set ORPHAN_K=5 to recover unlinked but relevant notes
- Validate embedding cache: Run hash check on vault mount; regenerate only modified node embeddings
- Implement source tagging: Wrap loaded notes with <!-- SOURCE: [[ID]] --> markers for traceability
- Test routing heuristics: Run 10 synthesis queries; verify entry-point selection accuracy >80%
- Monitor latency: Ensure per-query context assembly stays under 10 seconds; optimize cache reads if exceeded
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small, well-linked vault (<200 notes) | Pure Graph Traversal | High connectivity eliminates need for embeddings; lowest latency | Near-zero compute overhead |
| Large, cross-domain vault (>500 notes) | Hybrid Graph (t=0.65 + Orphan k=5) | Embeddings recover missing links; filtering prevents context bloat | Moderate cache storage; ~8s/query latency |
| Legacy vault with zero wikilinks | Graph Builder + Hybrid | /vault-discover Mode 2 auto-links hub candidates; hybrid mode compensates for sparse graph | Initial indexing time; higher token usage during link generation |
| Strict compliance/audit requirements | Pure Graph + Source Tagging | Deterministic traversal with full provenance; no semantic ambiguity | Zero additional cost; requires manual link maintenance |
Configuration Template
```yaml
# knowledge-graph-config.yaml
vault:
  path: "./obsidian-vault"
  index_file: "VAULT_INDEX.md"
  cache_path: ".smart-env/multi"
traversal:
  mode: "hybrid" # depth | synthesis | hybrid
  max_depth: 2
  max_notes_per_query: 12
  max_tokens_per_query: 8000
filtering:
  threshold: 0.65
  orphan_k: 5
  anchor_strategy: "entry_note" # entry_note | query_injected
runtime:
  source_tagging: true
  cache_validation: "hash_check"
  latency_budget_ms: 10000
```
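Once the template has been parsed into an object (by whichever YAML parser the host environment provides), a typed shape plus a few sanity checks can catch bad values early. This is a sketch: the `KnowledgeGraphConfig` interface mirrors the template's keys, while `validateConfig` and its specific rules are assumptions.

```typescript
interface KnowledgeGraphConfig {
  vault: { path: string; index_file: string; cache_path: string };
  traversal: {
    mode: "depth" | "synthesis" | "hybrid";
    max_depth: number;
    max_notes_per_query: number;
    max_tokens_per_query: number;
  };
  filtering: {
    threshold: number;
    orphan_k: number;
    anchor_strategy: "entry_note" | "query_injected";
  };
  runtime: { source_tagging: boolean; cache_validation: string; latency_budget_ms: number };
}

// Returns a list of human-readable problems; an empty list means the config passed.
function validateConfig(cfg: KnowledgeGraphConfig): string[] {
  const problems: string[] = [];
  const isPositiveInt = (n: number) => Number.isInteger(n) && n > 0;

  if (!(cfg.filtering.threshold >= -1 && cfg.filtering.threshold <= 1)) {
    problems.push("filtering.threshold must lie in [-1, 1], the range of cosine similarity");
  }
  if (!Number.isInteger(cfg.filtering.orphan_k) || cfg.filtering.orphan_k < 0) {
    problems.push("filtering.orphan_k must be a non-negative integer");
  }
  if (!isPositiveInt(cfg.traversal.max_depth)) problems.push("traversal.max_depth must be a positive integer");
  if (!isPositiveInt(cfg.traversal.max_notes_per_query)) problems.push("traversal.max_notes_per_query must be a positive integer");
  if (!isPositiveInt(cfg.traversal.max_tokens_per_query)) problems.push("traversal.max_tokens_per_query must be a positive integer");
  if (!(cfg.runtime.latency_budget_ms > 0)) problems.push("runtime.latency_budget_ms must be positive");
  return problems;
}

// The same values as the template above.
const exampleConfig: KnowledgeGraphConfig = {
  vault: { path: "./obsidian-vault", index_file: "VAULT_INDEX.md", cache_path: ".smart-env/multi" },
  traversal: { mode: "hybrid", max_depth: 2, max_notes_per_query: 12, max_tokens_per_query: 8000 },
  filtering: { threshold: 0.65, orphan_k: 5, anchor_strategy: "entry_note" },
  runtime: { source_tagging: true, cache_validation: "hash_check", latency_budget_ms: 10000 },
};
console.log(validateConfig(exampleConfig));
```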
Quick Start Guide
- Initialize the index: Run the vault parser against your markdown directory. The system generates VAULT_INDEX.md and per-section routing files. Verify hub detection accuracy by inspecting inbound mention counts.
- Configure thresholds: Set threshold: 0.65 and orphan_k: 5 in the config. These are the settings of the best-performing hybrid variant in the comparison table, balancing link pruning against context recovery.
- Mount the embedding cache: Point the navigator to your plugin's embedding directory (e.g., .smart-env/multi/*.ajson). Run a hash validation to ensure cache freshness.
- Execute a synthesis query: Route through vault-context in synthesis mode. The system loads the entry note, filters outbound edges, traverses valid links, and injects top-k orphans. Verify source tags in the output.
- Iterate on graph density: Run /vault-discover Mode 2 weekly to suggest missing wikilinks. As your graph densifies, you can lower orphan_k and tighten the threshold without losing coverage.
