Graph-First Context Routing for LLM Agents: Replacing Vector Retrieval with Structured Knowledge Navigation

Current Situation Analysis

Personal knowledge bases and technical documentation vaults have largely converged on markdown-based systems with explicit cross-referencing (wikilinks, backlinks, or markdown references). When developers attempt to feed these vaults into LLM agents, the default architectural pattern remains vector Retrieval-Augmented Generation (RAG). This approach embeds documents into high-dimensional space, retrieves top-k chunks via cosine similarity, and injects them into the prompt.

The fundamental flaw is structural erosion. Vector RAG treats prose as an unstructured search index. It severs documents from their relational topology, discarding hierarchy, provenance, and explicit semantic pathways. Coding agents already avoid this: a developer agent follows imports, dependencies, and type definitions, navigating code as a graph. Linked knowledge bases have the same graph shape, yet practitioners routinely flatten them into chunked embeddings, losing the very relationships that make the knowledge coherent.

This mismatch is overlooked because vector pipelines are commoditized. Frameworks abstract away embedding generation and chunking, making RAG the path of least resistance. However, the cost manifests in three measurable dimensions:

Provenance Loss: Chunks arrive without parent-child relationships or cross-reference context, causing hallucination when the model infers connections that don't exist.
Context Window Waste: Irrelevant but semantically similar fragments consume tokens that could carry structurally relevant nodes.
Synthesis Deficit: Cross-domain queries require traversing multiple conceptual clusters. Vector retrieval returns isolated fragments rather than connected reasoning paths.

A controlled evaluation illustrates this gap. In a benchmark of vector RAG against graph-native navigation on a 99-note synthetic vault, baseline retrieval scores 2.067 on faithfulness and 2.133 on grounding (0–3 scale). The model struggles to maintain factual alignment and contextual grounding because it lacks the relational scaffolding that wikilinks provide. The industry has optimized for semantic similarity while ignoring topological fidelity.

WOW Moment: Key Findings

Replacing chunked retrieval with progressive graph traversal, augmented by lightweight embedding filtering, yields measurable gains across every evaluation dimension. The following table compares three approaches on identical synthesis tasks:

Approach	Faithfulness	Grounding	Insight Novelty	Answer Relevancy
Vector RAG (Baseline)	2.067	2.133	1.533	2.067
Pure Graph Traversal	2.000	2.533	2.333	2.400
Hybrid Graph (t=0.65 + Orphan k=5)	2.333	2.933	2.533	2.467

The hybrid variant outperforms both the baseline and pure traversal on all four metrics. Relative to the baseline, grounding rises by 0.80 (2.133 to 2.933) and insight novelty by 1.00 (1.533 to 2.533), which points to a critical insight: structure preserves reasoning chains, while embeddings recover missing links. Pure traversal occasionally follows tangential paths (its faithfulness dips slightly, from 2.067 to 2.000), but hybrid filtering corrects this by pruning low-similarity edges and surfacing topically relevant orphans. The result is a context window limited to notes that are structurally linked, semantically aligned, or both, delivered in ~8 seconds per query.

This pattern aligns with broader agent architecture shifts. The Claude Code team at Anthropic has described moving away from vector RAG in favor of agentic search, reporting that file-system navigation (Glob, Grep, Read) outperformed retrieval pipelines for code. Applying the same principle to prose vaults transforms static notes into a navigable knowledge graph.

Core Solution

The architecture replaces retrieval pipelines with a graph-native context engine. It operates in four coordinated phases: index generation, graph traversal, semantic filtering, and orphan recovery. All components are designed to run within an LLM agent's skill system without external ML dependencies.

Phase 1: Vault Index Generation

The system parses the markdown vault to extract wikilinks, count inbound mentions, and classify topical clusters. It generates a routing map (VAULT_INDEX.md) and per-section orientation files. Hub notes (high inbound mention count) become primary entry points.

interface VaultNode {
  id: string;
  title: string;
  inboundMentions: number;
  outboundLinks: string[];
  section: string;
}

class VaultIndexer {
  async generateIndex(vaultPath: string): Promise<Record<string, VaultNode>> {
    const nodes: Record<string, VaultNode> = {};
    const files = await this.readMarkdownFiles(vaultPath);
    
    for (const file of files) {
      const links = this.extractWikilinks(file.content);
      nodes[file.id] = {
        id: file.id,
        title: file.title,
        inboundMentions: 0,
        outboundLinks: links,
        section: this.detectSection(file.path)
      };
    }

    // Count inbound mentions
    for (const node of Object.values(nodes)) {
      for (const link of node.outboundLinks) {
        if (nodes[link]) nodes[link].inboundMentions++;
      }
    }
    return nodes;
  }
}
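The indexer above calls `extractWikilinks` without showing it, and the text describes hub notes without showing how they are picked. The following is a minimal sketch of both, assuming Obsidian-style link syntax (`[[Target]]`, `[[Target|alias]]`, `[[Target#Heading]]`) and assuming that a link target doubles as the note ID. The `rankHubs` helper and the `HubCandidate` type are illustrative names, not part of the original system.

```typescript
// Hypothetical helper: pull wikilink targets out of a markdown string.
// Assumes Obsidian-style syntax: [[Target]], [[Target|alias]], [[Target#Heading]].
// Embeds such as ![[image.png]] also match; filter them upstream if unwanted.
function extractWikilinks(content: string): string[] {
  const pattern = /\[\[([^\]\|#]+)(?:#[^\]\|]*)?(?:\|[^\]]*)?\]\]/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    const target = (match[1] ?? "").trim();
    // Duplicates are kept so that repeated links count as repeated mentions.
    if (target.length > 0) targets.push(target);
  }
  return targets;
}

interface HubCandidate {
  id: string;
  inboundMentions: number;
}

// Hypothetical helper: the topN most-mentioned notes, ties broken by id.
function rankHubs(nodes: Record<string, HubCandidate>, topN: number): string[] {
  return Object.values(nodes)
    .sort((a, b) => b.inboundMentions - a.inboundMentions || a.id.localeCompare(b.id))
    .slice(0, Math.max(0, topN))
    .map((node) => node.id);
}

const links = extractWikilinks(
  "See [[Alpha]] and [[Beta|the beta note]], then [[Gamma#Intro]] and [[Alpha]]."
);
const hubs = rankHubs(
  {
    alpha: { id: "alpha", inboundMentions: 7 },
    beta: { id: "beta", inboundMentions: 2 },
    gamma: { id: "gamma", inboundMentions: 7 },
  },
  2
);
console.log(links, hubs);
```

Keeping duplicate targets is a deliberate choice here: it makes the inbound count reflect mentions rather than distinct linking notes. Deduplicate inside `extractWikilinks` if the second meaning is preferred.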

Phase 2: Graph Traversal Engine

The navigator follows outbound links iteratively, respecting depth limits and token budgets. Two modes exist: Depth (1–2 hops for directed tasks) and Synthesis (multi-hub traversal for cross-domain queries).

interface TraversalConfig {
  maxDepth: number;
  maxTokens: number;
  mode: 'depth' | 'synthesis';
}

class GraphNavigator {
  async traverse(
    entryNodeId: string,
    index: Record<string, VaultNode>,
    config: TraversalConfig
  ): Promise<string[]> {
    const visited = new Set<string>();
    const queue: { id: string; depth: number }[] = [{ id: entryNodeId, depth: 0 }];
    const loadedNotes: string[] = [];
    let currentTokens = 0;

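    while (queue.length > 0) {
      const { id, depth } = queue.shift()!;
      // Skip notes already loaded and anything past the depth limit.
      if (visited.has(id) || depth > config.maxDepth) continue;
      
      const node = index[id];
      if (!node) continue; // dangling link: target is not in the index
      
      visited.add(id);
      const noteContent = await this.loadNote(id);
      currentTokens += this.estimateTokens(noteContent);
      
      // Stop before a note that would push the context over budget.
      if (currentTokens > config.maxTokens) break;
      loadedNotes.push(noteContent);

      // Queue this note's outbound links for the next hop. Links queued at
      // maxDepth + 1 in synthesis mode are still dropped by the depth guard
      // above, so both modes currently stop at maxDepth.
      if (config.mode === 'synthesis' || depth < config.maxDepth) {
        for (const link of node.outboundLinks) {
          if (!visited.has(link)) {
            queue.push({ id: link, depth: depth + 1 });
          }
        }
      }
    }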
    return loadedNotes;
  }
}
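To see the depth limit and token budget interact, here is a self-contained sketch of the same breadth-first walk over an in-memory vault. It returns note IDs rather than note bodies, and it assumes a rough estimate of four characters per token. `traverseIds`, `LinkIndex`, and the sample vault are hypothetical and exist only for this illustration.

```typescript
type LinkIndex = Record<string, string[]>; // note id -> outbound link ids

// Rough heuristic: ~4 characters per token. An assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function traverseIds(
  entryId: string,
  links: LinkIndex,
  notes: Record<string, string>,
  maxDepth: number,
  maxTokens: number
): string[] {
  const visited = new Set<string>();
  const queue: { id: string; depth: number }[] = [{ id: entryId, depth: 0 }];
  const loaded: string[] = [];
  let tokens = 0;

  while (queue.length > 0) {
    const { id, depth } = queue.shift()!;
    if (visited.has(id) || depth > maxDepth) continue;

    const content = notes[id];
    if (content === undefined) continue; // dangling link
    visited.add(id);

    const cost = estimateTokens(content);
    if (tokens + cost > maxTokens) break; // stop before exceeding the budget
    tokens += cost;
    loaded.push(id);

    // Only expand notes that are still inside the depth limit.
    if (depth < maxDepth) {
      for (const next of links[id] ?? []) {
        if (!visited.has(next)) queue.push({ id: next, depth: depth + 1 });
      }
    }
  }
  return loaded;
}

// Sample vault: hub -> a, b; a -> c; c -> d. Every note costs 10 estimated tokens.
const body = "x".repeat(40);
const sampleLinks: LinkIndex = { hub: ["a", "b"], a: ["c"], b: [], c: ["d"], d: [] };
const sampleNotes: Record<string, string> = { hub: body, a: body, b: body, c: body, d: body };

const withinDepth = traverseIds("hub", sampleLinks, sampleNotes, 2, 1000);
const tightBudget = traverseIds("hub", sampleLinks, sampleNotes, 2, 25);
console.log(withinDepth, tightBudget);
```

With a depth limit of 2, note `d` (three hops away) is never loaded. With a 25-token budget, the walk stops after two notes because the third would exceed the cap.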

Phase 3: Semantic Anchor Filtering

Instead of embedding queries at runtime, the system uses the entry note's pre-computed embedding as an anchor. Outbound links are scored against this anchor. Edges below a tunable threshold are pruned, preventing cross-domain contamination.

class EmbeddingFilter {
  async filterLinks(
    entryEmbedding: number[],
    candidates: string[],
    cache: Record<string, number[]>,
    threshold: number
  ): Promise<string[]> {
    const valid: string[] = [];
    for (const candidate of candidates) {
      const candidateEmbedding = cache[candidate];
      if (!candidateEmbedding) continue;
      
      const similarity = this.cosineSimilarity(entryEmbedding, candidateEmbedding);
      if (similarity >= threshold) {
        valid.push(candidate);
      }
    }
    return valid;
  }

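  private cosineSimilarity(a: number[], b: number[]): number {
    // Vectors of different length are not comparable; treat them as unrelated.
    if (a.length !== b.length) return 0;
    const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
    const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
    const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
    // Guard against zero vectors, which would otherwise yield NaN.
    if (magA === 0 || magB === 0) return 0;
    return dot / (magA * magB);
  }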
}
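The filter above scores candidates but is not shown wired into the traversal. One possible wiring is sketched below: a breadth-first walk that follows an edge only when the target note stays close to the entry note's embedding. The function name `filteredWalk`, the two-dimensional toy embeddings, and the note IDs are all assumptions made for illustration.

```typescript
type Embedding = number[];

function cosine(a: Embedding, b: Embedding): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    magA += x * x;
    magB += y * y;
  }
  if (magA === 0 || magB === 0) return 0; // zero or empty vector
  return dot / Math.sqrt(magA * magB);
}

// Breadth-first walk that only follows edges whose target stays near the anchor.
function filteredWalk(
  entryId: string,
  links: Record<string, string[]>,
  cache: Record<string, Embedding>,
  threshold: number,
  maxDepth: number
): string[] {
  const anchor = cache[entryId];
  if (!anchor) return [entryId]; // no anchor available: load only the entry note

  const visited = new Set<string>([entryId]);
  const order: string[] = [entryId];
  let frontier: string[] = [entryId];

  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const target of links[id] ?? []) {
        const emb = cache[target];
        if (visited.has(target) || !emb) continue;
        if (cosine(anchor, emb) < threshold) continue; // prune off-topic edge
        visited.add(target);
        order.push(target);
        next.push(target);
      }
    }
    frontier = next;
  }
  return order;
}

// A hub that links to both coursework and client notes.
const vaultLinks: Record<string, string[]> = {
  "coursework-hub": ["lecture-notes", "client-acme"],
  "lecture-notes": ["exam-prep"],
  "client-acme": ["invoice"],
};
const vaultCache: Record<string, Embedding> = {
  "coursework-hub": [1, 0],
  "lecture-notes": [0.9, 0.2],
  "exam-prep": [0.8, 0.3],
  "client-acme": [0.1, 1],
  invoice: [0, 1],
};

const walked = filteredWalk("coursework-hub", vaultLinks, vaultCache, 0.65, 2);
console.log(walked);
```

Because `client-acme` is pruned at the first hop, `invoice` is never considered, which is the cross-domain contamination the anchor is meant to prevent.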

Phase 4: Orphan Context Recovery

After graph traversal, the system scans vault-wide embeddings for notes not yet loaded but semantically similar to the entry node. Up to k orphans are surfaced, filling gaps where explicit links don't exist.

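class OrphanSurfacer {
  async recoverOrphans(
    entryEmbedding: number[],
    loadedIds: Set<string>,
    cache: Record<string, number[]>,
    k: number
  ): Promise<string[]> {
    // Rank every note that traversal did not load by similarity to the anchor.
    const scored = Object.entries(cache)
      .filter(([id]) => !loadedIds.has(id))
      .map(([id, emb]) => ({
        id,
        score: this.cosineSimilarity(entryEmbedding, emb)
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, Math.max(0, k)); // a negative k must not slice from the end

    return scored.map(item => item.id);
  }

  // Same metric as EmbeddingFilter; that method is private there, so this
  // class needs its own copy (or a shared helper) to compile.
  private cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) return 0;
    const dot = a.reduce((sum, val, i) => sum + val * b[i], 0);
    const magA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
    const magB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
    if (magA === 0 || magB === 0) return 0;
    return dot / (magA * magB);
  }
}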
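As a rough illustration of how recovered orphans could be merged with the notes that traversal already loaded, the sketch below labels each context entry with how it got there. It assumes the traversal step yields note IDs. The `ContextEntry` type, the `origin` label, and `assembleContextIds` are hypothetical names introduced here.

```typescript
type Origin = "graph" | "orphan";

interface ContextEntry {
  id: string;
  origin: Origin;
  score: number; // cosine similarity to the entry note
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    magA += x * x;
    magB += y * y;
  }
  if (magA === 0 || magB === 0) return 0;
  return dot / Math.sqrt(magA * magB);
}

// Graph-loaded notes keep their traversal order; up to k orphans follow,
// ranked by similarity to the entry note's embedding.
function assembleContextIds(
  entryId: string,
  graphIds: string[],
  cache: Record<string, number[]>,
  k: number
): ContextEntry[] {
  const anchor = cache[entryId] ?? [];
  const loaded = new Set<string>(graphIds);
  const scoreOf = (id: string): number => cosine(anchor, cache[id] ?? []);

  const graphEntries = graphIds.map(
    (id): ContextEntry => ({ id, origin: "graph", score: scoreOf(id) })
  );
  const orphanEntries = Object.keys(cache)
    .filter((id) => !loaded.has(id))
    .map((id): ContextEntry => ({ id, origin: "orphan", score: scoreOf(id) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, Math.max(0, k));

  return graphEntries.concat(orphanEntries);
}

const embeddingCache: Record<string, number[]> = {
  hub: [1, 0],
  linked: [0.9, 0.1],
  "unlinked-close": [0.8, 0.2],
  "unlinked-far": [0, 1],
};

const context = assembleContextIds("hub", ["hub", "linked"], embeddingCache, 1);
console.log(context.map((entry) => entry.id + ":" + entry.origin));
```

Keeping the origin alongside each ID makes it easy to report later which notes arrived through explicit links and which through the embedding safety net.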

Architecture Rationale

Graph-first navigation preserves provenance and hierarchical context. The model receives complete notes with intact structure, not fragmented chunks.
Anchor-based filtering avoids runtime embedding generation. By using the entry note's embedding as a reference point, the system maintains semantic alignment without requiring a Python ML stack or external API calls.
Orphan surfacing compensates for incomplete wikilink graphs. Real-world vaults rarely have perfect connectivity; embeddings act as a safety net for unlinked but relevant content.
Token budgeting is enforced at the traversal layer. Hard caps prevent context overflow and keep latency predictable (~8 seconds per query in production tests).

Pitfall Guide

1. Query-Agnostic Traversal

Explanation: Following wikilinks without semantic filtering causes the agent to wander into unrelated domains. A query about coursework might trigger traversal into client notes if a hub links to both. Fix: Implement anchor-based filtering or inject a lightweight query embedding at the entry point. Prune edges where cosine(entry, candidate) < threshold.

2. Token Budget Blowout

Explanation: Unbounded graph traversal quickly exceeds context windows, especially in synthesis mode. The model receives more tokens than it can process, degrading output quality. Fix: Enforce MAX_NOTES_PER_QUERY and MAX_TOKENS_PER_QUERY at the navigator layer. Use progressive loading: fetch top-k nodes first, then conditionally expand based on relevance scoring.
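A minimal sketch of enforcing both caps together follows. It assumes candidates arrive already ranked by relevance and uses a rough estimate of four characters per token. `ContextBudget` and `selectWithinBudget` are hypothetical names. Unlike a traversal that stops at the first note that does not fit, this variant skips the oversized note and keeps trying smaller ones, which is a design choice rather than something the text above prescribes.

```typescript
// Rough heuristic: ~4 characters per token. An assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

class ContextBudget {
  private noteCount = 0;
  private tokenCount = 0;
  private readonly maxNotes: number;
  private readonly maxTokens: number;

  constructor(maxNotes: number, maxTokens: number) {
    this.maxNotes = maxNotes;
    this.maxTokens = maxTokens;
  }

  // Records the note and returns true only if it fits under both caps.
  tryAdmit(content: string): boolean {
    const cost = estimateTokens(content);
    if (this.noteCount >= this.maxNotes) return false;
    if (this.tokenCount + cost > this.maxTokens) return false;
    this.noteCount += 1;
    this.tokenCount += cost;
    return true;
  }

  isExhausted(): boolean {
    return this.noteCount >= this.maxNotes || this.tokenCount >= this.maxTokens;
  }

  usage(): { notes: number; tokens: number } {
    return { notes: this.noteCount, tokens: this.tokenCount };
  }
}

// Admit ranked candidates in order, skipping any single note that does not fit.
function selectWithinBudget(
  ranked: { id: string; content: string }[],
  budget: ContextBudget
): string[] {
  const admitted: string[] = [];
  for (const note of ranked) {
    if (budget.isExhausted()) break;
    if (budget.tryAdmit(note.content)) admitted.push(note.id);
  }
  return admitted;
}

const budget = new ContextBudget(3, 100); // 3 notes, 100 estimated tokens
const admitted = selectWithinBudget(
  [
    { id: "a", content: "x".repeat(200) }, // 50 tokens, admitted
    { id: "b", content: "x".repeat(400) }, // 100 tokens, skipped
    { id: "c", content: "x".repeat(120) }, // 30 tokens, admitted
    { id: "d", content: "x".repeat(40) },  // 10 tokens, admitted
    { id: "e", content: "x".repeat(4) },   // note cap already reached
  ],
  budget
);
console.log(admitted, budget.usage());
```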

3. Over-Filtering Links

Explanation: Setting the similarity threshold too high (e.g., t=0.75) severs valid cross-domain connections. The graph becomes fragmented, and the model loses synthesis capability. Fix: Start with t=0.65. Compensate for pruned edges by enabling orphan surfacing (k=5). In the evaluation above, this combination of filtering and orphan recovery outperformed unfiltered traversal on all four metrics.
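A quick way to see how sensitive pruning is to the threshold is to sweep several values over the same candidates. The sketch below does this with toy two-dimensional embeddings; `sweepThresholds` and the candidate names are hypothetical.

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    magA += x * x;
    magB += y * y;
  }
  if (magA === 0 || magB === 0) return 0;
  return dot / Math.sqrt(magA * magB);
}

// For each threshold, list the candidate ids that would survive pruning.
function sweepThresholds(
  anchor: number[],
  candidates: Record<string, number[]>,
  thresholds: number[]
): Record<string, string[]> {
  const survivors: Record<string, string[]> = {};
  for (const t of thresholds) {
    survivors[t.toFixed(2)] = Object.keys(candidates).filter(
      (id) => cosine(anchor, candidates[id] ?? []) >= t
    );
  }
  return survivors;
}

const sweep = sweepThresholds(
  [1, 0],
  {
    "same-topic": [0.95, 0.1], // cosine ~0.99
    "adjacent-topic": [1, 1],  // cosine ~0.71
    unrelated: [0.1, 1],       // cosine ~0.10
  },
  [0.65, 0.75]
);
console.log(sweep);
```

In this toy case the adjacent-topic note survives at 0.65 but is cut at 0.75, which is the kind of cross-domain edge that over-filtering removes.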

4. Brittle Entry-Point Routing

Explanation: Initial routing heuristics rely on vault index metadata. On first contact with an unfamiliar vault, the system may select suboptimal entry nodes, causing poor initial context. Fix: Implement multi-candidate routing. Score top-3 hub candidates against the query, load all three, and let the model synthesize across them. Fall back to vault-wide search if confidence drops below threshold.
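The text does not say how hub candidates are scored against the query, so the sketch below substitutes a simple term-overlap score between the query and each hub title. That scorer, the 0.2 confidence floor, and the names `routeQuery`, `Hub`, and `RoutingResult` are all assumptions for illustration. The sketch also returns only hubs with a nonzero score, so it may load fewer than three.

```typescript
interface Hub {
  id: string;
  title: string;
  inboundMentions: number;
}

interface RoutingResult {
  entryIds: string[];
  fallback: boolean; // true means: use vault-wide search instead
}

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 2);
}

// Stand-in scorer: fraction of query terms that appear in the hub title (0..1).
function overlapScore(query: string, title: string): number {
  const queryTerms = tokenize(query);
  if (queryTerms.length === 0) return 0;
  const titleTerms = new Set<string>(tokenize(title));
  return queryTerms.filter((term) => titleTerms.has(term)).length / queryTerms.length;
}

function routeQuery(
  query: string,
  hubs: Hub[],
  topN: number = 3,
  minConfidence: number = 0.2
): RoutingResult {
  const scored = hubs
    .map((hub) => ({ hub, score: overlapScore(query, hub.title) }))
    // Best score first; ties go to the more heavily linked hub.
    .sort((a, b) => b.score - a.score || b.hub.inboundMentions - a.hub.inboundMentions);

  const best = scored[0]?.score ?? 0;
  if (best < minConfidence) return { entryIds: [], fallback: true };

  const entryIds = scored
    .filter((item) => item.score > 0)
    .slice(0, topN)
    .map((item) => item.hub.id);
  return { entryIds, fallback: false };
}

const hubList: Hub[] = [
  { id: "ml", title: "Machine Learning Coursework", inboundMentions: 12 },
  { id: "clients", title: "Client Projects", inboundMentions: 20 },
  { id: "stats", title: "Statistics Coursework Notes", inboundMentions: 8 },
];

const routed = routeQuery("coursework on machine learning", hubList);
const unrouted = routeQuery("tax returns", hubList);
console.log(routed, unrouted);
```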

5. Embedding Cache Drift

Explanation: Pre-computed embeddings become stale when notes are edited, renamed, or deleted. The navigator may follow edges based on outdated semantic signals. Fix: Validate cache freshness on vault mount. Implement a lightweight hash check (md5 or sha256) on modified files. Trigger incremental cache regeneration only for changed nodes, not the entire vault.
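A minimal sketch of the hash check, using SHA-256 from Node's built-in `crypto` module. It assumes each cache entry stores the hash of the note text it was computed from. The `CacheEntry` shape and `findStaleIds` are hypothetical and do not reflect any particular plugin's cache format.

```typescript
import { createHash } from "node:crypto";

function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical cache entry: the embedding plus the hash of its source text.
interface CacheEntry {
  hash: string;
  embedding: number[];
}

// Returns ids whose embeddings must be regenerated and ids to evict.
function findStaleIds(
  notes: Record<string, string>,
  cache: Record<string, CacheEntry>
): { changed: string[]; removed: string[] } {
  const changed = Object.keys(notes).filter((id) => {
    const entry = cache[id];
    return !entry || entry.hash !== contentHash(notes[id] ?? "");
  });
  const removed = Object.keys(cache).filter((id) => !(id in notes));
  return { changed, removed };
}

const currentNotes: Record<string, string> = {
  a: "alpha",
  b: "beta, edited since the cache was built",
  c: "a brand new note",
};
const staleCache: Record<string, CacheEntry> = {
  a: { hash: contentHash("alpha"), embedding: [1, 0] },
  b: { hash: contentHash("beta"), embedding: [0, 1] },
  d: { hash: contentHash("deleted note"), embedding: [1, 1] },
};

const stale = findStaleIds(currentNotes, staleCache);
console.log(stale);
```

Under this scheme a renamed note shows up as one removed ID plus one changed ID, so its embedding is regenerated under the new name.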

6. Context Contamination

Explanation: Loading notes without provenance tagging causes the model to conflate sources. It may attribute a concept from Note A to Note B, breaking faithfulness. Fix: Wrap each loaded note in explicit source markers that carry the note's ID. Instruct the agent to cite source IDs in responses. This maintains traceability and reduces hallucination.
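The exact marker syntax is not recoverable from the text above, so the sketch below uses a placeholder: an XML-style `<note>` tag with `id` and `title` attributes. The tag name, the attributes, the citation instruction, and the function names are all assumptions made for illustration.

```typescript
interface LoadedNote {
  id: string;
  title: string;
  content: string;
}

// Escape characters that would break out of a double-quoted attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Placeholder marker format; swap in whatever tag the agent is prompted to expect.
function wrapWithSource(note: LoadedNote): string {
  const open = `<note id="${escapeAttr(note.id)}" title="${escapeAttr(note.title)}">`;
  return open + "\n" + note.content.trim() + "\n</note>";
}

function buildContext(notes: LoadedNote[]): string {
  const instruction = "Cite the id of every note you rely on, e.g. [source: note-id].";
  return [instruction].concat(notes.map(wrapWithSource)).join("\n\n");
}

const wrapped = wrapWithSource({
  id: "n1",
  title: 'A "quoted" title',
  content: "  body text  ",
});
console.log(wrapped);
console.log(
  buildContext([
    { id: "n1", title: "First", content: "alpha" },
    { id: "n2", title: "Second", content: "beta" },
  ])
);
```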

7. Ignoring Note Hierarchy

Explanation: Flattening markdown into raw text strips headings, lists, and metadata. The model loses structural cues that guide reasoning. Fix: Preserve markdown formatting during load. Pass headings as structural anchors. Use a lightweight parser that extracts title, sections, and metadata before injection.
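A lightweight parser along these lines might look like the sketch below. It assumes optional frontmatter delimited by `---` lines containing flat `key: value` pairs only, and ATX-style `#` headings. `parseNote`, `ParsedNote`, and `Section` are hypothetical names, and this is not a full markdown or YAML parser.

```typescript
interface Section {
  heading: string;
  level: number; // 0 marks text that appears before the first heading
  body: string;
}

interface ParsedNote {
  metadata: Record<string, string>;
  title: string;
  sections: Section[];
}

const FENCE = "`".repeat(3); // code-fence marker, built this way to keep the sketch fence-safe

function parseNote(markdown: string, fallbackTitle: string): ParsedNote {
  const lines = markdown.split(/\r?\n/);
  const metadata: Record<string, string> = {};
  let i = 0;

  // Optional frontmatter: flat "key: value" pairs between two '---' lines.
  if ((lines[0] ?? "").trim() === "---") {
    i = 1;
    while (i < lines.length && (lines[i] ?? "").trim() !== "---") {
      const pair = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(lines[i] ?? "");
      if (pair) metadata[pair[1] ?? ""] = (pair[2] ?? "").trim();
      i++;
    }
    i++; // skip the closing delimiter
  }

  let title = "";
  let inFence = false;
  let current: Section = { heading: "", level: 0, body: "" };
  const sections: Section[] = [current];

  for (; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.trim().startsWith(FENCE)) inFence = !inFence;
    // Ignore '#' lines inside fenced code, where they are usually comments.
    const heading = inFence ? null : /^(#{1,6})\s+(.+)$/.exec(line);
    if (heading) {
      const level = (heading[1] ?? "").length;
      const text = (heading[2] ?? "").trim();
      if (level === 1 && title === "") title = text;
      current = { heading: text, level, body: "" };
      sections.push(current);
    } else {
      current.body += line + "\n"; // keep list markers and other formatting
    }
  }

  return {
    metadata,
    title: title || fallbackTitle,
    sections: sections
      .map((s) => ({ heading: s.heading, level: s.level, body: s.body.trim() }))
      .filter((s) => s.level > 0 || s.body.length > 0),
  };
}

const parsed = parseNote(
  [
    "---",
    "tags: ml",
    "status: draft",
    "---",
    "# Gradient Descent",
    "Intro text.",
    "",
    "## Learning Rate",
    "- too high diverges",
    "- too low stalls",
    "",
  ].join("\n"),
  "untitled"
);
console.log(parsed.title, parsed.metadata, parsed.sections.map((s) => s.heading));
```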

Production Bundle

Action Checklist

Generate vault index: Parse all markdown files, extract wikilinks, count inbound mentions, and output VAULT_INDEX.md
Configure traversal limits: Set MAX_NOTES_PER_QUERY (default 12) and MAX_TOKENS_PER_QUERY (default 8000)
Tune similarity threshold: Start at t=0.65; adjust based on cross-domain contamination reports
Enable orphan surfacing: Set ORPHAN_K=5 to recover unlinked but relevant notes
Validate embedding cache: Run hash check on vault mount; regenerate only modified node embeddings
Implement source tagging: Wrap loaded notes with source markers for traceability
Test routing heuristics: Run 10 synthesis queries; verify entry-point selection accuracy >80%
Monitor latency: Ensure per-query context assembly stays under 10 seconds; optimize cache reads if exceeded

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Small, well-linked vault (<200 notes)	Pure Graph Traversal	High connectivity eliminates need for embeddings; lowest latency	Near-zero compute overhead
Large, cross-domain vault (>500 notes)	Hybrid Graph (t=0.65 + Orphan k=5)	Embeddings recover missing links; filtering prevents context bloat	Moderate cache storage; ~8s/query latency
Legacy vault with zero wikilinks	Graph Builder + Hybrid	/vault-discover Mode 2 auto-links hub candidates; hybrid mode compensates for sparse graph	Initial indexing time; higher token usage during link generation
Strict compliance/audit requirements	Pure Graph + Source Tagging	Deterministic traversal with full provenance; no semantic ambiguity	Zero additional cost; requires manual link maintenance

Configuration Template

# knowledge-graph-config.yaml
vault:
  path: "./obsidian-vault"
  index_file: "VAULT_INDEX.md"
  cache_path: ".smart-env/multi"

traversal:
  mode: "hybrid" # depth | synthesis | hybrid
  max_depth: 2
  max_notes_per_query: 12
  max_tokens_per_query: 8000

filtering:
  threshold: 0.65
  orphan_k: 5
  anchor_strategy: "entry_note" # entry_note | query_injected

runtime:
  source_tagging: true
  cache_validation: "hash_check"
  latency_budget_ms: 10000
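For agents that consume this file from TypeScript, a typed mirror of the template with a few sanity checks can catch bad values early. The sketch below assumes the YAML has already been parsed into a plain object by whatever loader is in use. The interface, `validateConfig`, and the specific range checks (for example, a threshold between 0 and 1) are illustrative assumptions, not documented constraints.

```typescript
interface KnowledgeGraphConfig {
  vault: { path: string; index_file: string; cache_path: string };
  traversal: {
    mode: "depth" | "synthesis" | "hybrid";
    max_depth: number;
    max_notes_per_query: number;
    max_tokens_per_query: number;
  };
  filtering: {
    threshold: number;
    orphan_k: number;
    anchor_strategy: "entry_note" | "query_injected";
  };
  runtime: { source_tagging: boolean; cache_validation: string; latency_budget_ms: number };
}

// Values copied from the template above.
const defaultConfig: KnowledgeGraphConfig = {
  vault: { path: "./obsidian-vault", index_file: "VAULT_INDEX.md", cache_path: ".smart-env/multi" },
  traversal: { mode: "hybrid", max_depth: 2, max_notes_per_query: 12, max_tokens_per_query: 8000 },
  filtering: { threshold: 0.65, orphan_k: 5, anchor_strategy: "entry_note" },
  runtime: { source_tagging: true, cache_validation: "hash_check", latency_budget_ms: 10000 },
};

// Returns a list of human-readable problems; an empty list means the config passed.
function validateConfig(config: KnowledgeGraphConfig): string[] {
  const problems: string[] = [];
  const { traversal, filtering, runtime } = config;

  if (!Number.isInteger(traversal.max_depth) || traversal.max_depth < 0) {
    problems.push("traversal.max_depth must be a non-negative integer");
  }
  if (!Number.isInteger(traversal.max_notes_per_query) || traversal.max_notes_per_query < 1) {
    problems.push("traversal.max_notes_per_query must be a positive integer");
  }
  if (!(traversal.max_tokens_per_query > 0)) {
    problems.push("traversal.max_tokens_per_query must be positive");
  }
  // Written as a negated range check so that NaN is rejected too.
  if (!(filtering.threshold >= 0 && filtering.threshold <= 1)) {
    problems.push("filtering.threshold must be between 0 and 1");
  }
  if (!Number.isInteger(filtering.orphan_k) || filtering.orphan_k < 0) {
    problems.push("filtering.orphan_k must be a non-negative integer");
  }
  if (!(runtime.latency_budget_ms > 0)) {
    problems.push("runtime.latency_budget_ms must be positive");
  }
  return problems;
}

const badConfig: KnowledgeGraphConfig = JSON.parse(JSON.stringify(defaultConfig));
badConfig.filtering.threshold = 1.5;
badConfig.filtering.orphan_k = -1;

console.log(validateConfig(defaultConfig), validateConfig(badConfig));
```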

Quick Start Guide

Initialize the index: Run the vault parser against your markdown directory. The system generates VAULT_INDEX.md and per-section routing files. Verify hub detection accuracy by inspecting inbound mention counts.
Configure thresholds: Set threshold: 0.65 and orphan_k: 5 in the config. These are the values of the best-scoring hybrid configuration in the evaluation above; treat them as a starting point for balancing link pruning against context recovery.
Mount the embedding cache: Point the navigator to your plugin's embedding directory (e.g., .smart-env/multi/*.ajson). Run a hash validation to ensure cache freshness.
Execute a synthesis query: Route through vault-context in synthesis mode. The system loads the entry note, filters outbound edges, traverses valid links, and injects top-k orphans. Verify source tags in the output.
Iterate on graph density: Run /vault-discover Mode 2 weekly to suggest missing wikilinks. As your graph densifies, you can lower orphan_k and tighten the threshold without losing coverage.
