Your LLM Forgets Everything. Give It a Wiki!

By Codcompass Team·2026-05-27·8 min read

Beyond Context Windows: Architecting Self-Maintaining Agent Knowledge Stores

Current Situation Analysis

The fundamental limitation of modern LLM integrations isn't model capability; it's state persistence. Every new session begins with a blank slate. Engineering teams routinely paste project constraints, architectural decisions, and historical context into prompts, burning input tokens and developer time to re-establish baseline awareness. Once the session terminates, that accumulated context evaporates.

The industry has attempted to solve this through two primary vectors, both of which misunderstand the nature of machine memory:

Extended Context Windows: Pushing limits to 128K, 200K, or 1M tokens creates a larger temporary buffer, not persistent storage. Context compaction still truncates or summarizes older turns, and nothing carries over between sessions. Larger windows also raise inference latency and input cost, because every request pays again for the full prompt.
Standard Retrieval-Augmented Generation (RAG): Vector stores ingest raw documents and retrieve semantically similar chunks at query time. This approach treats knowledge as static fragments. It does not synthesize, update relationships, or resolve contradictions. After months of operation, a RAG pipeline will still retrieve outdated architecture diagrams or deprecated API endpoints because the underlying index lacks a maintenance loop.

The core misunderstanding is equating retrieval with memory. Retrieval fetches what exists. Memory requires accumulation, correlation, and self-correction. Without a compilation layer, AI agents remain stateless query processors rather than institutional knowledge carriers.

WOW Moment: Key Findings

The shift from retrieval-based patterns to compilation-based architectures fundamentally changes how AI systems handle institutional knowledge. Instead of fetching isolated fragments on demand, the system incrementally builds and maintains a structured, interlinked knowledge graph. The model doesn't just read; it synthesizes, cross-references, and updates.

Approach	State Persistence	Cross-Reference Accuracy	Maintenance Overhead	Query Latency (Avg)
Extended Context Window	None (session-bound)	Low (linear attention decay)	High (manual prompt engineering)	1.2s - 3.5s
Standard RAG	Static (index snapshot)	Medium (semantic match only)	High (manual chunking/re-indexing)	0.8s - 1.5s
Compilation/Wiki Pattern	Persistent (LLM-maintained)	High (explicit linking & conflict resolution)	Near-zero (automated synthesis)	0.05s - 0.15s

This finding matters because it decouples knowledge accumulation from prompt engineering. The compilation pattern transforms the LLM from a passive retriever into an active archivist. When new documentation arrives, the system doesn't just embed it; it integrates facts into existing concept pages, flags contradictions, updates dependency maps, and maintains a chronological log. The knowledge base evolves organically, mirroring how human teams maintain technical wikis, but without the administrative decay that typically kills internal documentation.

Core Solution

The architecture rests on three distinct layers, each with strict boundaries and responsibilities.

Layer 1: Raw Ingestion (Immutable)

Raw sources—technical papers, meeting transcripts, PR descriptions, architecture diagrams—enter the system as read-only artifacts. The LLM never modifies these files. This preserves auditability and prevents model hallucination from corrupting source truth.
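One way to make the read-only guarantee concrete is to store each raw source under a name derived from its content hash and to mark the file read-only on disk. The sketch below assumes a Node.js runtime; the `storeRawSource` helper, the `.txt` suffix, and the directory layout are illustrative choices, not part of the original design.

```typescript
import { createHash } from 'node:crypto';
import * as fs from 'node:fs';
import * as path from 'node:path';

// Hypothetical helper: stores a raw source under a name derived from its
// SHA-256 digest and marks the file read-only. Returns the stored path.
export function storeRawSource(rawDir: string, content: string): string {
  const digest = createHash('sha256').update(content, 'utf8').digest('hex');
  const target = path.join(rawDir, `${digest}.txt`);

  fs.mkdirSync(rawDir, { recursive: true });
  if (!fs.existsSync(target)) {
    // 'wx' refuses to overwrite an existing file; 0o444 makes it read-only.
    fs.writeFileSync(target, content, { encoding: 'utf8', flag: 'wx', mode: 0o444 });
  }
  return target;
}
```

Because the file name is the digest, ingesting the same document twice is a no-op, and any later change to a stored artifact can be detected by re-hashing it.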

Layer 2: Synthesis Engine (LLM-Maintained)

This layer consists of structured markdown files representing entities, concepts, decisions, and summaries. The LLM owns this space entirely. When a new source is ingested, the synthesis engine:

Extracts factual claims and maps them to existing pages
Creates new pages for novel concepts
Updates cross-references and dependency graphs
Flags contradictions with existing knowledge
Appends changes to a versioned log
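The steps above can be sketched as a pure function that applies one synthesis result to an in-memory wiki. The article does not specify the output format of the synthesis step, so every type and field name here (`SynthesisResult`, `WikiState`, `pageId`, and so on) is an assumption made for illustration.

```typescript
// Hypothetical shapes: the synthesis output format is not defined in the article.
interface PageUpdate { id: string; title: string; content: string; links: string[] }
interface Conflict { pageId: string; existingClaim: string; newClaim: string }
interface SynthesisResult { pages: PageUpdate[]; conflicts: Conflict[] }

interface WikiPage extends PageUpdate { version: number }
interface WikiState {
  pages: Map<string, WikiPage>;
  openConflicts: Conflict[];
  log: string[];
}

export function applySynthesis(state: WikiState, source: string, result: SynthesisResult): void {
  for (const update of result.pages) {
    const existing = state.pages.get(update.id);
    // A novel concept starts at version 1; a known page gets its version bumped.
    const version = existing ? existing.version + 1 : 1;
    state.pages.set(update.id, { ...update, version });
    state.log.push(`${source}: ${existing ? 'updated' : 'created'} ${update.id} (v${version})`);
  }
  for (const conflict of result.conflicts) {
    // Contradictions are recorded rather than silently merged.
    state.openConflicts.push(conflict);
    state.log.push(`${source}: conflict flagged on ${conflict.pageId}`);
  }
}
```

Keeping the apply step deterministic and separate from the model call makes it possible to replay the log or test the bookkeeping without invoking an LLM.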

Layer 3: Orchestration Schema (Instructional)

A single configuration file (e.g., AGENTS.md or SYSTEM.md) defines the conventions for ingestion, querying, and maintenance. It instructs the model on how to interact with the synthesis layer, enforce naming standards, and handle edge cases.

Implementation Architecture

The backend uses SQLite with two extensions: FTS5 for lexical keyword search and sqlite-vec for dense vector operations. Embeddings are generated locally with bge-base-en-v1.5, a roughly 200MB model that runs on Apple Silicon and on plain CPU inference. Because no text is sent to a hosted embedding API, there are no API dependencies or rate limits, and documents do not leave the machine at embedding time.

Search operates as a hybrid pipeline combining BM25 (lexical) and cosine similarity (semantic) scores, fused via Reciprocal Rank Fusion (RRF). This balances exact term matching with conceptual relevance, significantly reducing false positives common in pure vector search.
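As a small worked example of the fusion step, the sketch below implements RRF over plain lists of page IDs. It uses the common formulation in which each list contributes 1 / (k + rank) with k = 60; the function name and the sample IDs are made up for illustration.

```typescript
// Reciprocal Rank Fusion over any number of ranked ID lists.
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
export function reciprocalRankFusion(rankings: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) || 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical result lists from the lexical and semantic legs.
const lexical = ['payments-v2', 'billing-overview', 'oauth-flow'];
const semantic = ['billing-overview', 'refund-policy', 'payments-v2'];
const fused = reciprocalRankFusion([lexical, semantic]);
```

Here `billing-overview` ends up first (1/62 + 1/61) and `payments-v2` second (1/61 + 1/63): a page that sits near the top of both lists beats one that tops a single list. Only rank positions are used, so the BM25 and cosine scales never need to be reconciled.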

TypeScript Integration Example

import Database from 'better-sqlite3';
import * as sqliteVec from 'sqlite-vec';
import { createEmbedding } from './embedding-runtime';
import { KnowledgeSchema } from './schema-types';

export class KnowledgeSynthesisEngine {
  private db: Database.Database;
  private schemaVersion: string;

  constructor(dbPath: string, schemaVersion: string) {
    this.db = new Database(dbPath);
    // Register the sqlite-vec functions (e.g. vec_distance_cosine) on this connection.
    sqliteVec.load(this.db);
    this.schemaVersion = schemaVersion;
    this.initializeTables();
  }

  private initializeTables(): void {
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS knowledge_nodes (
        id TEXT PRIMARY KEY,
        title TEXT NOT NULL,
        content TEXT NOT NULL,
        embedding BLOB,
        last_updated INTEGER DEFAULT (strftime('%s', 'now')),
        version INTEGER DEFAULT 1
      );
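      -- The FTS index keeps its own copy of id so search rows map back to knowledge_nodes.
      CREATE VIRTUAL TABLE IF NOT EXISTS node_search USING fts5(id UNINDEXED, title, content);
      CREATE TRIGGER IF NOT EXISTS sync_search_insert AFTER INSERT ON knowledge_nodes
      BEGIN
        INSERT INTO node_search(rowid, id, title, content)
        VALUES (new.rowid, new.id, new.title, new.content);
      END;
      -- Upserts that hit an existing id fire this trigger, keeping the index current.
      CREATE TRIGGER IF NOT EXISTS sync_search_update AFTER UPDATE ON knowledge_nodes
      BEGIN
        UPDATE node_search SET title = new.title, content = new.content
        WHERE rowid = new.rowid;
      END;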
    `);
  }

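  // buildSynthesisPrompt, invokeModel, and logIngestion are omitted from this listing.
  // Judging by their names, they build the prompt, call the LLM, and append to the ingestion log.
  async ingestSource(rawContent: string, sourceMeta: Record<string, string>): Promise<void> {
    const synthesisPrompt = this.buildSynthesisPrompt(rawContent, sourceMeta);
    const structuredOutput = await this.invokeModel(synthesisPrompt);

    for (const node of structuredOutput.nodes) {
      // Embed the synthesized page, not the raw source, so search reflects the wiki.
      const embedding = await createEmbedding(node.content);
      this.upsertNode(node.id, node.title, node.content, embedding);
    }

    this.logIngestion(rawContent, sourceMeta);
  }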

  async queryKnowledge(query: string, topK: number = 5): Promise<KnowledgeSchema.Node[]> {
    const queryEmbedding = await createEmbedding(query);
    
    const lexicalResults = this.db.prepare(`
      SELECT id, title, content, rank 
      FROM node_search 
      WHERE node_search MATCH ? 
      ORDER BY rank LIMIT ?
    `).all(query, topK);

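    // vec_distance_cosine returns a distance, so smaller values mean closer matches.
    const semanticResults = this.db.prepare(`
      SELECT id, title, content,
             vec_distance_cosine(embedding, ?) AS distance
      FROM knowledge_nodes
      WHERE embedding IS NOT NULL
      ORDER BY distance ASC LIMIT ?
    `).all(queryEmbedding, topK);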

    return this.fuseResults(lexicalResults, semanticResults, topK);
  }

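  private fuseResults(lexical: any[], semantic: any[], limit: number): KnowledgeSchema.Node[] {
    // Reciprocal Rank Fusion: each list contributes 1 / (k + rank), with rank starting at 1.
    // k = 60 is the value commonly used for RRF; it keeps a single top hit from dominating.
    const k = 60;
    const scores = new Map<string, number>();

    for (const results of [lexical, semantic]) {
      results.forEach((item, idx) => {
        scores.set(item.id, (scores.get(item.id) || 0) + 1 / (k + idx + 1));
      });
    }

    return Array.from(scores.entries())
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit)
      .map(([id]) => this.getNodeById(id));
  }

  // Loads the full row for a fused ID. The cast assumes KnowledgeSchema.Node mirrors these columns.
  private getNodeById(id: string): KnowledgeSchema.Node {
    return this.db
      .prepare('SELECT id, title, content, last_updated, version FROM knowledge_nodes WHERE id = ?')
      .get(id) as KnowledgeSchema.Node;
  }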

  private upsertNode(id: string, title: string, content: string, embedding: Buffer): void {
    // On conflict the title is refreshed as well, so a renamed page does not keep a stale heading.
    const tx = this.db.transaction(() => {
      this.db.prepare(`
        INSERT INTO knowledge_nodes (id, title, content, embedding)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
          title = excluded.title,
          content = excluded.content,
          embedding = excluded.embedding,
          last_updated = strftime('%s', 'now'),
          version = version + 1
      `).run(id, title, content, embedding);
    });
    tx();
  }
}
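One caveat with the lexical leg of `queryKnowledge`: it passes the user's text straight to `MATCH`, and FTS5 interprets that string using its own query syntax. Identifiers such as payments-v2 contain characters that are not valid in bare FTS5 terms, so they may be rejected or parsed as operators. A minimal sketch of one mitigation, quoting every token, follows; `toFtsQuery` is a hypothetical helper, not part of the listing above.

```typescript
// Turns free-form user text into an FTS5 query made of quoted terms.
// Each whitespace-separated token becomes a double-quoted string, with any
// embedded double quote doubled, so FTS5 treats it as text rather than syntax.
export function toFtsQuery(input: string): string {
  return input
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => `"${token.replace(/"/g, '""')}"`)
    .join(' ');
}
```

For example, `toFtsQuery('payments-v2 OAuth2.1')` returns `"payments-v2" "OAuth2.1"`. The helper returns an empty string for blank input, in which case the caller would likely want to skip the lexical query and rely on the semantic results alone.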

Architecture Rationale

Local-First Storage: SQLite eliminates network round-trips and external dependencies. The entire knowledge base is a single file, trivial to backup, version, or migrate.
Hybrid Search via RRF: Pure vector search struggles with exact identifiers (e.g., payments-v2, OAuth2.1). Pure lexical search misses conceptual matches. RRF merges the two result lists using rank positions only, so BM25 scores and cosine distances never have to be normalized against each other, and no separate re-ranking model is needed.
Markdown as Source of Truth: The SQLite index is derived from disk files. If the database corrupts, it rebuilds from markdown. This ensures compatibility with Obsidian, VS Code, and Git workflows.
Daemon Mode for Latency: Loading bge-base-en-v1.5 takes 2-3 seconds on cold start. Running a persistent background process warms the model in memory, reducing subsequent search latency to 50-150ms.

Pitfall Guide

1. Schema Drift

Explanation: Instruction files (AGENTS.md) become outdated as project conventions evolve. The LLM continues following deprecated formatting or ingestion rules, causing inconsistent wiki entries.
Fix: Version-control the schema file. Implement a pre-commit hook that validates new wiki entries against the current schema version. Force schema updates through explicit model prompts rather than implicit behavior.
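A pre-commit check of this kind could be as small as the sketch below. It assumes a convention the article does not define: every wiki page opens with a front-matter block delimited by `---` lines that contains a `schema_version:` entry. The function name and the error messages are likewise illustrative.

```typescript
// Hypothetical convention: each wiki page starts with a front-matter block
// delimited by '---' lines that contains a `schema_version: <value>` entry.
// Returns null when the page is valid, otherwise a human-readable problem.
export function checkSchemaVersion(markdown: string, expected: string): string | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return 'missing front matter';

  const versionLine = match[1]
    .split(/\r?\n/)
    .find((line) => line.startsWith('schema_version:'));
  if (!versionLine) return 'missing schema_version';

  const actual = versionLine.slice('schema_version:'.length).trim();
  return actual === expected ? null : `schema_version ${actual} does not match ${expected}`;
}
```

A hook would run this over each staged page and reject the commit if any call returns a non-null message.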

2. Contradiction Blindness

Explanation: LLMs tend to merge conflicting information rather than flagging it. Two architecture decisions from different quarters might coexist without warning, leading to implementation errors.
Fix: Add explicit contradiction detection to the synthesis prompt. Require the model to output a conflicts array when new facts clash with existing nodes. Implement a human-in-the-loop review queue for flagged conflicts.
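To make the conflicts array a hard requirement rather than a suggestion, the model's output can be validated before anything is written. The sketch below assumes the model returns JSON with `nodes` and `conflicts` arrays; that contract and the field names (`pageId`, `existingClaim`, `newClaim`) are assumptions, since the article does not define the schema.

```typescript
interface Conflict { pageId: string; existingClaim: string; newClaim: string }
interface ParsedOutput { nodes: unknown[]; conflicts: Conflict[] }

// Hypothetical output contract: JSON with a `nodes` array and a `conflicts`
// array (empty when nothing clashes). Anything else is rejected outright.
export function parseSynthesisOutput(raw: string): ParsedOutput {
  const data = JSON.parse(raw) as { nodes?: unknown; conflicts?: unknown };
  if (!Array.isArray(data.nodes)) throw new Error('output is missing the nodes array');
  if (!Array.isArray(data.conflicts)) throw new Error('output is missing the conflicts array');

  const conflicts = data.conflicts.map((entry: unknown, i: number) => {
    const c = (entry ?? {}) as Partial<Conflict>;
    if (
      typeof c.pageId !== 'string' ||
      typeof c.existingClaim !== 'string' ||
      typeof c.newClaim !== 'string'
    ) {
      throw new Error(`conflict ${i} is malformed`);
    }
    return { pageId: c.pageId, existingClaim: c.existingClaim, newClaim: c.newClaim };
  });
  return { nodes: data.nodes, conflicts };
}
```

Rejecting output that omits the array forces the model to state "no conflicts" explicitly, and any non-empty result can be routed to the review queue before the affected pages are updated.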

3. Cold-Start Embedding Latency

Explanation: Loading the embedding model on every CLI invocation adds 2-3 seconds of overhead. In rapid iteration workflows, this compounds into significant friction.
Fix: Run a persistent daemon (kb serve --detached or an equivalent). Route all search/ingest calls through a local IPC or HTTP endpoint. Keep the model resident in memory during active development sessions.
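Inside such a daemon, the core of the fix is loading the model once and sharing it across requests. The following sketch shows that piece only; `keepWarm` is a hypothetical helper, and the loader passed to it stands in for whatever actually loads bge-base-en-v1.5, which the article does not show.

```typescript
// Wraps an expensive async loader so it runs at most once per process.
// Concurrent callers share the same in-flight promise.
export function keepWarm<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = load().catch((error) => {
        cached = undefined; // allow a retry after a failed load
        throw error;
      });
    }
    return cached;
  };
}
```

A daemon would create the wrapped getter at startup and call it from every request handler, so only the first request pays the cold-start cost.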

4. Over-Indexing Raw Artifacts

Explanation: Ingesting every log file, debug output, or transient note bloats the knowledge base with noise. The synthesis layer wastes tokens processing irrelevant data.
Fix: Implement a curation pipeline. Only ingest sources that pass a relevance filter (e.g., architecture docs, decision records, API specs). Use file-type allowlisting and size thresholds before triggering synthesis.
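The allowlist and size threshold can be a cheap gate that runs before any tokens are spent. In this sketch the extension list and the 512 KiB limit are placeholder values chosen for illustration, and `shouldIngest` is a hypothetical name.

```typescript
// Hypothetical policy values; tune both for your repository.
const ALLOWED_EXTENSIONS = new Set(['.md', '.txt', '.yaml', '.json']);
const MAX_BYTES = 512 * 1024;

// Returns true only for non-empty files of an allowed type under the size limit.
export function shouldIngest(filePath: string, sizeBytes: number): boolean {
  const dot = filePath.lastIndexOf('.');
  const extension = dot === -1 ? '' : filePath.slice(dot).toLowerCase();
  return ALLOWED_EXTENSIONS.has(extension) && sizeBytes > 0 && sizeBytes <= MAX_BYTES;
}
```

A content-level relevance check can sit behind this gate, but filtering on type and size first removes most log files and dumps at essentially no cost.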

5. Vector Drift & Stale Embeddings

Explanation: As wiki pages are updated, their embeddings may not reflect the latest content if re-embedding is skipped for performance reasons. Search results gradually degrade.
Fix: Tie embedding regeneration to the version column. Trigger async re-embedding on every UPDATE. Use a background worker pool to handle embedding jobs without blocking the main synthesis loop.
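One way to tie regeneration to the version column is to record which version each embedding was computed from and compare the two. The sketch below assumes an extra `embedded_version` value per node, which is not part of the schema shown earlier; a background worker would feed the returned IDs into its embedding queue.

```typescript
// embeddedVersion is an assumed extra column: the page version the stored
// embedding was computed from, or null if the page was never embedded.
interface NodeRow { id: string; version: number; embeddedVersion: number | null }

// Returns the IDs whose embedding is missing or older than the page content.
export function findStaleEmbeddings(rows: NodeRow[]): string[] {
  return rows
    .filter((row) => row.embeddedVersion === null || row.embeddedVersion < row.version)
    .map((row) => row.id);
}
```

With this bookkeeping, skipping a re-embed for performance reasons becomes visible debt that a worker can pay down later, instead of silent drift.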

6. Git Merge Conflicts in Wiki Files

Explanation: Multiple agents or developers editing markdown files simultaneously creates merge conflicts. LLMs lack native conflict resolution strategies for structured text.
Fix: Enforce atomic writes with file locking. Use a branch-per-feature workflow for knowledge updates. Implement a deterministic merge strategy that prioritizes the latest last_updated timestamp and preserves cross-reference integrity.
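A deterministic merge along these lines might look like the sketch below: the newer side wins on content, while cross-references from both sides are kept. The `PageVersion` shape and the tie-breaking rule are assumptions added for illustration; the article only specifies the timestamp priority and link preservation.

```typescript
interface PageVersion { id: string; content: string; links: string[]; lastUpdated: number }

// Latest writer wins on content; cross-references from both sides are kept.
export function mergePages(a: PageVersion, b: PageVersion): PageVersion {
  if (a.id !== b.id) throw new Error('cannot merge different pages');

  // On equal timestamps, compare content so the result does not depend on
  // the order in which the two sides are passed in.
  const newer = a.lastUpdated !== b.lastUpdated
    ? (a.lastUpdated > b.lastUpdated ? a : b)
    : (a.content >= b.content ? a : b);

  // Union of links, de-duplicated and sorted for stable output.
  const links = Array.from(new Set([...a.links, ...b.links])).sort();
  return { id: newer.id, content: newer.content, links, lastUpdated: newer.lastUpdated };
}
```

Note that this drops the older side's text entirely, so it suits pages that are regenerated wholesale by the synthesis layer better than pages with independent human edits.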

Production Bundle

Action Checklist

Initialize SQLite database with FTS5 and vector extensions
Configure bge-base-en-v1.5 model path and cache directory
Define orchestration schema with explicit ingestion and formatting rules
Implement hybrid search pipeline with RRF weighting parameters
Set up persistent daemon mode to eliminate cold-start latency
Create version-controlled wiki directory with Git ignore rules for SQLite artifacts
Establish contradiction detection and human-review workflow
Monitor embedding regeneration queue and token consumption metrics

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Rapid prototyping / local dev	Compilation/Wiki Pattern (Local SQLite)	Zero API costs, instant iteration, full data sovereignty	$0 infrastructure, minimal compute
Enterprise multi-agent orchestration	Hybrid (Wiki + Centralized Vector DB)	Wiki handles synthesis, vector DB enables cross-team search	Moderate cloud costs, high ROI on knowledge reuse
High-frequency real-time queries	Daemon-backed Hybrid Search	Warm model cache reduces latency to <150ms	Higher RAM usage, negligible CPU overhead
Compliance-heavy / regulated data	Local-First Compilation	No data leaves the machine, full audit trail via Git	Zero third-party risk, higher internal maintenance

Configuration Template

# AGENTS.md - Knowledge Synthesis Protocol

## Core Directives
1. You maintain a persistent wiki located in `./knowledge/`.
2. Raw sources are immutable. Never modify original documents.
3. Synthesize facts into entity pages, concept summaries, and decision logs.
4. Update cross-references when dependencies change.
5. Flag contradictions explicitly using the `[[CONFLICT]]` tag.

## File Structure
- `./knowledge/entities/` - System components, services, libraries
- `./knowledge/concepts/` - Architectural patterns, protocols, methodologies
- `./knowledge/decisions/` - ADRs, trade-off analyses, implementation choices
- `./knowledge/logs/` - Chronological synthesis history

## Search & Ingestion Rules
- Use `kb search <query>` to retrieve relevant context before answering.
- Use `kb add <source>` to ingest new documentation.
- Use `kb update <entity>` to modify existing pages.
- Always verify cross-references after updates.
- Maintain markdown formatting consistency.

Quick Start Guide

Initialize the knowledge store: Create a dedicated directory for markdown files and configure the SQLite backend with FTS5 and vector extensions. Set the embedding model path to your local cache.
Deploy the orchestration schema: Place AGENTS.md in your project root. Define naming conventions, file structure, and synthesis rules. Ensure your AI agent reads this file on initialization.
Start the embedding daemon: Launch the background service to warm the model in memory. Verify latency drops below 200ms for subsequent queries.
Ingest initial sources: Run the synthesis pipeline against your existing architecture docs, decision records, and API specifications. Verify cross-references and contradiction flags.
Validate agent behavior: Query the knowledge base through your agent. Confirm it autonomously calls search, update, and add operations without manual prompt engineering. Monitor synthesis logs for accuracy.
