Turning Obsidian into AI's Own Memory — Local Cognitive OS with Hindsight and Hermes

By Codcompass Team·2026-05-23·8 min read

Building a Self-Auditing Local Memory Engine with Recursive Summarization and Vector Persistence

Current Situation Analysis

Modern AI development faces a structural bottleneck: context fragmentation. Developers and knowledge workers routinely interact with large language models, but those interactions evaporate once the session closes. Traditional Retrieval-Augmented Generation (RAG) attempts to solve this by indexing documents and fetching relevant chunks on demand. While effective for static knowledge bases, RAG treats memory as a lookup table rather than a living system. It retrieves, but it does not consolidate. It answers, but it does not reflect.

This limitation is frequently overlooked because the industry optimizes for query latency and retrieval accuracy, not cognitive continuity. AI models lack innate long-term memory; they require external scaffolding to maintain context across days, weeks, or months. Without a mechanism to compress, cross-reference, and audit past outputs, context degrades into noise. The result is a system that appears intelligent in isolation but fails to accumulate wisdom over time.

The breakthrough lies in shifting from retrieval to consolidation. By running inference entirely on local hardware, developers can close the data loop. Benchmarks show that running Gemma3 via Ollama on consumer-grade hardware sustains approximately 23.4 tokens per second. This throughput is not merely acceptable; it is operationally viable for recursive summarization pipelines. At this speed, a system can periodically ingest raw interaction logs, compress them into structured insights, store embeddings in a local database, and cross-reference new inputs against historical summaries without ever transmitting data externally. This transforms AI from a transient query engine into a persistent cognitive layer that audits itself, detects contradictions, and evolves understanding organically.
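
To make the throughput figure concrete, here is a rough back-of-the-envelope sketch. It assumes the 23.4 tokens per second quoted above is a sustained generation rate and ignores prompt-processing time, for which no figure is given. The helper names and the 300-token summary size are hypothetical.

```typescript
// Assumption: 23.4 tokens/s is the sustained generation rate quoted in the text.
// Prompt-processing (input) time is not modeled here.
const TOKENS_PER_SECOND = 23.4;

/** Seconds needed to generate `outputTokens` tokens at a fixed rate. */
function generationSeconds(
  outputTokens: number,
  tokensPerSecond: number = TOKENS_PER_SECOND,
): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError('tokensPerSecond must be positive');
  }
  return outputTokens / tokensPerSecond;
}

/** Total generation time for a batch of equally sized summaries. */
function batchSeconds(summaries: number, tokensPerSummary: number): number {
  return summaries * generationSeconds(tokensPerSummary);
}

// Hypothetical workload: 50 summaries of about 300 output tokens each.
const seconds = batchSeconds(50, 300);
console.log(`${(seconds / 60).toFixed(1)} minutes of generation time`);
```

Under these assumptions a 50-summary batch costs roughly eleven minutes of generation, which fits comfortably inside a consolidation cycle that runs a few times a day.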

WOW Moment: Key Findings

The transition from static RAG to recursive local memory fundamentally changes how AI systems handle context. The following comparison highlights the architectural and operational differences:

Approach	Context Evolution	Privacy Boundary	Self-Correction Capability	Hardware Dependency
Traditional Cloud RAG	Static chunk retrieval	External API exposure	None (relies on prompt engineering)	Cloud compute
Local Vector RAG	Session-bound retrieval	Fully local	Limited (similarity thresholds only)	Local GPU/CPU
Recursive Local Memory	Self-consolidating knowledge graph	Zero exfiltration	Active contradiction detection & theme tracking	Local GPU/CPU

This finding matters because it suggests that memory consolidation is computationally feasible without cloud dependency. Recursive summarization allows the system to generate meta-knowledge: summaries of summaries that surface long-term patterns, track hypothesis evolution, and flag inconsistencies across time. Instead of manually maintaining a knowledge base, developers create an environment where raw dialogue matures into structured insight. The infrastructure begins to audit itself, reducing cognitive load and enabling human-AI collaboration that compounds over time.
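
The "summaries of summaries" idea can be sketched as a small tree builder. This is a minimal illustration, not the article's implementation: the `Summarize` callback stands in for a local LLM call, and `groupSize`, `maxDepth`, and all names here are assumptions.

```typescript
// A summarizer takes several texts and returns one condensed text.
// In a real pipeline this would call the local model; here it is injected.
type Summarize = (texts: string[]) => string;

interface SummaryNode {
  depth: number;           // 1 = raw summary, 2+ = merged summary
  text: string;
  children: SummaryNode[]; // the nodes this summary was built from
}

function consolidateRecursively(
  leaves: string[],
  summarize: Summarize,
  groupSize: number = 3,
  maxDepth: number = 3,
): SummaryNode[] {
  if (groupSize < 2) throw new RangeError('groupSize must be at least 2');

  let level: SummaryNode[] = leaves.map((text) => ({ depth: 1, text, children: [] }));
  let depth = 1;

  // Keep merging until a single root remains or the depth cap is reached.
  while (level.length > 1 && depth < maxDepth) {
    const next: SummaryNode[] = [];
    for (let i = 0; i < level.length; i += groupSize) {
      const group = level.slice(i, i + groupSize);
      next.push({
        depth: depth + 1,
        text: summarize(group.map((node) => node.text)),
        children: group, // keep links to sources for traceability
      });
    }
    level = next;
    depth += 1;
  }
  return level;
}

// Stand-in summarizer that only concatenates, to show the tree shape.
const joinSummarizer: Summarize = (texts) => `[${texts.join(' + ')}]`;
const roots = consolidateRecursively(['a', 'b', 'c', 'd'], joinSummarizer, 2, 3);
console.log(roots.map((root) => root.text));
```

Because every node keeps its `children`, a higher-order summary can always be traced back to the raw summaries it was derived from.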

Core Solution

Building a self-auditing local memory engine requires four coordinated layers: session ingestion, local summarization, vector persistence, and knowledge synchronization. The architecture prioritizes data locality, recursive compression, and explicit contradiction tracking.

Step 1: Session Ingestion & Structuring

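Raw interactions must be captured in a deterministic format. Instead of appending everything to a single ever-growing file, the ingestor writes each day's entries to a date-stamped Markdown file under session-logs/, with a metadata header (frontmatter) written once when the file is created. This makes logs easy to batch by date. Note that the synchronous append shown here does not by itself protect against several processes writing to the same file at once.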

// session-logger.ts
import { writeFileSync, mkdirSync, existsSync } from 'fs';
import { join } from 'path';

interface SessionEntry {
  timestamp: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  tags?: string[];
}

export class SessionIngestor {
  private vaultPath: string;
  private logDir: string;

  constructor(vaultRoot: string) {
    this.vaultPath = vaultRoot;
    this.logDir = join(vaultRoot, 'session-logs');
    if (!existsSync(this.logDir)) mkdirSync(this.logDir, { recursive: true });
  }

  async appendEntry(entry: SessionEntry): Promise<void> {
    // One log file per calendar day (UTC), e.g. session-logs/2026-05-23.md
    const dateStr = new Date().toISOString().split('T')[0];
    const filePath = join(this.logDir, `${dateStr}.md`);
    const frontmatter = `---\ntype: session-log\ndate: ${dateStr}\n---\n\n`;
    const block = `### ${entry.role.toUpperCase()} [${entry.timestamp}]\n${entry.content}\n`;

    // Check existence once so the content and the write flag cannot disagree.
    const isNewFile = !existsSync(filePath);
    writeFileSync(filePath, isNewFile ? frontmatter + block : block, { flag: isNewFile ? 'w' : 'a' });
  }
}
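
For batch processing, the consolidation stage needs to read these files back. The following is a minimal sketch of a parser for the layout written by appendEntry (frontmatter, then "### ROLE [timestamp]" headings). The function and type names are hypothetical, and it assumes message content never contains a line that itself looks like such a heading.

```typescript
type Role = 'user' | 'assistant' | 'system';

interface ParsedEntry {
  role: Role;
  timestamp: string;
  content: string;
}

function parseSessionLog(markdown: string): ParsedEntry[] {
  // Drop the leading frontmatter block (--- ... ---) if present.
  const body = markdown.replace(/^---\n[\s\S]*?\n---\n/, '');
  const heading = /^### (USER|ASSISTANT|SYSTEM) \[([^\]]+)\]$/;
  const entries: ParsedEntry[] = [];

  for (const line of body.split('\n')) {
    const match = heading.exec(line);
    if (match) {
      // A heading starts a new entry.
      entries.push({
        role: (match[1] ?? '').toLowerCase() as Role,
        timestamp: match[2] ?? '',
        content: '',
      });
      continue;
    }
    // Any other line belongs to the most recent entry, if there is one.
    const last = entries[entries.length - 1];
    if (last) {
      last.content += (last.content ? '\n' : '') + line;
    }
  }
  return entries.map((entry) => ({ ...entry, content: entry.content.trim() }));
}

const sample =
  '---\ntype: session-log\ndate: 2026-05-23\n---\n\n' +
  '### USER [2026-05-23T10:00:00Z]\nHello\n' +
  '### ASSISTANT [2026-05-23T10:00:05Z]\nHi there\nSecond line\n';
console.log(parseSessionLog(sample));
```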

Step 2: Local Summarization Pipeline

A dedicated consolidation engine periodically scans the ingestion directory. It routes raw logs to a local LLM via Ollama, requesting structured extraction rather than free-form summarization. The prompt enforces topic clustering, confidence scoring, and explicit contradiction flags.

// cortex-engine.ts
import { Ollama } from 'ollama';

export class CortexEngine {
  private client: Ollama;
  private model: string;

  constructor(modelName: string = 'gemma3') {
    this.client = new Ollama();
    this.model = modelName;
  }

  async consolidate(rawText: string): Promise<ConsolidatedInsight> {
    const prompt = `
      Analyze the following session transcript. Extract:
      1. Core themes (max 3)
      2. Key hypotheses or decisions
      3. Potential contradictions with prior context
      4. Confidence score (0.0-1.0)
      Return valid JSON only.
      Transcript: ${rawText}
    `;

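    const response = await this.client.chat({
      model: this.model,
      messages: [{ role: 'user', content: prompt }],
      stream: false,
      format: 'json', // ask Ollama to constrain the reply to valid JSON
      options: { temperature: 0.2, num_ctx: 4096 }
    });

    // JSON.parse only guarantees syntax; the shape should still be validated by the caller.
    return JSON.parse(response.message.content);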
  }
}

interface ConsolidatedInsight {
  themes: string[];
  hypotheses: string[];
  contradictions: string[];
  confidence: number;
}
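
Model output is not guaranteed to match the ConsolidatedInsight shape, even when it is syntactically valid JSON. Below is a defensive parsing sketch; the function name parseInsight and the choice to clamp confidence and truncate themes to three are assumptions made for illustration.

```typescript
interface ConsolidatedInsight {
  themes: string[];
  hypotheses: string[];
  contradictions: string[];
  confidence: number;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string');
}

function parseInsight(raw: string): ConsolidatedInsight {
  // Tolerate surrounding prose or code fences by taking the outermost {...}.
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model output');
  }

  const data: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof data !== 'object' || data === null) {
    throw new Error('Expected a JSON object');
  }
  const { themes, hypotheses, contradictions, confidence } = data as Record<string, unknown>;

  if (!isStringArray(themes) || !isStringArray(hypotheses) || !isStringArray(contradictions)) {
    throw new Error('themes, hypotheses and contradictions must be string arrays');
  }
  if (typeof confidence !== 'number' || Number.isNaN(confidence)) {
    throw new Error('confidence must be a number');
  }

  return {
    themes: themes.slice(0, 3), // the prompt asks for at most three themes
    hypotheses,
    contradictions,
    confidence: Math.min(1, Math.max(0, confidence)), // clamp to 0.0-1.0
  };
}

const reply =
  'Here is the result:\n{"themes":["memory","local inference"],"hypotheses":[],"contradictions":[],"confidence":0.8}';
console.log(parseInsight(reply));
```

Rejecting malformed output early keeps bad records out of the vector store, where they would be harder to find later.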

Step 3: Vector Persistence & Recursive Consolidation

PostgreSQL with pgvector serves as the persistence layer. Raw embeddings are stored alongside condensed insights. The system implements recursive consolidation: when a new summary is generated, it is compared against existing vectors. If similarity exceeds a threshold, the system merges insights and generates a higher-order summary. This creates a self-referential loop that compresses knowledge without losing traceability.

// vector-store.ts
import { Pool } from 'pg';

export class MemoryVault {
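  private pool: Pool;
  // Resolves once the schema exists; callers should await it before the first query.
  readonly ready: Promise<void>;

  constructor(connectionString: string) {
    this.pool = new Pool({ connectionString });
    // A constructor cannot await, so keep the promise instead of discarding it.
    this.ready = this.initSchema();
  }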

  private async initSchema(): Promise<void> {
    await this.pool.query(`
      CREATE EXTENSION IF NOT EXISTS vector;
      CREATE TABLE IF NOT EXISTS memory_vectors (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        content TEXT NOT NULL,
        embedding vector(768),
        depth INTEGER DEFAULT 1,
        created_at TIMESTAMPTZ DEFAULT NOW(),
        parent_id UUID REFERENCES memory_vectors(id)
      );
    `);
  }

  async storeInsight(content: string, embedding: number[], depth: number = 1, parentId?: string): Promise<string> {
    const result = await this.pool.query(
      `INSERT INTO memory_vectors (content, embedding, depth, parent_id)
       VALUES ($1, $2, $3, $4) RETURNING id`,
      [content, `[${embedding.join(',')}]`, depth, parentId]
    );
    return result.rows[0].id;
  }

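  // Returns depth-1 entries whose cosine similarity to the new embedding exceeds the threshold.
  // These are candidates for contradiction review, not confirmed contradictions:
  // similarity alone cannot tell agreement from conflict.
  async findContradictions(newEmbedding: number[], threshold: number = 0.75): Promise<string[]> {
    const result = await this.pool.query(
      `SELECT content FROM memory_vectors
       WHERE 1 - (embedding <=> $1::vector) > $2
       AND depth = 1
       ORDER BY embedding <=> $1::vector`,
      [`[${newEmbedding.join(',')}]`, threshold]
    );
    return result.rows.map(r => r.content);
  }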
}
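
The query above relies on pgvector's `<=>` operator, which returns cosine distance, so `1 - distance` is cosine similarity. The same merge test can be sketched in plain TypeScript, which is handy for unit-testing thresholds without a database. The function names are hypothetical, and returning 0 for a zero-length vector is a choice made for this sketch.

```typescript
/** Cosine similarity of two equal-length vectors, in the range [-1, 1]. */
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new RangeError('vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Assumption for this sketch: treat an all-zero vector as similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

/** Mirrors the SQL condition `1 - (embedding <=> $1) > $2`. */
function isMergeCandidate(
  a: readonly number[],
  b: readonly number[],
  threshold: number = 0.75,
): boolean {
  return cosineSimilarity(a, b) > threshold;
}

console.log(isMergeCandidate([1, 1], [1, 0]));      // similarity is about 0.707
console.log(isMergeCandidate([1, 1], [1, 0], 0.7));
```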

Architecture Decisions & Rationale

PostgreSQL + pgvector: Chosen for ACID compliance, mature concurrency handling, and native vector similarity operations. Unlike specialized vector databases, PostgreSQL supports recursive queries, foreign key constraints, and transactional rollbacks, which are critical when merging summaries.
Recursive Depth Control: The depth column tracks consolidation levels. Depth 1 holds raw summaries; depth 2+ holds merged insights. Combined with a maximum-depth cap, this prevents unbounded compression and preserves audit trails.
Local LLM Routing: Ollama manages model lifecycle and context windows. Running Gemma3 locally ensures zero data exfiltration while maintaining sufficient throughput for periodic consolidation.
Explicit Contradiction Tracking: Rather than relying on implicit similarity, the system flags semantic conflicts during consolidation. This enables the infrastructure to self-audit and surface evolving or conflicting hypotheses.
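
Depth control and audit trails can be illustrated over in-memory rows that mirror the id, depth, and parent_id columns. This sketch assumes that a source row's parent_id points at the higher-order summary that absorbed it; the article's schema does not state the direction, and the function names are hypothetical.

```typescript
interface MemoryRow {
  id: string;
  depth: number;
  parentId: string | null; // assumed: the merged summary this row was folded into
}

/** All rows that were (transitively) merged into `rootId`. */
function collectSources(rootId: string, rows: readonly MemoryRow[]): MemoryRow[] {
  const byParent = new Map<string, MemoryRow[]>();
  for (const row of rows) {
    if (row.parentId === null) continue;
    const siblings = byParent.get(row.parentId) ?? [];
    siblings.push(row);
    byParent.set(row.parentId, siblings);
  }

  const sources: MemoryRow[] = [];
  const seen = new Set<string>([rootId]); // guards against accidental cycles
  const stack: string[] = [rootId];
  while (stack.length > 0) {
    const id = stack.pop();
    if (id === undefined) break;
    for (const child of byParent.get(id) ?? []) {
      if (seen.has(child.id)) continue;
      seen.add(child.id);
      sources.push(child);
      stack.push(child.id);
    }
  }
  return sources;
}

/** Depth for a new merged insight, or null if the cap would be exceeded. */
function mergedDepth(childDepths: readonly number[], maxDepth: number = 3): number | null {
  if (childDepths.length === 0) throw new RangeError('at least one child is required');
  const next = Math.max(...childDepths) + 1;
  return next > maxDepth ? null : next;
}

const rows: MemoryRow[] = [
  { id: 'A', depth: 3, parentId: null },
  { id: 'B', depth: 2, parentId: 'A' },
  { id: 'C', depth: 1, parentId: 'B' },
  { id: 'D', depth: 1, parentId: 'B' },
  { id: 'E', depth: 1, parentId: null },
];
console.log(collectSources('A', rows).map((row) => row.id));
console.log(mergedDepth([1, 1]), mergedDepth([3, 2]));
```

In PostgreSQL the same walk could be expressed as a recursive query over parent_id; the in-memory version is shown only because it is easy to test.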

Pitfall Guide

1. Semantic Drift in Recursive Summarization

Explanation: Repeated summarization compresses context until nuance is lost. The system may generate coherent but factually hollow insights.
Fix: Maintain source references in every consolidation step. Implement a maximum depth threshold (typically 3-4). Periodically re-index raw logs to reset drift.

2. Vector Space Saturation

Explanation: Storing every embedding without pruning causes query latency to degrade and storage costs to balloon.
Fix: Implement tiered retention. Keep high-confidence, high-depth vectors permanently. Archive low-similarity or low-confidence entries to cold storage. Use HNSW indexes with tuned m and ef_construction parameters.
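
A tiered retention rule along these lines might look as follows. This is a sketch under assumptions: the stored rows are taken to carry a confidence value (the schema shown earlier does not include one), the 365-day window is read as days, and the 0.5 confidence floor is purely illustrative.

```typescript
interface StoredInsight {
  depth: number;
  confidence: number; // assumed to be persisted alongside the vector
  createdAt: Date;
}

interface RetentionPolicy {
  depth1RetentionDays: number;
  minConfidence: number;
}

type Tier = 'keep' | 'archive';

const MS_PER_DAY = 86_400_000;

function retentionTier(
  item: StoredInsight,
  now: Date,
  policy: RetentionPolicy = { depth1RetentionDays: 365, minConfidence: 0.5 },
): Tier {
  // Merged insights (depth 2+) are kept permanently.
  if (item.depth >= 2) return 'keep';

  const ageDays = (now.getTime() - item.createdAt.getTime()) / MS_PER_DAY;
  if (ageDays > policy.depth1RetentionDays) return 'archive';
  if (item.confidence < policy.minConfidence) return 'archive';
  return 'keep';
}

const today = new Date('2026-05-23T00:00:00Z');
console.log(
  retentionTier({ depth: 1, confidence: 0.9, createdAt: new Date('2025-01-01T00:00:00Z') }, today),
);
```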

3. Contradiction Blindness

Explanation: The system fails to detect when new inputs conflict with established summaries, leading to inconsistent knowledge graphs.
Fix: Enforce explicit contradiction detection in the summarization prompt. Set similarity thresholds that trigger manual review queues. Log all flagged conflicts for audit trails.

4. Prompt Template Rigidity

Explanation: Fixed prompts perform poorly as topics evolve or domain-specific terminology emerges.
Fix: Version-control prompt templates. Implement dynamic prompt assembly that injects recent high-confidence summaries as context. Adjust temperature settings based on consolidation depth.
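
Dynamic prompt assembly can be sketched as a pure function that selects recent, high-confidence summaries and places them ahead of the transcript. The names, the 0.7 confidence floor, and the limit of three prior summaries are assumptions; timestamps are assumed to be ISO 8601 strings in a single timezone so that string order matches time order.

```typescript
interface PriorSummary {
  text: string;
  confidence: number;
  createdAt: string; // ISO 8601, e.g. 2026-05-22T09:00:00Z
}

function buildPrompt(
  transcript: string,
  prior: readonly PriorSummary[],
  minConfidence: number = 0.7,
  maxPrior: number = 3,
): string {
  const context = prior
    .filter((summary) => summary.confidence >= minConfidence)
    .sort((a, b) => b.createdAt.localeCompare(a.createdAt)) // newest first
    .slice(0, maxPrior)
    .map((summary, index) => `${index + 1}. ${summary.text}`);

  return [
    'Analyze the following session transcript. Extract:',
    '1. Core themes (max 3)',
    '2. Key hypotheses or decisions',
    '3. Potential contradictions with prior context',
    '4. Confidence score (0.0-1.0)',
    'Return valid JSON only.',
    'Prior context:',
    context.length > 0 ? context.join('\n') : '(none)',
    'Transcript:',
    transcript,
  ].join('\n');
}

console.log(
  buildPrompt('User asked about retention windows.', [
    { text: 'Prefers 90-day raw log retention', confidence: 0.9, createdAt: '2026-05-01T00:00:00Z' },
    { text: 'Discarded guess', confidence: 0.2, createdAt: '2026-05-20T00:00:00Z' },
  ]),
);
```

Supplying prior summaries this way also gives the model something concrete to check against when it is asked for contradictions with prior context.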

5. Human-AI Feedback Loop Breakdown

Explanation: The system runs autonomously without validation, causing misaligned priorities or irrelevant consolidations.
Fix: Introduce explicit confidence scoring. Route low-confidence insights to a review queue. Allow manual overrides that update vector weights and prompt templates.
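
Routing by confidence can be as simple as the sketch below. It also sends anything with flagged contradictions to review, which is an assumption layered on top of the fix described here; the 0.6 floor and all names are hypothetical.

```typescript
interface ScoredInsight {
  themes: string[];
  contradictions: string[];
  confidence: number;
}

interface RoutingDecision {
  route: 'accept' | 'review';
  reasons: string[];
}

function routeInsight(insight: ScoredInsight, minConfidence: number = 0.6): RoutingDecision {
  const reasons: string[] = [];
  if (insight.confidence < minConfidence) {
    reasons.push(`confidence ${insight.confidence} is below ${minConfidence}`);
  }
  if (insight.contradictions.length > 0) {
    reasons.push(`${insight.contradictions.length} contradiction(s) flagged`);
  }
  // Any reason at all means a human should look before the insight is stored.
  return { route: reasons.length > 0 ? 'review' : 'accept', reasons };
}

console.log(routeInsight({ themes: ['retention'], contradictions: [], confidence: 0.4 }));
```

Keeping the reasons alongside the decision gives the reviewer context and doubles as an audit log entry.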

6. Token Budget Mismanagement

Explanation: Recursive loops consume excessive compute during peak ingestion, causing pipeline stalls.
Fix: Implement batch processing with priority queuing. Use adaptive summarization depth based on available VRAM. Schedule heavy consolidation during off-peak hours.
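
Batching with priorities can be sketched as a greedy packer: sort jobs by priority, then fill each batch up to a token budget. The job shape, the budget of 4096 tokens (chosen here to match the context window used elsewhere in the article), and the function name are assumptions.

```typescript
interface ConsolidationJob {
  id: string;
  tokens: number;   // estimated tokens this job will consume
  priority: number; // higher runs first
}

function planBatches(
  jobs: readonly ConsolidationJob[],
  tokenBudget: number,
): ConsolidationJob[][] {
  if (tokenBudget <= 0) throw new RangeError('tokenBudget must be positive');

  const ordered = [...jobs].sort((a, b) => b.priority - a.priority);
  const batches: ConsolidationJob[][] = [];
  let current: ConsolidationJob[] = [];
  let used = 0;

  for (const job of ordered) {
    // Close the current batch if this job would push it over budget.
    if (current.length > 0 && used + job.tokens > tokenBudget) {
      batches.push(current);
      current = [];
      used = 0;
    }
    // A job larger than the budget still gets a batch of its own;
    // a real pipeline would need to split or truncate it.
    current.push(job);
    used += job.tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

const plan = planBatches(
  [
    { id: 'a', tokens: 3000, priority: 1 },
    { id: 'b', tokens: 2000, priority: 5 },
    { id: 'c', tokens: 1500, priority: 3 },
  ],
  4096,
);
console.log(plan.map((batch) => batch.map((job) => job.id)));
```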

Production Bundle

Action Checklist

Initialize PostgreSQL with pgvector extension and verify HNSW index performance
Configure Ollama with Gemma3 and validate local throughput stability
Implement session ingestion with date-stamped log files and metadata headers
Deploy consolidation engine with explicit contradiction detection prompts
Set up recursive depth tracking and maximum compression thresholds
Integrate confidence scoring and manual review queue for low-trust insights
Monitor vector space growth and implement tiered retention policies
Schedule periodic full-log re-indexing to prevent semantic drift

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Privacy-critical research	Recursive Local Memory	Zero data exfiltration, full audit control	High initial hardware, near-zero operational cost
High-volume public API	Cloud RAG with ephemeral context	Scalable, managed infrastructure	Low initial, high recurring API costs
Rapid prototyping	Local Vector RAG	Fast setup, minimal configuration	Moderate hardware, low maintenance
Long-term knowledge accumulation	Recursive Local Memory	Self-consolidating, contradiction-aware	High initial, compounding ROI over time

Configuration Template

# memory-engine.config.yaml
ingestion:
  vault_path: ./knowledge-vault
  log_directory: session-logs
  batch_size: 50
  schedule: "0 */6 * * *"

consolidation:
  model: gemma3
  ollama_endpoint: http://localhost:11434
  max_depth: 3
  temperature: 0.2
  context_window: 4096

storage:
  database_url: postgresql://user:pass@localhost:5432/memory_db
  vector_dimension: 768
  hnsw_m: 16
  hnsw_ef_construction: 64
  similarity_threshold: 0.75

retention:
  raw_log_ttl_days: 90
  depth_1_retention: 365  # days
  depth_2_plus_retention: permanent
  archive_path: ./archive
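
Before starting the pipeline it can help to sanity-check the values above. The sketch below validates an already-parsed config object (loading the YAML file is left out); the interface covers only the fields it checks, and the specific rules and names are assumptions, not requirements stated by the article.

```typescript
interface EngineConfig {
  consolidation: { max_depth: number; temperature: number; context_window: number };
  storage: { vector_dimension: number; similarity_threshold: number };
}

// Must match the vector(768) column in the table definition shown earlier.
const SCHEMA_VECTOR_DIMENSION = 768;

/** Returns a list of human-readable problems; an empty list means the config passed. */
function validateConfig(config: EngineConfig): string[] {
  const problems: string[] = [];
  const { consolidation, storage } = config;

  if (!Number.isInteger(consolidation.max_depth) || consolidation.max_depth < 1) {
    problems.push('consolidation.max_depth must be an integer >= 1');
  }
  if (consolidation.temperature < 0) {
    problems.push('consolidation.temperature must not be negative');
  }
  if (!Number.isInteger(consolidation.context_window) || consolidation.context_window <= 0) {
    problems.push('consolidation.context_window must be a positive integer');
  }
  if (storage.vector_dimension !== SCHEMA_VECTOR_DIMENSION) {
    problems.push(`storage.vector_dimension must equal ${SCHEMA_VECTOR_DIMENSION} to match the schema`);
  }
  if (storage.similarity_threshold <= 0 || storage.similarity_threshold >= 1) {
    problems.push('storage.similarity_threshold must be between 0 and 1 (exclusive)');
  }
  return problems;
}

console.log(
  validateConfig({
    consolidation: { max_depth: 3, temperature: 0.2, context_window: 4096 },
    storage: { vector_dimension: 768, similarity_threshold: 0.75 },
  }),
);
```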

Quick Start Guide

Initialize Database: Run docker run -d --name memdb -e POSTGRES_PASSWORD=localdev -p 5432:5432 pgvector/pgvector:pg16 and execute the schema initialization script.
Pull Local Model: Execute ollama pull gemma3 and verify throughput with ollama run gemma3 --verbose "test", which prints timing statistics including the eval rate in tokens per second.
Deploy Pipeline: Clone the repository, copy memory-engine.config.yaml, and run npm run start:consolidator. The system will begin scanning session-logs/ and generating embeddings.
Validate Loop: Create a test session, wait for the next consolidation cycle, and query SELECT content, depth FROM memory_vectors ORDER BY created_at DESC LIMIT 5; to verify recursive storage.
Monitor & Iterate: Track vector growth, adjust similarity thresholds, and route low-confidence insights to your review queue. The system will self-audit and refine its knowledge graph over time.
