ure prioritizes data locality, recursive compression, and explicit contradiction tracking.
Step 1: Session Ingestion & Structuring
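Raw interactions must be captured in a deterministic format. Rather than appending everything to one ever-growing file, the ingestor writes each day's entries to a date-stamped Markdown file under session-logs/, opening with a frontmatter metadata header. Per-day files keep batch processing simple, but they do not by themselves prevent races: if several processes write at once, route entries through a single writer or add file locking.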
// session-logger.ts
import { mkdirSync } from 'fs';
import { appendFile, writeFile } from 'fs/promises';
import { join } from 'path';

interface SessionEntry {
  timestamp: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  tags?: string[];
}

export class SessionIngestor {
  private logDir: string;

  constructor(vaultRoot: string) {
    this.logDir = join(vaultRoot, 'session-logs');
    // `recursive: true` makes this a no-op when the directory already exists.
    mkdirSync(this.logDir, { recursive: true });
  }

  async appendEntry(entry: SessionEntry): Promise<void> {
    const dateStr = new Date().toISOString().split('T')[0];
    const filePath = join(this.logDir, `${dateStr}.md`);
    const frontmatter = `---\ntype: session-log\ndate: ${dateStr}\n---\n\n`;
    try {
      // The 'wx' flag fails with EEXIST if the file exists, so the header is written only once.
      await writeFile(filePath, frontmatter, { flag: 'wx' });
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== 'EEXIST') throw err;
    }
    const block = `### ${entry.role.toUpperCase()} [${entry.timestamp}]\n${entry.content}\n`;
    await appendFile(filePath, block);
  }
}
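Before consolidation, the daily log has to be read back into structured entries. The following is a minimal sketch of such a reader, assuming the exact header format that appendEntry writes (`### ROLE [timestamp]`). The names `ParsedEntry` and `parseSessionLog` are hypothetical and not part of the original pipeline.

```typescript
interface ParsedEntry {
  role: 'user' | 'assistant' | 'system';
  timestamp: string;
  content: string;
}

// Matches headers such as "### USER [2025-01-01T10:00:00Z]".
const ENTRY_HEADER = /^### (USER|ASSISTANT|SYSTEM) \[([^\]]+)\]$/;

function parseSessionLog(markdown: string): ParsedEntry[] {
  const entries: ParsedEntry[] = [];
  let current: ParsedEntry | null = null;
  let inFrontmatter = false;

  const lines = markdown.split('\n');
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    // Skip the YAML frontmatter block at the top of the file.
    if (i === 0 && line === '---') { inFrontmatter = true; continue; }
    if (inFrontmatter) {
      if (line === '---') inFrontmatter = false;
      continue;
    }

    const header = ENTRY_HEADER.exec(line);
    if (header) {
      if (current) entries.push(current);
      current = {
        role: header[1].toLowerCase() as ParsedEntry['role'],
        timestamp: header[2],
        content: '',
      };
    } else if (current) {
      // Accumulate body lines until the next header.
      current.content += (current.content ? '\n' : '') + line;
    }
  }
  if (current) entries.push(current);

  // Drop leading and trailing blank lines from each entry body.
  return entries.map(e => ({ ...e, content: e.content.trim() }));
}
```

One limitation of this sketch: a message body that itself contains a line shaped like an entry header would be split into a separate entry, so a production reader would need an escaping rule.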
Step 2: Local Summarization Pipeline
A dedicated consolidation engine periodically scans the ingestion directory. It routes raw logs to a local LLM via Ollama, requesting structured extraction rather than free-form summarization. The prompt enforces topic clustering, confidence scoring, and explicit contradiction flags.
// cortex-engine.ts
import { Ollama } from 'ollama';

interface ConsolidatedInsight {
  themes: string[];
  hypotheses: string[];
  contradictions: string[];
  confidence: number;
}

export class CortexEngine {
  private client: Ollama;
  private model: string;

  constructor(modelName: string = 'gemma3') {
    this.client = new Ollama(); // defaults to the local Ollama endpoint
    this.model = modelName;
  }

  async consolidate(rawText: string): Promise<ConsolidatedInsight> {
    // Name the JSON keys explicitly so the reply matches ConsolidatedInsight.
    const prompt = `
      Analyze the following session transcript. Extract:
      1. "themes": core themes (max 3)
      2. "hypotheses": key hypotheses or decisions
      3. "contradictions": potential contradictions with prior context
      4. "confidence": confidence score (0.0-1.0)
      Return valid JSON only, with exactly those four keys.
      Transcript: ${rawText}
    `;
    const response = await this.client.chat({
      model: this.model,
      messages: [{ role: 'user', content: prompt }],
      stream: false,
      format: 'json', // ask Ollama to constrain the reply to JSON
      options: { temperature: 0.2, num_ctx: 4096 }
    });
    // JSON.parse still throws on malformed output, and the shape is not validated here.
    return JSON.parse(response.message.content) as ConsolidatedInsight;
  }
}
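Local models do not always return clean JSON, so passing the reply straight to JSON.parse can fail or yield an object with missing fields. The sketch below shows one way to validate the reply before it enters the pipeline. The helper names `parseInsight` and `isStringArray` are hypothetical, and the clamping and truncation rules are assumptions rather than part of the original design.

```typescript
interface ConsolidatedInsight {
  themes: string[];
  hypotheses: string[];
  contradictions: string[];
  confidence: number;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every(item => typeof item === 'string');
}

function parseInsight(raw: string): ConsolidatedInsight {
  // Tolerate prose or code fences around the payload by taking the outermost braces.
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model reply');
  }
  const data: unknown = JSON.parse(raw.slice(start, end + 1)); // throws on malformed JSON
  if (typeof data !== 'object' || data === null) {
    throw new Error('Insight must be a JSON object');
  }
  const { themes, hypotheses, contradictions, confidence } = data as Record<string, unknown>;
  if (!isStringArray(themes) || !isStringArray(hypotheses) || !isStringArray(contradictions)) {
    throw new Error('themes, hypotheses and contradictions must be string arrays');
  }
  if (typeof confidence !== 'number' || Number.isNaN(confidence)) {
    throw new Error('confidence must be a number');
  }
  return {
    themes: themes.slice(0, 3), // the prompt asks for at most three themes
    hypotheses,
    contradictions,
    confidence: Math.min(1, Math.max(0, confidence)), // clamp into [0, 1]
  };
}
```

Rejecting malformed replies outright, rather than repairing them, keeps bad data out of the vector store; a caller could retry the consolidation or route the transcript to manual review.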
Step 3: Vector Persistence & Recursive Consolidation
PostgreSQL with pgvector serves as the persistence layer. Raw embeddings are stored alongside condensed insights. The system implements recursive consolidation: when a new summary is generated, it is compared against existing vectors. If similarity exceeds a threshold, the system merges insights and generates a higher-order summary. This creates a self-referential loop that compresses knowledge without losing traceability.
// vector-store.ts
import { Pool } from 'pg';

export class MemoryVault {
  private pool: Pool;
  private ready: Promise<void>;

  constructor(connectionString: string) {
    this.pool = new Pool({ connectionString });
    // Constructors cannot await, so keep the promise and await it before each query.
    this.ready = this.initSchema();
  }

  private async initSchema(): Promise<void> {
    await this.pool.query(`
      CREATE EXTENSION IF NOT EXISTS vector;
      CREATE TABLE IF NOT EXISTS memory_vectors (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        content TEXT NOT NULL,
        embedding vector(768),
        depth INTEGER DEFAULT 1,
        created_at TIMESTAMPTZ DEFAULT NOW(),
        parent_id UUID REFERENCES memory_vectors(id)
      );
    `);
  }

  // pgvector accepts vectors as text literals of the form '[1,2,3]'.
  private toVectorLiteral(embedding: number[]): string {
    return `[${embedding.join(',')}]`;
  }

  async storeInsight(content: string, embedding: number[], depth: number = 1, parentId?: string): Promise<string> {
    await this.ready;
    const result = await this.pool.query(
      `INSERT INTO memory_vectors (content, embedding, depth, parent_id)
       VALUES ($1, $2::vector, $3, $4) RETURNING id`,
      [content, this.toVectorLiteral(embedding), depth, parentId ?? null]
    );
    return result.rows[0].id;
  }

  // `<=>` is cosine distance, so 1 - distance is cosine similarity.
  // Similarity only finds candidates on the same topic; whether they actually
  // conflict is decided during consolidation.
  async findContradictions(newEmbedding: number[], threshold: number = 0.75): Promise<string[]> {
    await this.ready;
    const result = await this.pool.query(
      `SELECT content FROM memory_vectors
       WHERE 1 - (embedding <=> $1::vector) > $2
       AND depth = 1
       ORDER BY embedding <=> $1::vector`,
      [this.toVectorLiteral(newEmbedding), threshold]
    );
    return result.rows.map(r => r.content);
  }
}
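The similarity test in the query above can be reproduced in memory, which is useful for unit tests and for reasoning about the threshold. The sketch below computes cosine similarity (the quantity `1 - (a <=> b)` yields in pgvector) and selects merge candidates above a threshold. `StoredInsight` and `findMergeCandidates` are hypothetical names, and returning 0 for a zero-length vector is a choice made here for simplicity.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Similarity is undefined for a zero vector; this sketch treats it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredInsight {
  id: string;
  content: string;
  embedding: number[];
}

// Returns stored insights whose similarity to the new embedding exceeds the
// threshold, most similar first, mirroring the WHERE and ORDER BY clauses.
function findMergeCandidates(
  newEmbedding: number[],
  stored: StoredInsight[],
  threshold: number = 0.75
): StoredInsight[] {
  return stored
    .map(item => ({ item, score: cosineSimilarity(newEmbedding, item.embedding) }))
    .filter(scored => scored.score > threshold)
    .sort((x, y) => y.score - x.score)
    .map(scored => scored.item);
}
```

A high score means two insights discuss the same topic, not that they agree, so candidates returned here still need a semantic check before being merged or flagged as contradictory.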
Architecture Decisions & Rationale
- PostgreSQL + pgvector: Chosen for ACID compliance, mature concurrency handling, and native vector similarity operations. Unlike many specialized vector databases, PostgreSQL also supports recursive queries, foreign key constraints, and transactional rollbacks, which matter when merging summaries.
- Recursive Depth Control: The depth column tracks consolidation levels. Depth 1 holds raw summaries; depth 2+ holds merged insights. Capping the depth prevents unbounded compression, and the parent_id links preserve an audit trail.
- Local LLM Routing: Ollama manages model lifecycle and context windows. Running Gemma3 locally keeps transcripts on the machine, so no session data is sent to a third-party API, while maintaining sufficient throughput for periodic consolidation.
- Explicit Contradiction Tracking: Rather than relying on implicit similarity, the system flags semantic conflicts during consolidation. This enables the infrastructure to self-audit and surface evolving or conflicting hypotheses.
Pitfall Guide
1. Semantic Drift in Recursive Summarization
Explanation: Repeated summarization compresses context until nuance is lost. The system may generate coherent but factually hollow insights.
Fix: Maintain source references in every consolidation step. Implement a maximum depth threshold (typically 3-4). Periodically re-index raw logs to reset drift.
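The depth cap and source references can be enforced at the point where two insights are merged. The sketch below plans a merge and refuses it once the cap would be exceeded. The `sourceIds` field is an assumption layered on top of the schema shown earlier (which only has parent_id), and `planMerge` is a hypothetical helper.

```typescript
interface InsightNode {
  id: string;
  content: string;
  depth: number;
  sourceIds: string[]; // ids of the depth-1 summaries this node derives from
}

interface MergePlan {
  depth: number;
  sourceIds: string[];
}

const MAX_DEPTH = 3; // matches the "typically 3-4" guidance above

// Returns the depth and source references for a merged node, or null when the
// merge would exceed the depth cap and should not be performed.
function planMerge(a: InsightNode, b: InsightNode, maxDepth: number = MAX_DEPTH): MergePlan | null {
  const depth = Math.max(a.depth, b.depth) + 1;
  if (depth > maxDepth) return null;

  const sources = new Set<string>();
  for (const node of [a, b]) {
    // A depth-1 node is its own source; higher nodes carry inherited references.
    const ids = node.depth === 1 ? [node.id] : node.sourceIds;
    ids.forEach(id => sources.add(id));
  }
  return { depth, sourceIds: Array.from(sources).sort() };
}
```

Because every merged node carries the full set of depth-1 ids, a drifted summary can always be regenerated from its raw sources instead of from another summary.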
2. Vector Space Saturation
Explanation: Storing every embedding without pruning causes query latency to degrade and storage costs to balloon.
Fix: Implement tiered retention. Keep high-confidence, high-depth vectors permanently. Archive low-similarity or low-confidence entries to cold storage. Use HNSW indexes with optimized m and ef_construction parameters.
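For the HNSW part of this fix, the index definition can be generated from configuration. The sketch below builds the statement for a cosine-distance index, which matches the `<=>` operator used in the queries. The function name and the index naming scheme are hypothetical, and the check that ef_construction is at least twice m reflects pgvector's documented constraint as I understand it, so verify it against the version in use.

```typescript
// Builds a CREATE INDEX statement for a pgvector HNSW index using cosine distance.
function buildHnswIndexSql(
  table: string,
  column: string,
  m: number = 16,
  efConstruction: number = 64
): string {
  // Identifiers are interpolated into SQL, so restrict them to a safe pattern.
  const identifier = /^[a-z_][a-z0-9_]*$/;
  if (!identifier.test(table) || !identifier.test(column)) {
    throw new Error('Unsafe identifier');
  }
  if (!Number.isInteger(m) || !Number.isInteger(efConstruction)) {
    throw new Error('m and ef_construction must be integers');
  }
  if (efConstruction < 2 * m) {
    throw new Error('ef_construction should be at least 2 * m');
  }
  return (
    `CREATE INDEX IF NOT EXISTS ${table}_${column}_hnsw_idx ` +
    `ON ${table} USING hnsw (${column} vector_cosine_ops) ` +
    `WITH (m = ${m}, ef_construction = ${efConstruction});`
  );
}
```

Higher values of m and ef_construction generally improve recall at the cost of build time and index size, so they are worth measuring against real query latency rather than guessing.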
3. Contradiction Blindness
Explanation: The system fails to detect when new inputs conflict with established summaries, leading to inconsistent knowledge graphs.
Fix: Enforce explicit contradiction detection in the summarization prompt. Set similarity thresholds that trigger manual review queues. Log all flagged conflicts for audit trails.
4. Prompt Template Rigidity
Explanation: Fixed prompts perform poorly as topics evolve or domain-specific terminology emerges.
Fix: Version-control prompt templates. Implement dynamic prompt assembly that injects recent high-confidence summaries as context. Rotate temperature settings based on consolidation depth.
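Dynamic prompt assembly might look like the following sketch, which injects the highest-confidence prior summaries ahead of the transcript and stamps the prompt with a template version. `PROMPT_VERSION`, `PriorSummary`, `buildPrompt`, and the default thresholds are all hypothetical values chosen for illustration.

```typescript
interface PriorSummary {
  content: string;
  confidence: number;
}

const PROMPT_VERSION = 'v2'; // hypothetical version tag, tracked in version control

function buildPrompt(
  transcript: string,
  prior: PriorSummary[],
  opts: { minConfidence: number; maxPrior: number } = { minConfidence: 0.8, maxPrior: 3 }
): string {
  // Keep only high-confidence summaries, strongest first, capped to limit token use.
  const context = prior
    .filter(p => p.confidence >= opts.minConfidence)
    .sort((x, y) => y.confidence - x.confidence)
    .slice(0, opts.maxPrior)
    .map((p, index) => `${index + 1}. ${p.content}`);

  return [
    `# prompt-template ${PROMPT_VERSION}`,
    'Analyze the following session transcript. Extract core themes (max 3), ' +
      'key hypotheses or decisions, contradictions with the prior context, ' +
      'and a confidence score (0.0-1.0). Return valid JSON only.',
    context.length > 0 ? `Prior context:\n${context.join('\n')}` : 'Prior context: none',
    `Transcript: ${transcript}`,
  ].join('\n\n');
}
```

Recording the version tag alongside each stored insight makes it possible to tell which template produced a given summary when prompts change later.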
5. Human-AI Feedback Loop Breakdown
Explanation: The system runs autonomously without validation, causing misaligned priorities or irrelevant consolidations.
Fix: Introduce explicit confidence scoring. Route low-confidence insights to a review queue. Allow manual overrides that update vector weights and prompt templates.
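The review routing can be sketched as a simple partition over consolidated insights: anything with a flagged contradiction or a confidence below a cutoff goes to the review queue, and the rest is stored automatically. The 0.6 cutoff and the names `ScoredInsight`, `needsReview`, and `partitionForReview` are assumptions for illustration.

```typescript
interface ScoredInsight {
  content: string;
  confidence: number;
  contradictions: string[];
}

// An insight needs human review if it flags a contradiction or scores below the cutoff.
function needsReview(insight: ScoredInsight, minConfidence: number = 0.6): boolean {
  return insight.contradictions.length > 0 || insight.confidence < minConfidence;
}

function partitionForReview(
  insights: ScoredInsight[],
  minConfidence: number = 0.6
): { store: ScoredInsight[]; review: ScoredInsight[] } {
  const store: ScoredInsight[] = [];
  const review: ScoredInsight[] = [];
  for (const insight of insights) {
    (needsReview(insight, minConfidence) ? review : store).push(insight);
  }
  return { store, review };
}
```

Sending contradictions to review regardless of confidence is deliberate in this sketch: a model can be confidently wrong about a conflict, and those are the cases most worth a human look.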
6. Token Budget Mismanagement
Explanation: Recursive loops consume excessive compute during peak ingestion, causing pipeline stalls.
Fix: Implement batch processing with priority queuing. Use adaptive summarization depth based on available VRAM. Schedule heavy consolidation during off-peak hours.
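One way to keep each consolidation call inside the model's context window is to group log entries into batches under a token budget. The sketch below uses a rough four-characters-per-token estimate, which is an assumption and not a real tokenizer; `estimateTokens` and `chunkByTokenBudget` are hypothetical helpers.

```typescript
const CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Groups entries into batches whose estimated token total stays within maxTokens.
// An entry larger than the budget is placed alone in its own batch rather than split.
function chunkByTokenBudget(entries: string[], maxTokens: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let used = 0;

  for (const entry of entries) {
    const cost = estimateTokens(entry);
    if (current.length > 0 && used + cost > maxTokens) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(entry);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

In practice the budget should leave headroom below the configured context window for the prompt instructions and the model's reply.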
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Privacy-critical research | Recursive Local Memory | Zero data exfiltration, full audit control | High initial hardware, near-zero operational cost |
| High-volume public API | Cloud RAG with ephemeral context | Scalable, managed infrastructure | Low initial, high recurring API costs |
| Rapid prototyping | Local Vector RAG | Fast setup, minimal configuration | Moderate hardware, low maintenance |
| Long-term knowledge accumulation | Recursive Local Memory | Self-consolidating, contradiction-aware | High initial, compounding ROI over time |
Configuration Template
# memory-engine.config.yaml
ingestion:
  vault_path: ./knowledge-vault
  log_directory: session-logs
  batch_size: 50
  schedule: "0 */6 * * *"
consolidation:
  model: gemma3
  ollama_endpoint: http://localhost:11434
  max_depth: 3
  temperature: 0.2
  context_window: 4096
storage:
  database_url: postgresql://user:pass@localhost:5432/memory_db
  vector_dimension: 768
  hnsw_m: 16
  hnsw_ef_construction: 64
  similarity_threshold: 0.75
retention:
  raw_log_ttl_days: 90
  depth_1_retention: 365
  depth_2_plus_retention: permanent
  archive_path: ./archive
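The retention block of the template can be read as a small decision rule. The sketch below applies it to a single vector row, assuming that depth_1_retention is measured in days and that "permanent" means depth 2+ rows are never archived. `retentionAction` is a hypothetical helper, not part of the original pipeline.

```typescript
type RetentionAction = 'keep' | 'archive';

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Decides whether a row stays in the live table or moves to the archive path.
function retentionAction(
  depth: number,
  createdAt: Date,
  now: Date,
  depth1RetentionDays: number = 365
): RetentionAction {
  // depth_2_plus_retention: permanent
  if (depth >= 2) return 'keep';
  const ageDays = (now.getTime() - createdAt.getTime()) / MS_PER_DAY;
  return ageDays > depth1RetentionDays ? 'archive' : 'keep';
}
```

Passing the current time in as an argument, instead of reading the clock inside the function, keeps the rule deterministic and easy to test.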
Quick Start Guide
- Initialize Database: Run `docker run -d --name memdb -e POSTGRES_PASSWORD=localdev -p 5432:5432 pgvector/pgvector:pg16`, then execute the schema initialization script.
- Pull Local Model: Execute `ollama pull gemma3` and confirm the model responds with `ollama run gemma3 "test"`.
- Deploy Pipeline: Clone the repository, copy `memory-engine.config.yaml`, and run `npm run start:consolidator`. The system will begin scanning `session-logs/` and generating embeddings.
- Validate Loop: Create a test session, wait for the next consolidation cycle, and query `SELECT content, depth FROM memory_vectors ORDER BY created_at DESC LIMIT 5;` to verify recursive storage.
- Monitor & Iterate: Track vector growth, adjust similarity thresholds, and route low-confidence insights to your review queue. The system will self-audit and refine its knowledge graph over time.