or parallel I/O and tiered eviction, without introducing routing overhead.
Core Solution
The 21-layer architecture partitions memory into six functional tiers, each with dedicated I/O paths, eviction policies, and consolidation routines. Python orchestrates layer routing, while SQLite manages structured metadata, cross-references, and TTL state.
Architecture Tiers:
- Sensory/Buffer (Layers 1–3): Raw token ingestion, deduplication, and noise filtering.
- Working Memory (Layers 4–7): Short-term state, active task context, and immediate next-step prediction.
- Semantic/Vector (Layers 8–12): Embedding storage, similarity search, and cross-modal alignment.
- Relational/SQLite (Layers 13–16): Structured facts, entity graphs, session metadata, and TTL tracking.
- Long-Term/Archival (Layers 17–19): Compressed knowledge, consolidated patterns, and cold storage.
- Meta/Orchestration (Layers 20–21): Routing logic, attention scoring, decay functions, and garbage collection.
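The tier boundaries listed above can be written down as a small lookup table. This is a minimal sketch, not part of the original implementation: the `Tier` type, the `TIERS` tuple, and `tier_for_layer` are hypothetical names, and layers are numbered 1 to 21 as in the list (the implementation further down indexes them 0 to 20).

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    first_layer: int  # inclusive, 1-based as in the tier list
    last_layer: int   # inclusive


# Boundaries copied from the tier list; the names are illustrative.
TIERS = (
    Tier("Sensory/Buffer", 1, 3),
    Tier("Working Memory", 4, 7),
    Tier("Semantic/Vector", 8, 12),
    Tier("Relational/SQLite", 13, 16),
    Tier("Long-Term/Archival", 17, 19),
    Tier("Meta/Orchestration", 20, 21),
)


def tier_for_layer(layer: int) -> Tier:
    """Return the tier that owns a 1-based layer number."""
    for tier in TIERS:
        if tier.first_layer <= layer <= tier.last_layer:
            return tier
    raise ValueError(f"layer {layer} is outside 1..21")
```

For example, `tier_for_layer(12).name` is `"Semantic/Vector"` and `tier_for_layer(13).name` is `"Relational/SQLite"`.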
Python Implementation:
import sqlite3
import time
from typing import Dict, List, Optional


class MemoryLayer:
    """One in-memory layer with attention-weighted expiry."""

    def __init__(self, layer_id: int, ttl_seconds: float, decay_rate: float = 0.05):
        self.layer_id = layer_id
        self.ttl_seconds = ttl_seconds
        # Stored for tuning; decay() below does not use it yet.
        self.decay_rate = decay_rate
        self.entries: Dict[str, dict] = {}
        self.last_flush = time.time()

    def add(self, key: str, payload: dict, attention_score: float):
        now = time.time()
        self.entries[key] = {
            "payload": payload,
            "attention": attention_score,
            "created_at": now,
            "last_accessed": now,
        }

    def get(self, key: str) -> Optional[dict]:
        # Refresh last_accessed on reads so the idle check in decay()
        # reflects actual use rather than time since insertion.
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry["last_accessed"] = time.time()
        return entry["payload"]

    def decay(self) -> List[str]:
        """Drop expired entries and return their keys."""
        now = time.time()
        expired = []
        for key, entry in list(self.entries.items()):
            age = now - entry["created_at"]
            # Higher attention stretches the TTL, up to 2x at attention 1.0.
            effective_ttl = self.ttl_seconds * (1 + entry["attention"])
            idle = now - entry["last_accessed"]
            # Expire on total age, or after half a TTL without access.
            if age > effective_ttl or idle > self.ttl_seconds * 0.5:
                expired.append(key)
        for key in expired:
            del self.entries[key]
        return expired


class SQLiteMemoryBridge:
    """Metadata index for keys held in the in-memory layers."""

    def __init__(self, db_path: str = ":memory:"):
        # check_same_thread=False only disables the thread check;
        # callers must still serialize concurrent writes themselves.
        self.conn = sqlite3.connect(db_path, check_same_thread=False)
        self._init_schema()

    def _init_schema(self):
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS memory_index (
                key TEXT PRIMARY KEY,
                layer_id INTEGER,
                attention REAL,
                created_at REAL,
                ttl REAL,
                status TEXT DEFAULT 'active'
            )
        """)
        self.conn.commit()

    def upsert_index(self, key: str, layer_id: int, attention: float, ttl: float):
        self.conn.execute("""
            INSERT OR REPLACE INTO memory_index
            (key, layer_id, attention, created_at, ttl, status)
            VALUES (?, ?, ?, ?, ?, 'active')
        """, (key, layer_id, attention, time.time(), ttl))
        self.conn.commit()

    def query_active_keys(self, layer_id: Optional[int] = None) -> List[str]:
        q = "SELECT key FROM memory_index WHERE status='active'"
        params: tuple = ()
        if layer_id is not None:
            q += " AND layer_id = ?"
            params = (layer_id,)
        return [row[0] for row in self.conn.execute(q, params)]


class TwentyOneLayerStack:
    def __init__(self):
        # Layers are 0-indexed here (0..20); the tier list numbers them 1..21.
        self.layers = [MemoryLayer(i, ttl_seconds=300, decay_rate=0.04) for i in range(21)]
        self.bridge = SQLiteMemoryBridge()
        self.routing_table = self._build_routing_table()

    def _build_routing_table(self) -> Dict[int, int]:
        # Maps each attention bucket (0..20) to a layer index. The default is
        # the identity mapping; edit it to send several buckets to one layer.
        return {bucket: bucket for bucket in range(21)}

    def ingest(self, key: str, payload: dict, attention_score: float):
        # Scale the score to a bucket and clamp it to the valid range.
        bucket = min(20, max(0, int(attention_score * 20)))
        target_layer = self.routing_table[bucket]
        layer = self.layers[target_layer]
        layer.add(key, payload, attention_score)
        self.bridge.upsert_index(key, target_layer, attention_score, layer.ttl_seconds)

    def consolidate(self):
        # Evict expired entries from every layer and mirror that in the index.
        for layer in self.layers:
            expired = layer.decay()
            if expired:
                self.bridge.conn.executemany(
                    "UPDATE memory_index SET status='expired' WHERE key=?",
                    [(key,) for key in expired],
                )
        self.bridge.conn.commit()
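To make the routing arithmetic in `ingest` concrete, the sketch below isolates the expression `min(20, max(0, int(attention_score * 20)))` in a standalone function. The name `route` and the `n_layers` parameter are assumptions introduced for illustration; the arithmetic itself is taken from the code above.

```python
def route(attention_score: float, n_layers: int = 21) -> int:
    """Map an attention score to a 0-based layer index, clamped to range."""
    top = n_layers - 1
    # int() truncates toward zero, so 0.99 * 20 = 19.8 lands in layer 19.
    return min(top, max(0, int(attention_score * top)))


print(route(0.0))   # 0
print(route(0.25))  # 5
print(route(0.5))   # 10
print(route(0.99))  # 19
print(route(1.0))   # 20
print(route(1.5))   # 20 (clamped)
print(route(-0.2))  # 0 (clamped)
```

Because of the truncation, only scores of 1.0 or higher reach the top layer, which may or may not be the intended distribution.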
Architecture Decisions:
- 21 Layers: Chosen to align with tiered I/O parallelism. Each tier handles distinct lifecycle stages without cross-contamination.
- SQLite for Metadata: Relational indexing enables O(log n) TTL lookups by key through the primary-key index, and supports cross-layer validation without full vector scans.
- Attention-Weighted TTL: Dynamic expiration prevents premature purging of high-signal context while aggressively dropping noise.
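- Consolidation Routine: Intended to run asynchronously every 60 seconds to flush expired entries and compress archival layers. The consolidate() method above is synchronous and only marks expired keys, so scheduling and compression are left to the caller.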
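The attention-weighted TTL rule can be checked with a few numbers. The sketch below restates the two conditions from `MemoryLayer.decay` as standalone functions; `effective_ttl` and `is_expired` are illustrative names, and the formulas are copied from the implementation rather than extended.

```python
def effective_ttl(ttl_seconds: float, attention: float) -> float:
    """TTL stretched by attention: 1x at attention 0.0, 2x at attention 1.0."""
    return ttl_seconds * (1.0 + attention)


def is_expired(age: float, idle: float, ttl_seconds: float, attention: float) -> bool:
    """Expire on total age, or after half a base TTL without access."""
    return age > effective_ttl(ttl_seconds, attention) or idle > ttl_seconds * 0.5


print(effective_ttl(300, 0.0))  # 300.0
print(effective_ttl(300, 0.5))  # 450.0
print(effective_ttl(300, 1.0))  # 600.0

# A 400-second-old entry survives at attention 0.5 but not at 0.0.
print(is_expired(age=400, idle=10, ttl_seconds=300, attention=0.5))  # False
print(is_expired(age=400, idle=10, ttl_seconds=300, attention=0.0))  # True

# The idle rule ignores attention: 200 s without access exceeds 150 s.
print(is_expired(age=200, idle=200, ttl_seconds=300, attention=1.0))  # True
```

Note that the idle threshold is not attention-weighted, so a high-attention entry that is never read again still expires after half the base TTL.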
Pitfall Guide
- Over-Engineering Layer Granularity: Adding layers beyond routing capacity increases lookup latency. Keep layer boundaries aligned with distinct I/O patterns and eviction policies.
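- Ignoring Temporal Decay Functions: A single linear TTL causes context collapse, because high-value and low-value entries expire on the same schedule. Use exponential or attention-weighted decay to preserve high-value state beyond the default 5-minute (300-second) window.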
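- SQLite Connection Pooling Mismanagement: Unpooled or synchronous connections bottleneck concurrent AI requests. Setting check_same_thread=False only disables sqlite3's same-thread check; it does not make a shared connection thread-safe. For production workloads, combine it with a write lock, connection pooling, or async wrappers.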
- Semantic-Relational Misalignment: Vector embeddings and SQLite schemas drift without cross-layer validation. Maintain a unified key namespace and periodic reconciliation jobs.
- Hardcoded TTL Values: Static expiration ignores session importance. Derive TTL dynamically from attention scores, user interaction frequency, and task criticality.
- Memory Leak in Buffer Layers: Unflushed sensory buffers accumulate garbage tokens. Implement aggressive deduplication and noise filtering at layers 1–3 before routing.
- Synchronous Consolidation Blocking Ingestion: Running decay/flush routines on the main thread stalls new memory writes. Offload consolidation to background workers or async event loops.
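Two of the pitfalls above, shared SQLite connections and consolidation blocking ingestion, can be addressed together. The following is a minimal sketch under stated assumptions: `LockedIndex` and `BackgroundConsolidator` are hypothetical classes, the table is reduced to two columns, and a single lock-guarded connection stands in for a real pool.

```python
import sqlite3
import threading
from typing import Callable, Iterable, Optional


class LockedIndex:
    """Hypothetical index: one shared connection, all access under a lock."""

    def __init__(self, db_path: str = ":memory:"):
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        self._lock = threading.Lock()
        # Using the connection as a context manager commits on success.
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS memory_index ("
                "key TEXT PRIMARY KEY, status TEXT DEFAULT 'active')"
            )

    def add(self, key: str) -> None:
        with self._lock, self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO memory_index (key) VALUES (?)", (key,)
            )

    def mark_expired(self, keys: Iterable[str]) -> None:
        with self._lock, self._conn:
            self._conn.executemany(
                "UPDATE memory_index SET status='expired' WHERE key=?",
                [(k,) for k in keys],
            )

    def status(self, key: str) -> Optional[str]:
        with self._lock:
            row = self._conn.execute(
                "SELECT status FROM memory_index WHERE key=?", (key,)
            ).fetchone()
        return row[0] if row else None


class BackgroundConsolidator:
    """Calls a consolidation function on a daemon thread at a fixed interval."""

    def __init__(self, consolidate: Callable[[], None], interval_seconds: float = 60.0):
        self._consolidate = consolidate
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._consolidate()
```

A caller would pass its own consolidation callable, for example `BackgroundConsolidator(stack.consolidate).start()`, provided every path that touches the shared connection takes the same lock. The lock serializes writers, so this trades throughput for simplicity; a pool or an async driver is the heavier alternative.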
Deliverables
- Blueprint: Hierarchical memory routing diagram mapping attention scores to layer tiers, I/O paths, and consolidation triggers. Includes SQLite schema definition and TTL decay curves.
- Checklist: Pre-deployment validation steps covering connection pooling verification, layer boundary stress testing, TTL calibration against target forgetting rates, and cross-layer key reconciliation.
- Configuration Templates: YAML/JSON manifests for layer thresholds, SQLite indexing parameters, decay rate tuning, and async consolidation schedules. Ready for direct integration into Python-based AI agent frameworks.
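As an illustration of what a configuration template might contain, the sketch below loads a JSON manifest into a typed object. The manifest layout, the key names, and the `StackConfig` and `load_config` names are all hypothetical; only the values (21 layers, a 300-second TTL, a 0.04 decay rate, a 60-second consolidation interval) come from the text above.

```python
import json
from dataclasses import dataclass

# Hypothetical manifest: key names are illustrative, not a published schema.
MANIFEST = """
{
  "layers": {"count": 21, "ttl_seconds": 300, "decay_rate": 0.04},
  "sqlite": {"db_path": ":memory:"},
  "consolidation": {"interval_seconds": 60}
}
"""


@dataclass(frozen=True)
class StackConfig:
    layer_count: int
    ttl_seconds: float
    decay_rate: float
    db_path: str
    consolidation_interval: float


def load_config(text: str) -> StackConfig:
    """Parse a manifest and reject obviously invalid values."""
    raw = json.loads(text)
    cfg = StackConfig(
        layer_count=int(raw["layers"]["count"]),
        ttl_seconds=float(raw["layers"]["ttl_seconds"]),
        decay_rate=float(raw["layers"]["decay_rate"]),
        db_path=str(raw["sqlite"]["db_path"]),
        consolidation_interval=float(raw["consolidation"]["interval_seconds"]),
    )
    if cfg.layer_count <= 0 or cfg.ttl_seconds <= 0 or cfg.consolidation_interval <= 0:
        raise ValueError("layer count, TTL, and interval must be positive")
    return cfg


config = load_config(MANIFEST)
print(config.layer_count, config.ttl_seconds, config.consolidation_interval)
```

Validating at load time keeps a bad manifest from surfacing later as silent over-eviction or a consolidation loop that never sleeps.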