he agent's after-action report. When a future session asks "Did I already try to book that flight?" or "What happened the last time this user asked about a refund?", episodic memory is where we look.
Document shape
{
_id: ObjectId("..."),
episode_id: "ep_20260429_xk92",
thread_id: "thread_abc123",
user_id: "user_9921",
summary: "Looked up refund status for order #8842. Confirmed refund is processing, ETA 3-5 business days.",
tool_calls: [
{
tool: "lookup_order",
inputs: { order_id: "8842" },
output: { status: "refund_processing", eta_days: 5 },
success: true,
called_at: ISODate("2026-04-29T14:01:05Z")
}
],
outcome: "succeeded",
tags: ["refund", "order-lookup"],
started_at: ISODate("2026-04-29T14:01:00Z"),
ended_at: ISODate("2026-04-29T14:01:08Z")
}
Field-by-field walkthrough
episode_id uniquely identifies this run. It becomes the source_episode_id on any semantic memories that get extracted from this episode, giving us a clear provenance trail.
thread_id + user_id link the episode back to the session it came from. thread_id lets us find all episodes that share a conversation context; user_id lets us find all episodes for a given user across all sessions.
summary is a human-readable description of what happened, written or generated at episode close. We keep it factual and outcome-oriented: what was the goal, what was done, what was the result. This is the field we search when a future agent needs to recall prior work.
tool_calls[] is the detailed trace. We store inputs and outputs here so we can diagnose failures, avoid repeating failed approaches, and give the agent evidence to reason from. For example, "last time I called lookup_order with this ID, it returned status not_found." Note that, to avoid hitting the 16MB BSON document limit, you should implement a truncation policy for large tool outputs or cap the maximum number of tool calls per episode. If your agents perform hundreds of steps with heavy payloads, consider moving these traces to a separate tool_call_logs collection referenced by episode_id. For this reference schema, we assume bounded episodes where the action history remains within the document limit.
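A truncation policy like the one described above could be sketched as follows. This is a minimal illustration, not a prescribed implementation: the helper names (truncateToolCall, boundToolCalls) and both limits are assumptions of ours, and the right thresholds depend on your payloads.

```javascript
// Hypothetical limits; tune them to your own payload sizes.
const MAX_OUTPUT_BYTES = 8 * 1024;
const MAX_TOOL_CALLS = 50;

// Replace an oversized tool output with a short preview plus size metadata.
// Calls whose serialized output fits within maxBytes are returned unchanged.
function truncateToolCall(call, maxBytes = MAX_OUTPUT_BYTES) {
  const serialized = JSON.stringify(call.output ?? null);
  const size = Buffer.byteLength(serialized, "utf8");
  if (size <= maxBytes) return call;
  return {
    ...call,
    output: {
      truncated: true,
      original_bytes: size,
      // Cut by characters, so the preview size is approximate for multi-byte text.
      preview: serialized.slice(0, maxBytes),
    },
  };
}

// Keep only the most recent maxCalls entries and truncate each one.
function boundToolCalls(calls, maxCalls = MAX_TOOL_CALLS, maxBytes = MAX_OUTPUT_BYTES) {
  return calls.slice(-maxCalls).map((call) => truncateToolCall(call, maxBytes));
}
```

Running boundToolCalls over the trace just before the episode is written keeps the document size predictable while preserving enough of each output to diagnose a failure.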
outcome is a controlled vocabulary field: succeeded, failed, or abandoned. It's a fast filter; if a future agent is looking for precedent, it might specifically want the last successful episode of a given type.
tags[] are optional but valuable for retrieval. They let us filter episodic recall to a domain, "show me recent refund episodes for this user", without having to run a full-text search on summary.
Why this is different from the chat log
The distinction matters in practice. The chat log captures conversation: turns, tone, and clarifications. The episode captures action: what the agent committed to and what happened. We need both, and they serve different retrieval purposes. We pull the chat log to reconstruct conversational context; we pull the episode to reconstruct operational history.
Index
db.episodic_memory.createIndex({ episode_id: 1 }, { unique: true })
db.episodic_memory.createIndex({ user_id: 1, started_at: -1 })
db.episodic_memory.createIndex({ user_id: 1, tags: 1, started_at: -1 })
db.episodic_memory.createIndex(
  { summary: "text" },
  { default_language: "english" }
)
The unique index on episode_id is our primary integrity guard, ensuring that provenance links from other collections remain stable. The compound index on user_id + started_at does double duty: it powers the most common "most recent N episodes" queries and also supports per-user background retention sweeps (e.g., deleting records older than 90 days).
Adding tags to the compound index supports filtered recall by domain, allowing the agent to retrieve history specific to a topic like "refunds" or "billing." However, since tags is an array, this becomes a multikey index. MongoDB allows at most one array-valued field per document in a compound multikey index; if you later extend this schema with another array (like tools_used[]) and add it to this index, inserts of documents where both fields are arrays will fail. Finally, the text index on summary handles the keyword-based discovery that vector search often misses.
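To make the index discussion concrete, here is a rough sketch of the two query shapes these indexes are meant to serve. The helper names (recentEpisodesQuery, retentionFilter) and the 90-day default are illustrative assumptions; the functions only build plain filter objects, and the database calls are shown as comments.

```javascript
// Build filter/sort/limit for "most recent N episodes for this user",
// optionally narrowed by a tag and an outcome.
function recentEpisodesQuery({ userId, tag, outcome, limit = 5 }) {
  const filter = { user_id: userId };
  if (tag) filter.tags = tag; // equality on an array field matches any element
  if (outcome) filter.outcome = outcome; // not in the index; applied after the index scan
  return { filter, sort: { started_at: -1 }, limit };
}

// Build the filter for a per-user retention sweep: everything older than maxAgeDays.
function retentionFilter(userId, now, maxAgeDays = 90) {
  const cutoff = new Date(now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000);
  return { user_id: userId, started_at: { $lt: cutoff } };
}

// Usage in mongosh or the Node.js driver (not executed here):
// const q = recentEpisodesQuery({ userId: "user_9921", tag: "refund", outcome: "succeeded" });
// db.episodic_memory.find(q.filter).sort(q.sort).limit(q.limit)
// db.episodic_memory.deleteMany(retentionFilter("user_9921", new Date()))
```

Both shapes lead with user_id, which is what lets them use the compound indexes above as a prefix match.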
Semantic Memory: What the Agent Knows About the User
What it stores
Semantic memory is the long-term knowledge store: persistent facts and preferences that should survive across sessions indefinitely. This is where the agent remembers that a user prefers metric units, dislikes upsells, or has a standing instruction to always confirm before booking.
Unlike short-term and episodic memory, we retrieve semantic memories by meaning, not by session or timestamp. This is the collection that genuinely needs a vector index.
Document shape
{
_id: ObjectId("..."),
memory_id: "mem_u9921_0047",
user_id: "user_9921",
type: "preference",
content: "User prefers distances and weights in metric units.",
embedding: [0.023, -0.117, 0.204, ...],
source_episode_id: "ep_20260429_xk92",
strength: 0.87,
last_accessed_at: ISODate("2026-04-29T14:01:00Z"),
created_at: ISODate("2026-04-01T09:15:00Z")
}
Field-by-field walkthrough
user_id is critical here. We should always filter by user_id before running vector search. MongoDB Atlas Vector Search supports pre-filters on indexed fields, and we use them. Running a pure ANN search across all users and then filtering the results is both slower and a potential data isolation bug waiting to happen.
type is a controlled vocabulary: preference (the user wants something a certain way), fact (something true about the user or their context), or instruction (an explicit standing directive). This lets us selectively inject memories by type depending on what the current task needs.
content is the plain-text statement that gets injected into the prompt. We keep it short and declarative, i.e., one fact per document. Chunking multiple facts into a single document makes both retrieval and decay harder to reason about.
embedding is the vector representation of content, generated at write time. Prototypes often store it as a BSON array of doubles (as in the sample document above), but production systems should use BSON BinData with the vector subtype. BinData stores embeddings far more compactly, requiring roughly three times less disk space. It also lets you store pre-quantized int8 or binary vectors, and quantization can reduce RAM requirements by up to 24x, a critical optimization for 2026-scale agentic memory stores where high-dimensional vector storage costs can otherwise become prohibitive. On retrieval, we embed the current query and run a nearest-neighbor search filtered by user_id.
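The "roughly three times" figure can be sanity-checked with a little arithmetic. The sketch below estimates the encoded size of the embedding value in each representation, based on our reading of the BSON encoding rules (array elements carry a type byte and a string index key; the vector BinData subtype carries a 2-byte header). It ignores the field name and any storage-engine compression, so treat the result as an approximation rather than a measurement.

```javascript
// Bytes for a BSON array of doubles. Each element stores a type byte,
// its index as a null-terminated string key ("0", "1", ...), and an 8-byte double.
function bsonDoubleArrayBytes(numDimensions) {
  let bytes = 4 + 1; // int32 length prefix + trailing null of the array document
  for (let i = 0; i < numDimensions; i++) {
    bytes += 1 + String(i).length + 1 + 8;
  }
  return bytes;
}

// Bytes for a BinData vector of float32 values: int32 length, subtype byte,
// 2-byte vector header (dtype + padding), then 4 bytes per dimension.
function bsonFloat32VectorBytes(numDimensions) {
  return 4 + 1 + 2 + 4 * numDimensions;
}

const asArray = bsonDoubleArrayBytes(1024);
const asBinData = bsonFloat32VectorBytes(1024);
console.log(asArray); // 13231
console.log(asBinData); // 4103
console.log((asArray / asBinData).toFixed(2)); // "3.22"
```

Most of the saving comes from two places: float32 values are half the width of doubles, and the binary form drops the per-element key and type overhead.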
strength is a float between 0 and 1 that decays over time and moves back toward 1 on access. We recommend exponential decay with a half-life (e.g., 30 days) combined with a proportional boost on access. For example, every time a memory is retrieved, you might close 30% of the gap toward 1.0 ($strength_{new} = strength_{old} + (1.0 - strength_{old}) \times 0.3$). Frequently used preferences stay near 1.0, while unreferenced facts sink toward the bottom of search results over time, allowing the agent's "personality" to evolve with the user. This field is our alternative to TTL expiration for long-term memory: we don't want to hard-delete a preference just because it hasn't been triggered in 90 days, but we do want to deprioritize it in retrieval ranking. To apply decay, we use a periodic background sweep (e.g., a daily cron job) that targets the last_accessed_at field, rather than calculating decay at read time, which would add latency to every retrieval.
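The two halves of the strength model, decay and reinforcement, reduce to two small pure functions. The sketch below uses the 30-day half-life and 30% gap-closing rate from the example above; the function and constant names are our own, and how you schedule the decay step is left open.

```javascript
const HALF_LIFE_DAYS = 30; // example half-life from the text
const REINFORCE_RATE = 0.3; // fraction of the gap to 1.0 closed on each access

// Exponential decay: strength halves every halfLifeDays.
function decayedStrength(strength, elapsedDays, halfLifeDays = HALF_LIFE_DAYS) {
  return strength * Math.pow(0.5, elapsedDays / halfLifeDays);
}

// Reinforcement on access: move a fixed fraction of the way toward 1.0.
function reinforcedStrength(strength, rate = REINFORCE_RATE) {
  return strength + (1 - strength) * rate;
}

// A memory at 0.87 that goes untouched for 30 days drops to about 0.435;
// a single retrieval afterwards lifts it to about 0.6045.
const faded = decayedStrength(0.87, 30);
const refreshed = reinforcedStrength(faded);
console.log(faded.toFixed(4), refreshed.toFixed(4));
```

Because exponential decay compounds, a daily sweep can simply apply a one-day step (decayedStrength(strength, 1)) to each stale memory; thirty such steps produce the same result as one 30-day step.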
source_episode_id is the provenance link: it tells us which agent run produced this memory. This matters for auditability and for bulk-invalidating memories that came from a faulty episode (e.g., if a tool malfunctioned). The reference index set below includes a standard index on this field to support those lookups. If bulk invalidation is a rare administrative task in your pipeline and write overhead matters more, you can drop that index and accept a collection scan for the occasional rollback.
Embed vs. reference
Each semantic memory is its own document. We retrieve these semantically and independently; there is no "give me all memories for this thread" read pattern. References would add indirection without benefit. The document-per-memory model also makes it straightforward to update strength and last_accessed_at atomically without touching sibling memories.
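One way to perform that atomic update is a pipeline-style update, which lets the new strength be computed from the stored value inside a single write. The sketch below only builds the update document; the helper name reinforceUpdate and the 0.3 rate are illustrative assumptions, and the actual call is shown as a comment.

```javascript
// Build an aggregation-pipeline update that closes `rate` of the gap between
// the stored strength and 1.0, and stamps last_accessed_at with the server time.
function reinforceUpdate(rate = 0.3) {
  return [
    {
      $set: {
        strength: {
          $add: ["$strength", { $multiply: [{ $subtract: [1, "$strength"] }, rate] }],
        },
        last_accessed_at: "$$NOW",
      },
    },
  ];
}

// Usage in mongosh or the Node.js driver (not executed here):
// db.semantic_memory.updateOne({ memory_id: "mem_u9921_0047" }, reinforceUpdate())
```

Since the read and the write happen in one operation on one document, two concurrent retrievals cannot overwrite each other's boost with a stale value.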
Index
// Primary lookup and uniqueness constraint
db.semantic_memory.createIndex({ memory_id: 1 }, { unique: true })
// Compound index for non-vector queries (retrieval ranking)
db.semantic_memory.createIndex({ user_id: 1, type: 1, strength: -1 })
// Supporting provenance lookups and bulk invalidation
db.semantic_memory.createIndex({ source_episode_id: 1 })
// Index for background decay sweeps
db.semantic_memory.createIndex({ user_id: 1, last_accessed_at: 1 })
// Atlas Vector Search Index Definition
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 1024, // Optimized for voyage-3-large
"similarity": "cosine"
},
{
"type": "filter",
"path": "user_id"
},
{
"type": "filter",
"path": "type"
}
]
}
The vector index definition includes user_id and type as filter fields; this is what enables the $vectorSearch pre-filter. Always filter by user_id before vector search to prevent cross-user leakage.
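A retrieval pipeline using that pre-filter might look like the sketch below. The index name semantic_memory_vector_index and the helper name are hypothetical, and the numCandidates multiplier is a tuning assumption; the function only builds the pipeline, with the aggregate call shown as a comment.

```javascript
// Build a $vectorSearch pipeline that is always scoped to one user and
// optionally narrowed to one memory type.
function semanticRecallPipeline({ queryVector, userId, type, limit = 5 }) {
  const filter = { user_id: { $eq: userId } };
  if (type) filter.type = { $eq: type };
  return [
    {
      $vectorSearch: {
        index: "semantic_memory_vector_index", // hypothetical index name
        path: "embedding",
        queryVector,
        numCandidates: limit * 20, // oversample candidates relative to the final limit
        limit,
        filter, // applied before the nearest-neighbor search
      },
    },
    {
      $project: {
        _id: 0,
        memory_id: 1,
        type: 1,
        content: 1,
        strength: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// Usage (not executed here); queryVector must have the index's numDimensions:
// db.semantic_memory.aggregate(
//   semanticRecallPipeline({ queryVector, userId: "user_9921", type: "preference" })
// )
```

Building the filter inside the helper, rather than leaving it to each caller, makes it harder to issue an unscoped search by accident.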
The Referential Model: How the Three Collections Wire Together
The three join keys are user_id, thread_id, and episode_id. Here is how they move through a session lifecycle:
Session starts
  → short_term_memory document created
    { thread_id: "thread_abc123", user_id: "user_9921", messages: [] }
Turns accumulate
  → messages[] grows via $push
  → summary updated when token budget is reached
Session ends / agent run completes
  → episodic_memory record written
    { episode_id: "ep_...", thread_id: "thread_abc123", user_id: "user_9921" }
Semantic extraction runs
  → facts and preferences identified from the episode
  → semantic_memory documents written
    { source_episode_id: "ep_...", user_id: "user_9921", ... }
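The way the join keys propagate through that lifecycle can be sketched with two small constructors. The function names, the argument shapes, and the initial strength of 1.0 are our assumptions for illustration; the point is only which fields are copied from where.

```javascript
// Build the episodic record at session end: thread_id and user_id are copied
// from the short-term session document.
function closeEpisode(session, { episodeId, summary, outcome, startedAt, endedAt }) {
  return {
    episode_id: episodeId,
    thread_id: session.thread_id,
    user_id: session.user_id,
    summary,
    outcome,
    started_at: startedAt,
    ended_at: endedAt,
  };
}

// Build a semantic memory from an extracted fact: user_id and the provenance
// link are copied from the episode.
function toSemanticMemory(episode, { memoryId, type, content, embedding, now }) {
  return {
    memory_id: memoryId,
    user_id: episode.user_id,
    type,
    content,
    embedding,
    source_episode_id: episode.episode_id,
    strength: 1.0, // assumed starting value for a freshly learned memory
    last_accessed_at: now,
    created_at: now,
  };
}
```

Note that thread_id stops at the episode: a semantic memory reaches its originating thread only through source_episode_id, which keeps long-term knowledge independent of any one conversation.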
The read pattern per memory type
- Short-term: fetch by thread_id. This is a single point query on one document. We always know the thread before we read.
- Episodic: query by user_id sorted by started_at descending for recency, or combine with a tags filter for domain-specific recall. Use text search on summary for keyword-based retrieval.
- Semantic: $vectorSearch with a user_id pre-filter, optionally narrowed by type. Rank results by vector similarity, then break ties or rerank by strength.
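Reranking by strength can be done client-side on the handful of results a vector search returns. The sketch below blends the two signals with a fixed weight; the blend formula, the 0.2 weight, and the function name are assumptions of ours, and it expects each result to carry a similarity score and a strength, both in the 0 to 1 range.

```javascript
// Rerank vector search results by a weighted blend of similarity and strength.
// Returns a new array sorted by rank_score, highest first.
function rerankByStrength(results, strengthWeight = 0.2) {
  return results
    .map((r) => ({
      ...r,
      rank_score: (1 - strengthWeight) * r.score + strengthWeight * r.strength,
    }))
    .sort((a, b) => b.rank_score - a.rank_score);
}

// Two memories with the same similarity: the stronger one ranks first.
const ranked = rerankByStrength([
  { memory_id: "mem_a", score: 0.8, strength: 0.2 },
  { memory_id: "mem_b", score: 0.8, strength: 0.9 },
]);
console.log(ranked.map((r) => r.memory_id)); // [ 'mem_b', 'mem_a' ]
```

A small weight keeps similarity dominant, so strength acts mainly as a tiebreaker; raising it lets well-established memories outrank slightly closer but faded ones.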
What we never do
Join across these collections at query time. Each retrieval path is independent by design. The thread_id and episode_id fields are provenance links, useful for audit, debugging, and bulk operations, not foreign keys that we join on in the hot path.
Index Strategy: What Actually Needs a Vector Index (and What Doesn't)
The most common mistake in early agent memory implementations is adding vector indexes everywhere. Here is where we actually need them.
- Short-term memory: no vector index. We always retrieve by thread_id. We know the session before we read. A vector index here would never be used and would slow down writes for no benefit.
- Episodic memory: maybe, later. A text index on summary covers the majority of episodic recall needs: keyword-based retrieval like "find episodes involving refunds for this user." A vector index is only justified when we need semantic episode recall ("when did I last help this user with X?"), and most applications don't need this at launch. Start without it, and add it only when we have a concrete retrieval requirement that text search can't satisfy.
- Semantic memory: yes, this is the one. Semantic memory is our knowledge retrieval layer. Retrieval by meaning is the whole point. This collection needs a vector index from day one.
TTL strategy
Short-term memory is the natural TTL candidate. We wire up the TTL index on expires_at and let MongoDB handle the cleanup. A 24 to 48-hour expiry is reasonable for most applications.
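As a rough sketch of that wiring: the TTL index is created once with expireAfterSeconds set to 0, so each document expires at whatever time its own expires_at holds, and the application sets or extends that timestamp on write. The helper names and the 24-hour default below are illustrative assumptions.

```javascript
const TTL_HOURS = 24; // hypothetical default; the text suggests 24 to 48 hours

// One-time index setup in mongosh (not executed here):
// db.short_term_memory.createIndex({ expires_at: 1 }, { expireAfterSeconds: 0 })

// Compute the expiry timestamp for a session touched at `now`.
function expiryFrom(now, ttlHours = TTL_HOURS) {
  return new Date(now.getTime() + ttlHours * 60 * 60 * 1000);
}

// Build a new short-term memory document with its expiry set.
function newShortTermDoc(threadId, userId, now = new Date(), ttlHours = TTL_HOURS) {
  return {
    thread_id: threadId,
    user_id: userId,
    messages: [],
    expires_at: expiryFrom(now, ttlHours),
  };
}

// Build an update that appends a message and pushes the expiry forward,
// so active sessions do not expire mid-conversation.
function appendMessageUpdate(message, now = new Date(), ttlHours = TTL_HOURS) {
  return {
    $push: { messages: message },
    $set: { expires_at: expiryFrom(now, ttlHours) },
  };
}
```

The TTL monitor runs as a periodic background task, so expired documents can linger briefly after expires_at passes; readers that need strict cutoffs should also check the timestamp themselves.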
Semantic memory should not use TTL expiration. A preference learned six months ago and not recently triggered should be deprioritized in retrieval, not deleted. We use the strength field for soft decay and reserve hard deletion for explicit user requests or compliance requirements.
Summary
Here are the three schemas as a single reference:
| Attribute | short_term_memory | episodic_memory | semantic_memory |
|---|---|---|---|
| Primary key | thread_id | episode_id | memory_id |
| Scope key | user_id | user_id | user_id |
| Retrieved by | thread_id | user_id + recency/tags | vector search + user_id |
| Vector index | No | Optional | Yes |
| TTL | TTL index | No | No |
| Expiry mechanism | expires_at TTL | started_at policy | strength decay |
The taxonomy (short-term, episodic, semantic) maps cleanly onto three MongoDB collections with distinct retrieval patterns and distinct index strategies. Each collection is optimized for how it is actually read, not for a generalized "memory store" abstraction.
- Short-term memory is optimized for high-speed session lookups and uses native TTL indexes for automatic cleanup.
- Episodic memory serves as the immutable audit trail. We reuse the started_at index for both recency-based retrieval and manual retention sweeps, avoiding the overhead of a separate age-based index.
- Semantic memory acts as the long-term knowledge base. It is the only collection requiring a vector index and uses a background "strength decay" sweep based on last_accessed_at to keep retrieval results fresh and relevant.
For the conceptual background on agent memory types, the LangGraph Memory Store documentation and Richmond's taxonomy post are the right starting points. This schema is what you build once you've read those and are ready to sit down with a database.