Designing the Agent Memory Schema: Document Shapes for Short-Term, Episodic, and Semantic Memory in MongoDB

By Codcompass Team · 13 min read

This tutorial was written by Yunying Karen Zhang.

If you've been following the recent wave of writing on AI agent memory, you've probably read about the taxonomy of memory types, or how frameworks like LangGraph expose a Memory Store API. All of that is a valuable foundation. But there's a gap: none of it shows you what the documents actually look like.

The taxonomy tells you what the memory types are. The framework docs tell you how to call the API. This post fills the space in between: it's about how to design the collections when you are the one sitting in front of MongoDB Compass with a blank schema.

By the end, you'll have three concrete document shapes, the indexes that back them, and a clear picture of how they wire together. Let's build it from the ground up.

Short-Term Memory: The Conversation Window

What it stores

Short-term memory is the live message log for an active session. It answers the question: what has been said in this conversation so far? It also holds a rolling summary, a compressed version of older turns that gets substituted in when the raw message log would overflow the model's context window.

This is the most transient of the three stores. Once a session ends, this data is largely superseded by the episodic record covered in the Episodic Memory section. That makes it a natural candidate for TTL expiration.

Document shape

Below is a representative document for a live conversation session. We'll walk through each field afterward:

{
  _id: ObjectId("..."),
  thread_id: "thread_abc123",
  user_id: "user_9921",
  messages: [
    {
      role: "user",
      content: "What's the status of my refund?",
      timestamp: ISODate("2026-04-29T14:15:00Z"),
      token_count: 12
    },
    {
      role: "assistant",
      content: "Let me look that up for you.",
      timestamp: ISODate("2026-04-29T14:15:03Z"),
      token_count: 9
    }
  ],
  summary: "User initiated chat at 14:00 regarding general policy; now specifically asking for refund status on order #8842.",
  summary_updated_at: ISODate("2026-04-29T14:10:00Z"),
  created_at: ISODate("2026-04-29T14:00:00Z"),
  expires_at: ISODate("2026-04-30T14:00:00Z")
}


Field-by-field walkthrough

  • thread_id is the primary lookup key. Every read against this collection is "give me everything for this thread", so it is always a point query on this field.
  • user_id is on every document in every collection in this schema. We always scope queries to a user first. Cross-user memory leakage is an easy mistake and a serious one: never retrieve a thread without a user_id filter unless you have an explicit reason to.
  • messages[] is an embedded array. Each element carries role, content, timestamp, and token_count. Tracking token count per message lets us calculate how close we are to context window limits without re-tokenizing the whole conversation on every turn.
  • summary and summary_updated_at work together. When the cumulative token count of messages[] exceeds a threshold (say, 80% of the model's context window), we compress the oldest turns into summary, drop those messages from the array, and update summary_updated_at. This timestamp tells us how stale the summary is: if several new messages have arrived since the last summary, we may want to regenerate before injecting it into the prompt.
  • expires_at pairs with a TTL index. Sessions shouldn't persist forever; 24 to 48 hours is a reasonable default for most applications.
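The summary-and-truncate step and the staleness check described above can be sketched as two small pure functions. This is a minimal illustration under stated assumptions, not the author's implementation: the names planCompaction and messagesSinceSummary, the contextWindow and threshold parameters, and the choice to always keep the newest message are all hypothetical. The sketch also ignores the tokens consumed by the summary itself.

```javascript
// Hypothetical helper: decide which of the oldest messages to fold into the summary.
// Assumes every message carries a precomputed token_count, as in the document shape.
function planCompaction(messages, contextWindow, threshold = 0.8) {
  const budget = Math.floor(contextWindow * threshold);
  let total = messages.reduce((sum, m) => sum + m.token_count, 0);

  // Under budget: nothing to compress.
  if (total <= budget) {
    return { toSummarize: [], toKeep: messages, totalTokens: total };
  }

  // Walk from the oldest message forward until the remainder fits the budget.
  // The newest message is always kept, even if it alone exceeds the budget.
  let cut = 0;
  while (cut < messages.length - 1 && total > budget) {
    total -= messages[cut].token_count;
    cut += 1;
  }

  return {
    toSummarize: messages.slice(0, cut), // feed these to the summarizer
    toKeep: messages.slice(cut),         // these stay in messages[]
    totalTokens: total,                  // token count of what remains
  };
}

// Hypothetical helper: count messages that arrived after the last summary refresh.
// A null summaryUpdatedAt is treated as "no summary yet", so every message counts.
function messagesSinceSummary(messages, summaryUpdatedAt) {
  if (!summaryUpdatedAt) return messages.length;
  return messages.filter((m) => m.timestamp > summaryUpdatedAt).length;
}
```

The caller would summarize toSummarize, write the new summary and summary_updated_at, and replace messages[] with toKeep. What counts as "too stale" is an application decision; a small fixed message count is one possible rule.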

Embed vs. reference

We embed messages inside the document rather than storing them in a separate collection because the read pattern always fetches the whole thread. There is no use case for fetching a single message in isolation. Embedding keeps the read to a single document fetch and lets us atomically append a new message with a simple $push. The one trade-off is document size (MongoDB limits a single document to 16 MB), but the summary-and-truncate mechanism keeps this bounded in practice.
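As a rough sketch of the $push append, the helper below builds the filter and update documents for adding one message to a thread. The function name buildAppendMessage and its parameters are hypothetical. Including user_id in the filter is an assumption that follows the scoping rule from the field walkthrough.

```javascript
// Hypothetical helper: build the filter and update for appending one message.
// The filter includes user_id so the write is scoped to the owning user.
function buildAppendMessage(threadId, userId, role, content, tokenCount, now = new Date()) {
  return {
    filter: { thread_id: threadId, user_id: userId },
    update: {
      $push: {
        messages: {
          role: role,
          content: content,
          timestamp: now,
          token_count: tokenCount,
        },
      },
    },
  };
}

// With a connected collection (for example in mongosh), usage would look like:
//   const { filter, update } = buildAppendMessage(
//     "thread_abc123", "user_9921", "user", "Any update on my refund?", 8);
//   db.short_term_memory.updateOne(filter, update);
```

Because the update touches a single document, MongoDB applies the append atomically; no transaction is needed.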

Index

db.short_term_memory.createIndex({ thread_id: 1 }, { unique: true })
db.short_term_memory.createIndex({ expires_at: 1 }, { expireAfterSeconds: 0 })


That's all we need here. No vector index is required, because we never search this collection by meaning; we always know the thread_id before we read. For more on createIndex options, see the MongoDB documentation.
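With expireAfterSeconds set to 0, the TTL index removes each document once the time stored in its own expires_at field has passed, so the application chooses the session lifetime when it creates the document. Deletion is handled by a background task and is not instantaneous. The sketch below shows one way to build a new session document. The name newSessionDocument, the ttlHours parameter, and the empty initial values for summary and summary_updated_at are assumptions, not part of the original schema.

```javascript
// Hypothetical helper: build the initial document for a new session.
// ttlHours is an assumed parameter; the article suggests 24 to 48 hours.
function newSessionDocument(threadId, userId, ttlHours = 24, now = new Date()) {
  const expiresAt = new Date(now.getTime() + ttlHours * 60 * 60 * 1000);
  return {
    thread_id: threadId,
    user_id: userId,
    messages: [],             // filled by later $push updates
    summary: "",              // assumption: empty until the first compaction
    summary_updated_at: null, // assumption: null until a summary exists
    created_at: now,
    expires_at: expiresAt,    // the TTL index reads this field
  };
}

// With a connected collection, usage would look like:
//   db.short_term_memory.insertOne(newSessionDocument("thread_abc123", "user_9921"));
```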

Episodic Memory: What the Agent Actually Did

What it stores

Episodic memory is a record of a completed agent run. Not what was said (that's the chat log), but what was done: which tools were called, what their inputs and outputs were, and what the overall outcome was.

Think of it as t
