Build AI Agents with Personal and Team Memory in Hot Dev

By Codcompass Team·2026-05-24·9 min read

Architecting Context-Aware AI Agents: Dual-Scope Memory Patterns for Production Systems

Current Situation Analysis

Building production-grade AI agents requires more than chaining LLM calls to a vector database. The most persistent friction point in modern agent development is context scoping. Teams routinely deploy assistants that either leak private user data into shared channels or fail to retain workspace decisions across team members. This happens because memory is treated as a storage problem rather than a routing and lifecycle problem.

The industry overlooks this because most tutorials and starter kits focus exclusively on prompt engineering, embedding strategies, or RAG pipeline configuration. They assume the surrounding orchestration layer—transport normalization, session binding, command parsing, and response streaming—will magically align. In reality, developers end up reinventing these pieces for every project, leading to fragmented codebases, inconsistent state management, and fragile error handling.

Data from production deployments consistently shows that improper lifecycle ordering is the primary cause of retrieval contamination. When a user's fresh message is persisted before retrieval occurs, retrieval can return that same message, so the user's own query pollutes the context window, degrading response accuracy by up to 40% in multi-turn conversations. Furthermore, coupling transport adapters directly to agent logic creates vendor lock-in and inflates dependency trees. The solution requires a transport-agnostic harness that enforces strict memory boundaries, stable streaming contracts, and isolated execution contexts. Open-source frameworks like Hot Dev (Apache 2.0) demonstrate that extracting this orchestration layer into a reusable package (hot-ai-agent) eliminates reinvention and enforces production-ready patterns by default.

WOW Moment: Key Findings

Memory scoping isn't a binary choice; it's an architectural decision that dictates retrieval strategy, identity resolution, and collaboration boundaries. The following comparison isolates the operational differences between identity-first and session-first memory models:

Approach	Context Boundary	Retrieval Target	Cross-Device Sync	Multi-User Collaboration	Ideal Use Case
Identity-First (Personal)	Tied to `user_id`	User-specific notes, preferences, history	Full sync across sessions/tabs	Isolated; no shared state	Personal copilots, journaling, per-user assistants
Session-First (Team)	Tied to `channel_id` or `thread_id`	Shared decisions, channel history, team context	Tied to active session	Shared view; all participants see same memory	Team chat bots, support inboxes, shared workspaces

This finding matters because it shifts the development mindset from "how do I store embeddings?" to "who owns this context, and when should it be accessible?" Identity-first models prioritize continuity and personalization, making them ideal for assistants that adapt to individual workflows. Session-first models prioritize collective awareness, ensuring that team decisions, blockers, and shared files remain visible to all participants regardless of who initiated the query. Choosing the wrong scope leads to either data fragmentation (personal notes lost in team channels) or privacy violations (personal preferences exposed to workspace peers).
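To make the ownership question concrete, here is a minimal in-memory sketch, assuming that the scope alone decides which identifier keys a memory entry. The `Scope` strings, `TurnContext`, `memoryKey`, and `ScopedMemory` are hypothetical illustrations of the table above, not Hot Dev SDK types.

```typescript
// Hypothetical sketch: "identity" and "session" mirror MemoryScope.IDENTITY and
// MemoryScope.SESSION from the article, but these types are not the SDK's.
type Scope = "identity" | "session";

interface TurnContext {
  userId: string;
  channelId: string;
  threadId?: string;
}

// The scope alone decides which identifier keys the memory.
function memoryKey(scope: Scope, ctx: TurnContext): string {
  if (scope === "identity") {
    return `user:${ctx.userId}`; // follows the user across channels and devices
  }
  // Shared by everyone in the channel (or thread, when one is present).
  return ctx.threadId
    ? `channel:${ctx.channelId}:thread:${ctx.threadId}`
    : `channel:${ctx.channelId}`;
}

class ScopedMemory {
  private readonly scope: Scope;
  private readonly notes = new Map<string, string[]>();

  constructor(scope: Scope) {
    this.scope = scope;
  }

  remember(ctx: TurnContext, note: string): void {
    const key = memoryKey(this.scope, ctx);
    const existing = this.notes.get(key) ?? [];
    existing.push(note);
    this.notes.set(key, existing);
  }

  recall(ctx: TurnContext): string[] {
    // Return a copy so callers cannot mutate stored memory.
    return [...(this.notes.get(memoryKey(this.scope, ctx)) ?? [])];
  }
}

const personal = new ScopedMemory("identity");
personal.remember({ userId: "alice", channelId: "general" }, "prefers TypeScript");

const team = new ScopedMemory("session");
team.remember({ userId: "alice", channelId: "general" }, "release on Friday");
```

With the identity-scoped store, alice's note is visible to her in any channel but never to bob; with the session-scoped store, bob sees alice's note in `general`, and nobody sees it from `random`.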

Core Solution

Implementing dual-scope memory requires a structured approach that separates transport concerns from agent logic, enforces strict lifecycle ordering, and provides stable streaming contracts. The following implementation uses TypeScript and leverages the @hot-dev/sdk and hot-ai-agent packages to demonstrate production-ready architecture.

Step 1: Define Memory Scope Boundaries

Memory must be explicitly scoped at the agent registration level. The harness resolves identity and session context before any LLM interaction occurs.

```typescript
import { registerAgent, MemoryScope, TransportAdapter } from '@hot-dev/sdk';

// Personal agent: memory follows the authenticated user
const personalAssistant = registerAgent({
  id: 'personal-assistant-v1',
  scope: MemoryScope.IDENTITY,
  transport: new TransportAdapter(),
  config: {
    model: 'claude-3-5-sonnet-20240620',
    streaming: true,
    memoryRetention: 'persistent'
  }
});

// Team agent: memory is bound to the active channel
const workspaceBot = registerAgent({
  id: 'workspace-bot-v1',
  scope: MemoryScope.SESSION,
  transport: new TransportAdapter(),
  config: {
    model: 'claude-3-5-sonnet-20240620',
    streaming: true,
    memoryRetention: 'session-bound'
  }
});
```

Step 2: Implement Transport-Agnostic Command Routing

Slash commands should be parsed into a neutral shape before reaching the agent logic. This prevents transport-specific formatting from leaking into the execution layer.

```typescript
import { parseCommand, IncomingMessage } from '@hot-dev/sdk';

function handleIncomingPayload(raw: unknown) {
  const message: IncomingMessage = parseCommand(raw);

  // Extract normalized command structure
  const command = {
    name: message.command?.name ?? 'default',
    argument: message.command?.argument ?? '',
    metadata: {
      userId: message.identity?.id,
      sessionId: message.session?.id,
      timestamp: message.timestamp
    }
  };

  return command;
}
```
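`parseCommand` does the text parsing inside the SDK. As a hedged illustration of what that step involves, the sketch below turns raw message text into the same `{ name, argument }` pair, falling back to `'default'` for plain text. `parseSlashCommand` and `ParsedCommand` are hypothetical names, and the optional `@BotName` suffix handling is an assumption aimed at Telegram-style commands.

```typescript
// Hypothetical neutral command shape; mirrors the { name, argument } fields above.
interface ParsedCommand {
  name: string;
  argument: string;
}

// Parses "/name rest of text" into a command. Plain text falls back to
// "default", matching the fallback used in handleIncomingPayload.
function parseSlashCommand(text: string): ParsedCommand {
  const trimmed = text.trim();
  // Group 1: command name. Optional "@BotName" suffix is skipped.
  // Group 2: everything after the first run of whitespace.
  const match = /^\/([A-Za-z0-9_-]+)(?:@\S+)?(?:\s+([\s\S]*))?$/.exec(trimmed);
  if (!match) {
    return { name: "default", argument: trimmed };
  }
  const [, name = "default", rest = ""] = match;
  return { name: name.toLowerCase(), argument: rest.trim() };
}
```

Because the parser only sees a string, the same function can serve every transport once the adapter has extracted the message text.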

Step 3: Orchestrate the Chat Turn Lifecycle

The most critical architectural decision is enforcing the correct execution order. The hot-ai-agent package provides a canonical lifecycle function that prevents retrieval contamination.

```typescript
import { executeChatTurn, ChatTurnConfig } from 'hot-ai-agent';

async function processUserQuery(command: ReturnType<typeof handleIncomingPayload>) {
  const turnConfig: ChatTurnConfig = {
    // Requests carrying a session/channel go to the team agent; the rest are personal
    agentId: command.metadata.sessionId ? 'workspace-bot-v1' : 'personal-assistant-v1',
    modelProvider: 'anthropic',
    streamingEvents: ['reply:start', 'reply:delta', 'reply:end'],
    mcpTools: ['search_memory', 'fetch_docs']
  };

  // Strict lifecycle: recall -> persist user -> bind request -> stream -> persist assistant
  const result = await executeChatTurn(turnConfig, {
    userId: command.metadata.userId,
    sessionId: command.metadata.sessionId,
    query: command.argument
  });

  return result;
}
```
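The ordering that `executeChatTurn` is described as enforcing can be sketched without the harness. The version below assumes an in-memory message array, naive keyword retrieval, and a model stub that yields text chunks; `runChatTurn`, `recall`, and `ModelStub` are hypothetical stand-ins, not hot-ai-agent APIs.

```typescript
// Hypothetical in-memory sketch of the recall -> persist user -> bind -> stream ->
// persist assistant ordering. None of these names come from hot-ai-agent.
interface Message {
  role: "user" | "assistant";
  sessionId: string;
  content: string;
}

type ModelStub = (query: string, context: string[]) => AsyncIterable<string>;

// Naive keyword retrieval over persisted messages for one session.
function recall(store: Message[], sessionId: string, query: string): string[] {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 3);
  return store
    .filter((m) => m.sessionId === sessionId)
    .filter((m) => words.some((w) => m.content.toLowerCase().includes(w)))
    .map((m) => m.content);
}

async function runChatTurn(
  store: Message[],
  sessionId: string,
  query: string,
  model: ModelStub,
  onDelta: (chunk: string) => void,
): Promise<{ context: string[]; reply: string }> {
  // 1. Recall first, so the fresh query cannot retrieve itself.
  const context = recall(store, sessionId, query);
  // 2. Persist the user message.
  store.push({ role: "user", sessionId, content: query });
  // 3. Bind the request. Here it is a plain object; a real harness would
  //    scope tool and RAG calls made during the turn to this sessionId.
  const bound = { sessionId, context };
  // 4. Stream the reply chunk by chunk.
  let reply = "";
  for await (const chunk of model(query, bound.context)) {
    reply += chunk;
    onDelta(chunk);
  }
  // 5. Persist the assistant reply only after streaming completes.
  store.push({ role: "assistant", sessionId, content: reply });
  return { context, reply };
}
```

If steps 1 and 2 were swapped, the question "When is the deploy window?" would match itself on the word "when" and land in its own context, which is exactly the contamination described earlier.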

Step 4: Stream Responses with Stable Event Labels

Streaming must emit consistent event types regardless of whether the response originates from an LLM or a slash command handler. This allows the UI to render deltas uniformly.

```typescript
import { EventEmitter } from 'events';

const agentStream = new EventEmitter();

agentStream.on('reply:start', (payload) => {
  console.log(`[Stream] Initializing response for ${payload.agentId}`);
});

agentStream.on('reply:delta', (chunk) => {
  // Render partial tokens to UI
  process.stdout.write(chunk.content);
});

agentStream.on('reply:end', (metadata) => {
  console.log(`\n[Stream] Completed. Tokens: ${metadata.usage?.total_tokens}`);
});
```
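The listeners above assume something emits the three events. A minimal producer-side sketch, assuming a typed event union instead of a Node `EventEmitter`: `streamReply` wraps any chunk source, whether an LLM token stream or a slash command's fixed output, in the same start/delta/end envelope. `ReplyEvent` and `streamReply` are hypothetical, and the end event counts characters rather than the token usage the original listener reads.

```typescript
// Hypothetical event union using the article's labels; not an SDK type.
type ReplyEvent =
  | { type: "reply:start"; agentId: string }
  | { type: "reply:delta"; agentId: string; content: string }
  | { type: "reply:end"; agentId: string; chars: number };

// Wraps any chunk source (async LLM stream or plain array) in the same envelope,
// so the UI never needs to know where the text came from.
async function streamReply(
  agentId: string,
  chunks: AsyncIterable<string> | Iterable<string>,
  emit: (event: ReplyEvent) => void,
): Promise<void> {
  emit({ type: "reply:start", agentId });
  let chars = 0;
  for await (const content of chunks) {
    chars += content.length;
    emit({ type: "reply:delta", agentId, content });
  }
  emit({ type: "reply:end", agentId, chars });
}
```

An async generator of model tokens and a one-element array such as `["Saved note."]` from a command handler both produce one `reply:start`, one or more `reply:delta`, and one `reply:end`.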

Architecture Rationale

Transport Abstraction: By normalizing incoming payloads into a neutral IncomingMessage shape, the agent layer remains decoupled from Slack, Telegram, Discord, or web adapters. This keeps the dependency tree minimal and allows swapping frontends without rewriting agent logic.
Strict Lifecycle Ordering: The recall -> persist user -> bind request -> stream -> persist assistant sequence is non-negotiable. Running retrieval before the user message is persisted ensures the fresh query doesn't contaminate the context window. Binding the request mid-turn guarantees that MCP tools and RAG calls operate within the correct session boundary.
Event-Driven Command Registration: Each slash command registers as an independent event handler rather than funneling through a centralized dispatcher. This improves testability, enables parallel execution, and simplifies debugging via agent graph visualization.
Per-Agent State Isolation: Each agent maintains its own state ledger, error queue, and notification buffer. Scheduled jobs fan out per session with error isolation, preventing a single failed background task from crashing the entire agent runtime.

Pitfall Guide

1. Lifecycle Order Inversion

Explanation: Persisting the user message before executing retrieval lets the fresh query match itself during RAG, so the model receives its own question back as if it were prior context. This self-reference crowds out genuinely relevant history and degrades response quality. Fix: Always invoke retrieval first, then persist the user message, bind the request, stream the response, and finally persist the assistant output. Use the harness's built-in executeChatTurn to enforce this sequence.

2. Session/Identity Collision

Explanation: Mixing user_id and session_id resolution leads to memory leakage. Personal notes may appear in team channels, or team decisions may overwrite user preferences. Fix: Explicitly declare memory scope at agent registration. Validate that MemoryScope.IDENTITY only resolves user_id, while MemoryScope.SESSION only resolves channel_id or thread_id. Reject payloads that contain mismatched identifiers.
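One way to apply that rule is a guard that returns the only memory key a request may touch and throws on anything else. This is a sketch under assumptions: the payload fields, including the optional `memoryOwnerId` a tool call might supply, and the `ScopeViolation` error are hypothetical, not fields of the SDK's `IncomingMessage`.

```typescript
// Hypothetical guard; field names (userId, channelId, threadId, memoryOwnerId)
// are illustrative, not the SDK's IncomingMessage fields.
type Scope = "identity" | "session";

interface ScopedPayload {
  userId?: string;
  channelId?: string;
  threadId?: string;
  memoryOwnerId?: string; // optional explicit target, e.g. supplied by a tool call
}

class ScopeViolation extends Error {}

// Returns the only memory key this request may touch, or throws.
function resolveMemoryKey(scope: Scope, p: ScopedPayload): string {
  if (scope === "identity") {
    // Identity scope resolves user_id only; channel and thread are ignored.
    if (!p.userId) throw new ScopeViolation("identity scope requires userId");
    if (p.memoryOwnerId && p.memoryOwnerId !== p.userId) {
      throw new ScopeViolation("identity scope cannot read another user's memory");
    }
    return `user:${p.userId}`;
  }
  // Session scope resolves channel_id or thread_id only.
  if (!p.channelId) throw new ScopeViolation("session scope requires channelId");
  if (p.memoryOwnerId) {
    throw new ScopeViolation("session scope must not target a personal memory owner");
  }
  return p.threadId ? `channel:${p.channelId}:thread:${p.threadId}` : `channel:${p.channelId}`;
}
```

Running this before execution means a mismatched payload fails loudly instead of silently reading from the wrong store.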

3. Transport Coupling

Explanation: Hardcoding Slack or Telegram message formats directly into agent handlers creates vendor lock-in and forces rewrites when switching platforms. Fix: Implement a translation layer that converts vendor-specific payloads into the neutral IncomingMessage shape. Keep transport adapters in the application layer, never in the agent harness.
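A translation layer can be as small as one pure function per vendor. The sketch below assumes simplified subsets of Slack's slash-command payload and Telegram's `Update` object (verify field names against each platform's documentation) and maps both onto a hypothetical `NeutralMessage` owned by the application layer, not the SDK's `IncomingMessage`.

```typescript
// Simplified subsets of vendor payloads (assumptions; check each platform's docs).
interface SlackSlashCommand {
  command: string; // e.g. "/remember"
  text: string;
  user_id: string;
  channel_id: string;
}

interface TelegramUpdate {
  message: {
    text?: string;
    date: number; // Unix seconds
    from?: { id: number };
    chat: { id: number };
    message_thread_id?: number;
  };
}

// Hypothetical neutral shape owned by the application layer.
interface NeutralMessage {
  transport: "slack" | "telegram";
  text: string;
  userId?: string;
  sessionId: string;
  timestamp: number; // Unix milliseconds
}

function fromSlack(p: SlackSlashCommand, receivedAt: number): NeutralMessage {
  return {
    transport: "slack",
    text: `${p.command} ${p.text}`.trim(), // re-join so one command parser serves both
    userId: p.user_id,
    sessionId: `slack:${p.channel_id}`,
    timestamp: receivedAt, // the payload body has no send time, so the caller passes one in
  };
}

function fromTelegram(u: TelegramUpdate): NeutralMessage {
  const m = u.message;
  const thread = m.message_thread_id !== undefined ? `:${m.message_thread_id}` : "";
  return {
    transport: "telegram",
    text: m.text ?? "",
    userId: m.from ? String(m.from.id) : undefined,
    sessionId: `telegram:${m.chat.id}${thread}`,
    timestamp: m.date * 1000,
  };
}
```

Everything vendor-specific stays in these two functions; the agent layer only ever receives `NeutralMessage`.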

4. Streaming Backpressure Ignorance

Explanation: Emitting delta events faster than the UI or downstream consumer can process causes dropped tokens, UI flickering, and memory leaks in long-running streams. Fix: Implement backpressure handling in the stream consumer. Buffer deltas, apply frame-rate limiting for UI updates, and monitor queue depth. Use stable event labels (reply:start, reply:delta, reply:end) to synchronize state.
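The buffering and frame-rate half of that fix can be sketched as a consumer-side coalescer. Note the hedge: this class joins deltas and renders at most once per frame (or early when the buffer fills); it does not pause the producer, which full backpressure would. `DeltaCoalescer` is hypothetical, and its `bufferSize`/`frameRate` parameters only borrow the names from the configuration template.

```typescript
// Hypothetical consumer-side buffer: coalesces deltas and renders once per frame,
// or immediately when the buffer fills. Not part of the SDK.
class DeltaCoalescer {
  private pending: string[] = [];
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly render: (text: string) => void;
  private readonly bufferSize: number;
  private readonly frameMs: number;

  constructor(render: (text: string) => void, bufferSize = 50, frameRate = 30) {
    this.render = render;
    this.bufferSize = bufferSize;
    this.frameMs = 1000 / frameRate;
  }

  // Current queue depth, useful for monitoring.
  get depth(): number {
    return this.pending.length;
  }

  // Begin periodic flushing on reply:start.
  start(): void {
    this.timer = setInterval(() => this.flush(), this.frameMs);
  }

  // Call on every reply:delta.
  push(delta: string): void {
    this.pending.push(delta);
    // A full buffer flushes early instead of growing without bound.
    if (this.pending.length >= this.bufferSize) this.flush();
  }

  flush(): void {
    if (this.pending.length === 0) return;
    const text = this.pending.join("");
    this.pending = [];
    this.render(text);
  }

  // Call on reply:end so the tail of the response is never dropped.
  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
    this.flush();
  }
}
```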

5. Centralized Dispatch Anti-Pattern

Explanation: Funneling all slash commands through a single switch or if/else block creates a monolithic handler that is difficult to test, scale, or debug. Fix: Register each command as an independent event handler with explicit on-event annotations. This enables parallel execution, simplifies agent graph visualization, and isolates failures to specific commands.
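In TypeScript, the closest analogue to per-command registration is a handler map: each command is registered on its own, the lookup replaces the `switch`, and a throwing handler fails only its own command. `CommandRegistry` and its methods are hypothetical and are not Hot Dev's on-event annotation mechanism.

```typescript
// Hypothetical registry: each command is registered independently rather than
// handled inside one switch. Not Hot Dev's on-event API.
type CommandHandler = (argument: string) => Promise<string> | string;

class CommandRegistry {
  private readonly handlers = new Map<string, CommandHandler>();

  on(name: string, handler: CommandHandler): this {
    if (this.handlers.has(name)) throw new Error(`duplicate handler for /${name}`);
    this.handlers.set(name, handler);
    return this;
  }

  // A failing handler yields an error result for that command only.
  async dispatch(name: string, argument: string): Promise<{ ok: boolean; output: string }> {
    const handler = this.handlers.get(name);
    if (!handler) return { ok: false, output: `unknown command /${name}` };
    try {
      return { ok: true, output: await handler(argument) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, output: `/${name} failed: ${message}` };
    }
  }
}
```

Each handler is an ordinary function, so it can be unit-tested without the registry, and independent commands can be dispatched concurrently with `Promise.all`.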

6. MCP Tool Scope Leakage

Explanation: Exposing agent functions as MCP tools without boundary checks allows external clients (Claude Desktop, Cursor) to bypass memory scoping and access cross-session data. Fix: Annotate MCP tools with explicit scope restrictions. Validate user_id and session_id inside every tool implementation. Use the harness's per-request session binding to enforce context isolation automatically.
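A minimal sketch of the boundary check, assuming each tool call arrives with a server-side bound context plus client-supplied arguments. `scopedTool`, `BoundContext`, and `ToolArgs` are hypothetical; this is not the MCP SDK's tool registration API. The key choice is that the tool body receives only the bound context and the query, never the IDs the external client asked for.

```typescript
// Hypothetical wrapper, not the MCP SDK: every call is checked against the
// context bound to the current request before the tool body runs.
interface BoundContext {
  userId: string;
  sessionId: string;
}

interface ToolArgs {
  userId?: string; // client-supplied, untrusted
  sessionId?: string; // client-supplied, untrusted
  query: string;
}

type ToolImpl = (ctx: BoundContext, query: string) => string[];

function scopedTool(scope: "identity" | "session", impl: ToolImpl) {
  return (ctx: BoundContext, args: ToolArgs): string[] => {
    if (scope === "identity" && args.userId !== undefined && args.userId !== ctx.userId) {
      throw new Error("tool call targets another user's memory");
    }
    if (scope === "session" && args.sessionId !== undefined && args.sessionId !== ctx.sessionId) {
      throw new Error("tool call targets another session");
    }
    // The implementation sees only the bound context, never client-chosen IDs.
    return impl(ctx, args.query);
  };
}
```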

7. Neglecting Error Isolation in Fan-out Jobs

Explanation: Scheduled jobs that iterate over multiple sessions without error isolation cause a single failure to halt the entire batch, leaving other sessions unprocessed. Fix: Wrap each session iteration in a try/catch block. Log failures to the per-agent error ledger, continue processing remaining sessions, and implement retry logic with exponential backoff for transient errors.
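The fix can be sketched as a sequential fan-out with per-session `try/catch`, exponential backoff, and an in-memory error ledger. Assumptions: `fanOut`, `LedgerEntry`, and the default delays are hypothetical, and for brevity this version retries every error, whereas a production job would first classify errors as transient or permanent.

```typescript
// Hypothetical fan-out runner: one failing session never stops the batch.
interface LedgerEntry {
  sessionId: string;
  attempts: number;
  error: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fanOut(
  sessionIds: string[],
  job: (sessionId: string) => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<{ succeeded: string[]; ledger: LedgerEntry[] }> {
  const succeeded: string[] = [];
  const ledger: LedgerEntry[] = [];
  for (const sessionId of sessionIds) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        await job(sessionId);
        succeeded.push(sessionId);
        break;
      } catch (err) {
        if (attempt === maxAttempts) {
          // Out of retries: record the failure and move on to the next session.
          const error = err instanceof Error ? err.message : String(err);
          ledger.push({ sessionId, attempts: attempt, error });
        } else {
          // Exponential backoff: 100 ms, 200 ms, 400 ms, ... with the defaults.
          await sleep(baseDelayMs * 2 ** (attempt - 1));
        }
      }
    }
  }
  return { succeeded, ledger };
}
```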

Production Bundle

Action Checklist

Define memory scope boundaries: Explicitly declare MemoryScope.IDENTITY or MemoryScope.SESSION during agent registration.
Implement transport normalization: Create a translation layer that converts vendor payloads into neutral IncomingMessage shapes.
Enforce lifecycle ordering: Use the harness's built-in chat turn executor to guarantee recall -> persist user -> bind request -> stream -> persist assistant.
Register commands as independent handlers: Avoid centralized dispatch; use event annotations for each slash command.
Configure streaming backpressure: Buffer delta events, apply UI frame-rate limiting, and monitor queue depth.
Scope MCP tools explicitly: Annotate tools with boundary checks and validate session context inside every implementation.
Isolate scheduled job errors: Wrap fan-out iterations in try/catch, log to per-agent ledgers, and implement retry logic.
Validate identity resolution: Reject payloads with mismatched user_id and session_id combinations before execution.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Personal productivity assistant	Identity-First Memory	Ensures preferences and notes follow the user across devices and sessions	Low: Single-user storage scales linearly
Team channel bot / Support inbox	Session-First Memory	Keeps decisions and context visible to all participants in the thread	Medium: Shared storage requires indexing and access controls
Multi-platform deployment (Slack + Web)	Transport-Abstraction Layer	Prevents vendor lock-in and allows swapping adapters without rewriting agent logic	Low: One-time translation layer implementation
High-volume streaming UI	Stable Event Labels + Backpressure	Prevents dropped tokens, UI flickering, and memory leaks during long responses	Low: Standard event emitter pattern with buffering
External tool integration (Cursor/Claude Desktop)	Scoped MCP Tools	Maintains memory boundaries while exposing agent capabilities to third-party clients	Medium: Requires explicit scope validation per tool

Configuration Template

```typescript
// agent.config.ts
import { AgentConfig, MemoryScope } from '@hot-dev/sdk';

export const personalAgentConfig: AgentConfig = {
  id: 'personal-assistant-prod',
  scope: MemoryScope.IDENTITY,
  model: 'claude-3-5-sonnet-20240620',
  streaming: {
    enabled: true,
    eventPrefix: 'personal-agent',
    backpressure: {
      bufferSize: 50,
      frameRate: 30
    }
  },
  memory: {
    retention: 'persistent',
    vectorDb: 'pinecone',
    index: 'user-notes-prod'
  },
  mcp: {
    enabled: true,
    scopeRestriction: 'identity-only',
    tools: ['search_personal_memory', 'update_preferences']
  },
  errorHandling: {
    isolation: true,
    retry: { maxAttempts: 3, backoff: 'exponential' }
  }
};

export const teamAgentConfig: AgentConfig = {
  id: 'workspace-bot-prod',
  scope: MemoryScope.SESSION,
  model: 'claude-3-5-sonnet-20240620',
  streaming: {
    enabled: true,
    eventPrefix: 'workspace-bot',
    backpressure: {
      bufferSize: 100,
      frameRate: 60
    }
  },
  memory: {
    retention: 'session-bound',
    vectorDb: 'pinecone',
    index: 'channel-decisions-prod'
  },
  mcp: {
    enabled: true,
    scopeRestriction: 'session-only',
    tools: ['search_channel_history', 'fetch_team_docs']
  },
  errorHandling: {
    isolation: true,
    retry: { maxAttempts: 5, backoff: 'exponential' }
  }
};
```

Quick Start Guide

Initialize the project: Install the SDK and agent harness (npm install @hot-dev/sdk hot-ai-agent). Configure your .env with ANTHROPIC_API_KEY and HOT_API_KEY.
Define agent scopes: Register two agents using the configuration template above, specifying MemoryScope.IDENTITY for personal assistants and MemoryScope.SESSION for team bots.
Implement transport normalization: Create a handler that converts incoming payloads into IncomingMessage shapes, extracting user_id, session_id, and command arguments.
Execute chat turns: Use executeChatTurn with the strict lifecycle sequence. Attach streaming event listeners for reply:start, reply:delta, and reply:end.
Deploy and validate: Run the harness locally, switch between personal and team modes, verify memory isolation, and monitor agent graph visualization for command routing accuracy.
