Zep AI's Graphiti: Agent Memory Without Schema Is Just Storage

By Codcompass Team·2026-05-28·8 min read

Structured Agent Memory: Enforcing Ontologies in Temporal Knowledge Graphs

Current Situation Analysis

Autonomous agents require persistent, queryable memory to maintain context across sessions, tools, and multi-step workflows. The industry standard for building this memory has converged on a simple pattern: feed raw interaction logs or tool outputs into an LLM, ask it to extract entities and relationships, and store the results in a vector database or naive graph structure. This approach prioritizes retrieval speed over structural integrity, treating memory as a storage problem rather than a modeling problem.

The flaw is semantic drift. When an LLM is given unrestricted freedom to invent entity types and relationship labels, it defaults to high-abstraction, low-specificity outputs. Instead of AUTHORED_BY, SUPERSEDES, or DEPENDS_ON, you receive RELATED_TO. Instead of DeploymentConfig, SprintBacklogItem, or APIEndpoint, you receive Object or Concept. The resulting graph contains data, but lacks queryable structure. Engineers quickly discover that vector similarity search cannot compensate for a collapsed ontology. You cannot filter, traverse, or reason over a graph where every node is Thing and every edge is CONNECTED.

This problem is routinely overlooked because the AI engineering community has heavily optimized for embedding quality, chunking strategies, and retrieval-augmented generation (RAG) pipelines. Structural constraints are treated as optional metadata rather than foundational architecture. The consequence is a memory layer that keeps growing in size while its usefulness declines. Stale facts persist, contradictions go undetected, and downstream agents spend excessive tokens re-inferring context that should have been explicitly modeled.

The fix mirrors a pattern already proven across the broader AI stack: constrain the output space before generation, not after. By enforcing a strict ontology at extraction time, you transform agent memory from a generic storage bucket into a precise, queryable domain model.

WOW Moment: Key Findings

Constraining LLM extraction with a formal schema fundamentally changes how memory behaves in production. The following comparison illustrates the operational difference between unconstrained extraction and a schema-enforced temporal graph approach.

Approach	Query Precision	Relationship Granularity	Temporal Drift	Schema Maintenance Cost
Unconstrained LLM Extraction	Low (semantic overlap dominates)	Generic (`RELATES_TO`, `HAS_PROPERTY`)	High (stale edges persist indefinitely)	Low initially, high long-term (manual cleanup)
Schema-Constrained Temporal Graph	High (exact type/edge matching)	Domain-specific (`SUPERSEDES`, `DEPLOYED_TO`)	Low (automatic invalidation & history preservation)	Moderate upfront, near-zero long-term

Why this matters: Precision in agent memory directly correlates with token efficiency and decision accuracy. When relationships are explicitly typed and temporally bounded, agents can traverse memory instead of re-reading it. The schema-constrained approach reduces the need for post-hoc filtering, lowers the risk of hallucination during context assembly, and enables deterministic queries like "Find all configurations superseded after 2024-03-15" without relying on probabilistic embedding matches.
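
To make the "deterministic query" point concrete, here is a minimal sketch in plain Python. It does not use Graphiti; the `TemporalEdge` class, its `valid_at` and `invalid_at` fields, and the sample data are all assumptions made for illustration. The idea is only that once edges carry an exact label and a timestamp, the query becomes a filter rather than a similarity search.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TemporalEdge:
    """Hypothetical typed edge with a validity window."""
    source: str
    label: str
    target: str
    valid_at: datetime                     # when the fact became true
    invalid_at: Optional[datetime] = None  # None means the fact is still current


def superseded_after(edges, cutoff):
    """Return targets of SUPERSEDES edges that became valid after `cutoff`."""
    return sorted(
        e.target
        for e in edges
        if e.label == "SUPERSEDES" and e.valid_at > cutoff
    )


edges = [
    TemporalEdge("config-v2", "SUPERSEDES", "config-v1", datetime(2024, 2, 1)),
    TemporalEdge("config-v3", "SUPERSEDES", "config-v2", datetime(2024, 4, 2)),
    # A generic edge like this one cannot answer the question at all.
    TemporalEdge("config-v3", "RELATED_TO", "config-v1", datetime(2024, 4, 2)),
]

print(superseded_after(edges, datetime(2024, 3, 15)))  # ['config-v2']
```

The `RELATED_TO` edge is ignored by construction, which is the practical difference between exact type matching and semantic overlap.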

Core Solution

Graphiti, an open-source temporal knowledge graph library from Zep AI, operationalizes schema-constrained memory through three structural primitives: typed entities, constrained edges, and automatic temporal resolution. The implementation follows a strict pipeline: define the ontology, initialize the graph engine, run constrained extraction, and let the system handle deduplication, contradiction detection, and time-windowing.

Step 1: Define the Ontology with Pydantic

Pydantic models replace freeform LLM generation with explicit domain vocabulary. Each model represents an allowed entity type, with typed fields and descriptive docstrings that guide extraction.

from pydantic import BaseModel, Field
from typing import Optional
from datetime import datetime

class ServiceEndpoint(BaseModel):
    """Represents a deployed API or internal service."""
    endpoint_id: str = Field(description="Unique identifier for the service")
    protocol: str = Field(description="Communication protocol, e.g., HTTPS, gRPC")
    region: str = Field(description="Deployment region or availability zone")
    status: str = Field(description="Current operational state: active, degraded, decommissioned")
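    # default=None makes the field truly optional; without it Pydantic v2 treats it as required
    last_verified: Optional[datetime] = Field(default=None, description="Timestamp of last health check")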

class DeploymentArtifact(BaseModel):
    """Represents a versioned release or infrastructure change."""
    artifact_id: str = Field(description="Unique release identifier")
    version_tag: str = Field(description="Semantic version or commit hash")
    target_service: str = Field(description="Service this artifact deploys to")
    rollout_strategy: str = Field(description="Deployment method: blue-green, canary, rolling")

Step 2: Configure Edge Constraints

Edges define valid relationships between entity types. Source and target constraints prevent the graph from forming semantically invalid connections.

from graphiti import EdgeConstraint, EntityType

# Define allowed relationship types
DEPLOYS_TO = EdgeConstraint(
    source=EntityType.DEPLOYMENT_ARTIFACT,
    target=EntityType.SERVICE_ENDPOINT,
    label="DEPLOYS_TO",
    description="Links a release artifact to its target service"
)

SUPERSEDES = EdgeConstraint(
    source=EntityType.DEPLOYMENT_ARTIFACT,
    target=EntityType.DEPLOYMENT_ARTIFACT,
    label="SUPERSEDES",
    description="Indicates a newer artifact replaces an older one"
)
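
What a source/target constraint buys you can be sketched without any graph library. The snippet below is an illustration under stated assumptions, not Graphiti's implementation: `EdgeRule`, `ALLOWED_EDGES`, and `filter_candidates` are hypothetical names, and candidate edges are modeled as simple `(source_type, label, target_type)` tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeRule:
    """Hypothetical rule: which entity types a label may connect."""
    source_type: str
    label: str
    target_type: str


# Mirrors the two constraints defined above.
ALLOWED_EDGES = {
    EdgeRule("DeploymentArtifact", "DEPLOYS_TO", "ServiceEndpoint"),
    EdgeRule("DeploymentArtifact", "SUPERSEDES", "DeploymentArtifact"),
}


def is_valid_edge(source_type, label, target_type):
    """An edge is valid only if the exact triple is whitelisted."""
    return EdgeRule(source_type, label, target_type) in ALLOWED_EDGES


def filter_candidates(candidates):
    """Split extracted candidate edges into accepted and rejected lists."""
    accepted, rejected = [], []
    for candidate in candidates:
        (accepted if is_valid_edge(*candidate) else rejected).append(candidate)
    return accepted, rejected


candidates = [
    ("DeploymentArtifact", "DEPLOYS_TO", "ServiceEndpoint"),   # valid
    ("ServiceEndpoint", "SUPERSEDES", "DeploymentArtifact"),   # wrong source type
    ("DeploymentArtifact", "RELATED_TO", "ServiceEndpoint"),   # label not in ontology
]
accepted, rejected = filter_candidates(candidates)
print(len(accepted), len(rejected))  # 1 2
```

Keeping the rejected list, rather than silently dropping it, gives you the schema-violation log that the checklist later in this article asks for.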

Step 3: Initialize the Temporal Graph Engine

The engine consumes the ontology and edge constraints, then manages extraction, resolution, and temporal windowing.

from graphiti import TemporalGraphClient, ExtractionPipeline

# Initialize client with schema constraints
graph_client = TemporalGraphClient(
    ontology=[ServiceEndpoint, DeploymentArtifact],
    edge_constraints=[DEPLOYS_TO, SUPERSEDES],
    max_entity_types=10,
    max_edge_types=10,
    max_fields_per_type=10
)

# Configure extraction pipeline
pipeline = ExtractionPipeline(
    client=graph_client,
    llm_provider="openai",
    model="gpt-4o-mini",
    enable_temporal_resolution=True,
    contradiction_threshold=0.85
)

Step 4: Run Extraction & Resolution

Feed raw interaction logs or tool outputs into the pipeline. The system extracts entities, validates edges against constraints, detects contradictions, and applies temporal invalidation automatically.

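from datetime import datetime, timezone

async def ingest_agent_memory(raw_log: str, session_id: str):
    # Extract schema-conformant entities and edges from the raw text
    extraction_result = await pipeline.extract(
        text=raw_log,
        session_id=session_id,
        timestamp=datetime.now(timezone.utc),  # timezone-aware; utcnow() is deprecated
    )

    # Deduplicate, detect contradictions, and apply validity windows
    resolved_graph = await graph_client.resolve(
        extraction_result,
        time_window_hours=72,
    )

    return resolved_graph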
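
Temporal invalidation with history preservation can be illustrated with a small stdlib-only sketch. This is not how Graphiti implements resolution; `Fact`, `apply_fact`, and `current_value` are hypothetical names, and the sketch assumes facts arrive in chronological order. A contradicting fact closes the validity window of the old one instead of deleting it.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact with a validity window."""
    subject: str
    predicate: str
    value: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None means still current


def apply_fact(history: List[Fact], new: Fact) -> List[Fact]:
    """Return a new history with `new` applied; older states are kept, not deleted."""
    same_key = lambda f: f.subject == new.subject and f.predicate == new.predicate
    # Deduplicate: the same current value is already recorded.
    if any(same_key(f) and f.invalid_at is None and f.value == new.value for f in history):
        return list(history)
    updated = []
    for f in history:
        contradicts = same_key(f) and f.invalid_at is None and f.value != new.value
        # Close the old fact's window at the moment the new fact became valid.
        updated.append(replace(f, invalid_at=new.valid_at) if contradicts else f)
    updated.append(new)
    return updated


def current_value(history, subject, predicate):
    """Return the value of the still-valid fact, or None if there is none."""
    for f in history:
        if f.subject == subject and f.predicate == predicate and f.invalid_at is None:
            return f.value
    return None


t1 = datetime(2024, 3, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 3, 9, tzinfo=timezone.utc)
history = apply_fact([], Fact("svc-a", "status", "active", t1))
history = apply_fact(history, Fact("svc-a", "status", "degraded", t2))
print(current_value(history, "svc-a", "status"))  # degraded
```

After the second call the history still holds the "active" fact, now bounded by `invalid_at == t2`, so point-in-time questions remain answerable.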

Architecture Decisions & Rationale

Why Pydantic? Pydantic provides runtime validation, type coercion, and per-field descriptions. The LLM reads those descriptions as extraction instructions, which narrows a freeform generation task to filling a validated structure.
Why constrain edges at definition time? Allowing arbitrary relationships creates traversal dead-ends. Explicit source/target constraints guarantee that every edge serves a queryable purpose.
Why the 10/10/10 limit? Token budgets and cognitive load dictate practical boundaries. Ten entity types, ten edge types, and ten fields per type force engineers to model the 80% that drives retrieval. Expanding beyond this threshold yields diminishing returns and increases extraction latency.
Why separate extraction from resolution? Extraction focuses on pattern matching against the schema. Resolution handles deduplication, contradiction detection, and temporal invalidation. Decoupling these stages prevents pipeline bottlenecks and allows independent scaling.
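
The 10/10/10 discipline is easy to enforce as a pre-flight check. The sketch below is an assumption-laden illustration: it uses stdlib dataclasses as a stand-in for the Pydantic models (with Pydantic v2 you would count `Model.model_fields` instead), and `check_ontology_budget` is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass, fields


@dataclass
class ServiceEndpoint:
    endpoint_id: str
    protocol: str
    region: str


@dataclass
class DeploymentArtifact:
    artifact_id: str
    version_tag: str


def check_ontology_budget(entity_types, edge_labels,
                          max_entity_types=10, max_edge_types=10,
                          max_fields_per_type=10):
    """Return a list of human-readable budget violations (empty means OK)."""
    problems = []
    if len(entity_types) > max_entity_types:
        problems.append(f"{len(entity_types)} entity types exceeds {max_entity_types}")
    if len(edge_labels) > max_edge_types:
        problems.append(f"{len(edge_labels)} edge types exceeds {max_edge_types}")
    for entity_type in entity_types:
        count = len(fields(entity_type))
        if count > max_fields_per_type:
            problems.append(
                f"{entity_type.__name__} has {count} fields, limit is {max_fields_per_type}"
            )
    return problems


problems = check_ontology_budget(
    [ServiceEndpoint, DeploymentArtifact],
    ["DEPLOYS_TO", "SUPERSEDES"],
)
print(problems)  # []
```

Running a check like this in CI keeps ontology growth a deliberate decision rather than an accident of incremental edits.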

Pitfall Guide

1. Ontology Over-Engineering

Explanation: Defining 20+ entity types or deeply nested fields before validating retrieval needs. This inflates token costs, slows extraction, and creates maintenance debt. Fix: Start with 3-4 entity types and 2-3 edge types. Expand only when query precision drops below 85% or agents repeatedly misinterpret context.

2. Ignoring Temporal Boundaries

Explanation: Failing to configure time-windowing or contradiction thresholds. Stale edges persist, causing agents to act on deprecated configurations or outdated user preferences. Fix: Always enable temporal resolution with a defined time_window_hours parameter. Set contradiction thresholds based on domain volatility (e.g., 0.75 for fast-changing infra, 0.90 for stable user profiles).

3. Weak Field Descriptions

Explanation: Using vague field descriptions like "Description of the item" instead of explicit extraction instructions. The LLM defaults to generic labels when guidance is ambiguous. Fix: Write field descriptions as extraction rules. Example: "Must be a valid ISO 8601 timestamp. Use 'null' if not provided."
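
One way to see why descriptions matter is to render them into the text an extraction prompt might contain. The sketch below is a hypothetical illustration using stdlib dataclasses with `metadata` in place of Pydantic's `Field(description=...)`; `render_extraction_rules` is an invented helper, and how Graphiti actually builds its prompts is not shown here.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ServiceEndpoint:
    endpoint_id: str = field(
        metadata={"description": "Unique identifier for the service, copied verbatim from the log."}
    )
    status: str = field(
        metadata={"description": "Exactly one of: active, degraded, decommissioned."}
    )
    last_verified: Optional[str] = field(
        default=None,
        metadata={"description": "Must be a valid ISO 8601 timestamp. Use 'null' if not provided."},
    )


def render_extraction_rules(model):
    """Turn per-field descriptions into prompt lines; refuse fields without guidance."""
    lines = [f"Entity type: {model.__name__}"]
    for f in fields(model):
        description = f.metadata.get("description", "").strip()
        if not description:
            raise ValueError(f"{model.__name__}.{f.name} has no description")
        lines.append(f"- {f.name}: {description}")
    return "\n".join(lines)


rules = render_extraction_rules(ServiceEndpoint)
print(rules)
```

Failing fast on an empty description turns "weak field descriptions" from a silent quality problem into a visible error at schema definition time.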

4. Mixing Extraction and Resolution in a Single Step

Explanation: Attempting to validate, deduplicate, and extract simultaneously. This creates race conditions, increases latency, and obscures failure points. Fix: Maintain a strict pipeline: Extract -> Validate -> Resolve -> Index. Use async boundaries to isolate stages.
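
The Extract -> Validate -> Resolve -> Index sequence can be sketched with `asyncio`, one coroutine per stage. Everything here is a stand-in chosen for illustration: the "extraction" is a trivial line parser instead of an LLM call, the store is a Python list, and none of the function names come from Graphiti.

```python
import asyncio

ALLOWED_LABELS = {"DEPLOYS_TO", "SUPERSEDES"}


async def extract(raw_log):
    """Stand-in for LLM extraction: parse 'source LABEL target' lines."""
    triples = []
    for line in raw_log.splitlines():
        parts = line.split()
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples


async def validate(triples):
    """Keep only edges whose label belongs to the ontology."""
    return [t for t in triples if t[1] in ALLOWED_LABELS]


async def resolve(triples):
    """Deduplicate while preserving first-seen order."""
    return list(dict.fromkeys(triples))


async def index(triples, store):
    """Persist resolved edges; here the store is just a list."""
    store.extend(triples)
    return len(triples)


async def run_pipeline(raw_log, store):
    # Each await is a stage boundary, so a failure points at exactly one stage.
    extracted = await extract(raw_log)
    valid = await validate(extracted)
    resolved = await resolve(valid)
    return await index(resolved, store)


store = []
log = (
    "rel-2 SUPERSEDES rel-1\n"
    "rel-2 SUPERSEDES rel-1\n"
    "rel-2 RELATED_TO svc-a\n"
    "rel-2 DEPLOYS_TO svc-a"
)
count = asyncio.run(run_pipeline(log, store))
print(count)  # 2
```

Because each stage takes and returns plain data, any of them can be tested, retried, or moved behind a queue without touching the others.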

5. Assuming Vector Search Replaces Graph Traversal

Explanation: Relying on embedding similarity to find relationships that should be explicitly modeled. This works for semantic proximity but fails for exact state queries. Fix: Use graph traversal for structural queries ("Find all superseded artifacts") and vector search only for semantic fallback ("Find similar deployment patterns").

6. Hardcoding Unidirectional Edges Without Fallback

Explanation: Defining edges as strictly source→target without considering bidirectional queries. Agents may fail to retrieve context when traversing in reverse. Fix: Implement inverse edge resolution or configure the graph engine to auto-generate reverse traversal paths during indexing.
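
Reverse traversal does not require storing a second, inverse edge type. A minimal sketch, assuming an in-memory index and a hypothetical `EdgeIndex` class (not a Graphiti API), is to record each directed edge under both its source and its target at write time:

```python
from collections import defaultdict


class EdgeIndex:
    """Hypothetical index that supports forward and reverse lookups."""

    def __init__(self):
        self._outgoing = defaultdict(list)  # (source, label) -> [targets]
        self._incoming = defaultdict(list)  # (target, label) -> [sources]

    def add(self, source, label, target):
        """Store one directed edge and register it in both directions."""
        self._outgoing[(source, label)].append(target)
        self._incoming[(target, label)].append(source)

    def targets(self, source, label):
        """Forward traversal: what does `source` point to via `label`?"""
        return list(self._outgoing.get((source, label), []))

    def sources(self, target, label):
        """Reverse traversal: what points to `target` via `label`?"""
        return list(self._incoming.get((target, label), []))


idx = EdgeIndex()
idx.add("rel-2", "DEPLOYS_TO", "svc-a")
idx.add("rel-3", "DEPLOYS_TO", "svc-a")

print(idx.targets("rel-2", "DEPLOYS_TO"))  # ['svc-a']
print(idx.sources("svc-a", "DEPLOYS_TO"))  # ['rel-2', 'rel-3']
```

The edge keeps its single, unambiguous direction in the ontology, while an agent asking "what was deployed to svc-a?" still gets an answer.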

7. Skipping Contradiction Detection Configuration

Explanation: Allowing conflicting facts to coexist without resolution rules. The graph returns multiple states for the same entity, forcing agents to guess. Fix: Define explicit contradiction policies: latest_wins, manual_review, or confidence_threshold. Log all contradictions for audit trails.
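
The three policies named above can be expressed as one small dispatch function. This is a sketch under assumptions: the `Claim` class, the `resolve_conflict` signature, and the 0.85 default are illustrative, and a real system would likely persist the audit log rather than append to a list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Claim:
    """Hypothetical observed value for one attribute of one entity."""
    value: str
    observed_at: datetime
    confidence: float


def resolve_conflict(existing, incoming, policy, threshold=0.85, audit_log=None):
    """Return the claim to keep, or None when a human must decide."""
    if audit_log is not None:
        # Log every contradiction, whatever the outcome.
        audit_log.append((existing, incoming, policy))
    if policy == "latest_wins":
        return incoming if incoming.observed_at >= existing.observed_at else existing
    if policy == "confidence_threshold":
        return incoming if incoming.confidence >= threshold else existing
    if policy == "manual_review":
        return None
    raise ValueError(f"unknown policy: {policy}")


old = Claim("active", datetime(2024, 1, 1), confidence=0.90)
new = Claim("degraded", datetime(2024, 1, 2), confidence=0.60)
audit = []

print(resolve_conflict(old, new, "latest_wins", audit_log=audit).value)           # degraded
print(resolve_conflict(old, new, "confidence_threshold", audit_log=audit).value)  # active
print(resolve_conflict(old, new, "manual_review", audit_log=audit))               # None
```

The same pair of claims resolves three different ways, which is why the policy should be an explicit configuration choice rather than an implicit default.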

Production Bundle

Action Checklist

Define ontology with 3-4 core entity types before scaling
Write explicit field descriptions as extraction rules, not generic labels
Configure edge constraints to prevent invalid relationship formation
Enable temporal resolution with a domain-appropriate time window
Set contradiction thresholds based on data volatility
Decouple extraction and resolution into distinct pipeline stages
Implement inverse traversal or bidirectional edge indexing
Log all schema violations and contradiction events for auditability

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
High-frequency state changes (e.g., infra deployments)	Schema-constrained temporal graph	Automatic invalidation prevents stale context	Moderate upfront, low long-term
Semantic similarity search (e.g., user intent matching)	Vector database with metadata filtering	Embeddings capture nuance better than rigid schemas	Low upfront, scales linearly
Strict compliance/audit requirements	Relational database with event sourcing	ACID guarantees and explicit versioning	High upfront, predictable scaling
Multi-agent collaborative memory	Graphiti with shared ontology	Consistent schema prevents cross-agent semantic drift	Moderate upfront, reduces coordination overhead

Configuration Template

# graphiti_config.py
from pydantic import BaseModel, Field
from graphiti import TemporalGraphClient, EdgeConstraint, EntityType

class UserProfile(BaseModel):
    """User identity and preference state."""
    user_id: str = Field(description="Primary identifier")
    role: str = Field(description="Access level: admin, editor, viewer")
    preference_locale: str = Field(description="ISO 639-1 language code")
    last_active: str = Field(description="ISO 8601 timestamp of last interaction")

class TaskRecord(BaseModel):
    """Work item or automation job."""
    task_id: str = Field(description="Unique job identifier")
    status: str = Field(description="pending, running, completed, failed")
    assigned_to: str = Field(description="User ID or system agent")
    priority: int = Field(description="1-5 scale, 5 being highest")

# Edge constraints
ASSIGNS_TASK = EdgeConstraint(
    source=EntityType.USER_PROFILE,
    target=EntityType.TASK_RECORD,
    label="ASSIGNS_TASK",
    description="Links a user to a work item they own"
)

UPDATES_STATUS = EdgeConstraint(
    source=EntityType.TASK_RECORD,
    target=EntityType.TASK_RECORD,
    label="UPDATES_STATUS",
    description="Chains task state transitions over time"
)

# Client initialization
graph_engine = TemporalGraphClient(
    ontology=[UserProfile, TaskRecord],
    edge_constraints=[ASSIGNS_TASK, UPDATES_STATUS],
    max_entity_types=10,
    max_edge_types=10,
    max_fields_per_type=10,
    temporal_resolution_enabled=True,
    default_time_window_hours=48
)

Quick Start Guide

Install dependencies: pip install graphiti-core pydantic (Graphiti is published on PyPI as graphiti-core)
Define your ontology: Create 3-4 Pydantic models with explicit field descriptions.
Configure edge constraints: Map valid source/target pairs and assign semantic labels.
Initialize the client: Pass ontology and constraints to TemporalGraphClient with temporal resolution enabled.
Ingest data: Call pipeline.extract() with raw logs or agent outputs, then run graph_client.resolve() to apply deduplication and time-windowing.

Schema-constrained temporal graphs shift agent memory from probabilistic storage to deterministic modeling. By enforcing ontology boundaries at extraction time, you eliminate semantic drift, reduce token waste, and enable precise traversal. The 10/10/10 constraint is not a limitation; it is a design discipline that forces engineers to model what matters, not everything that exists.
