Graphify + code-review-graph: Build a Self-Updating Knowledge Graph for Claude Code and Other AI Coding Agents

By Codcompass Team·2026-05-17·6 min read

Token-Efficient AI Agents: Implementing Persistent Code Knowledge Graphs

Current Situation Analysis

The Context Drift Problem

Modern AI coding agents operate within finite context windows, yet software repositories grow continuously. This mismatch creates a "Cold Start Tax" for every development session. When an agent begins a new task, it typically performs a broad file scan to reconstruct the codebase topology. In a medium-sized monorepo, this orientation phase can consume 20,000+ tokens before the agent writes a single line of solution code.

Why This Is Overlooked

Teams often optimize for model capability rather than context efficiency. The assumption is that larger context windows solve the problem. However, simply increasing the window size inflates latency and cost without improving the agent's structural understanding of the code. The agent still lacks a persistent, queryable representation of relationships between modules, leading to repetitive re-indexing and hallucinated dependencies.

Data-Backed Evidence

Analysis of real-world TypeScript monorepos reveals that AST-based graph construction requires zero LLM tokens and completes in anywhere from under a second to a few seconds. By externalizing the code structure into a graph, agents can query specific relationships (e.g., "blast radius of this change") rather than ingesting raw file content. This shifts the workflow from reading to querying, which reduces the token overhead of each session.

WOW Moment: Key Findings

Comparing two open-source graph construction tools across identical codebases reveals distinct performance profiles. The data highlights a trade-off between update latency/edge density and community detection capabilities.

Metric	Graphify (AST-Only)	code-review-graph	Delta
Incremental Update	~10.0s (8 workers)	0.425s	23x Faster
Edge Density (Large Repo)	4,830 edges	30,611 edges	6.3x Denser
Storage Format	JSON (`graphify-out/`)	SQLite (`.code-review-graph/`)	SQLite enables FTS5
Semantic Search	No	Yes (Embeddings)	CRG supports vector query
Community Detection	Yes (Leiden Clustering)	No	Graphify provides clusters
LLM Token Cost	0 tokens	0 tokens	Both are free to build
Confidence Tags	EXTRACTED / INFERRED	N/A	Graphify provides edge metadata
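
The Delta column can be re-derived from the raw figures in the table. A minimal sketch, assuming awk is available; the ratio helper is hypothetical and exists only for this sanity check:

```shell
#!/usr/bin/env bash
# Recompute the "Delta" column from the raw figures in the comparison table.

# ratio A B: print A / B rounded to one decimal place.
ratio() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 10.0 0.425    # incremental update time: Graphify / code-review-graph
ratio 30611 4830    # edge count: code-review-graph / Graphify
```

This prints 23.5 and 6.3, which matches the rounded "23x" and "6.3x" entries.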

Why This Matters

The code-review-graph tool offers superior update speed and edge density, making it ideal for real-time "blast radius" analysis during active development. Graphify provides community detection and Obsidian-compatible reports, which are valuable for architectural documentation and team onboarding. Using both allows agents to query the fast, dense graph for immediate impact analysis and fall back to the community graph for high-level structural exploration.

Core Solution

This section outlines a production-grade implementation for integrating persistent code graphs with AI agents. The architecture supports standalone usage or a hybrid approach where code-review-graph serves as the primary query engine and Graphify acts as a fallback for community insights.

Phase 1: Noise Reduction

Graph construction must exclude generated artifacts, dependencies, and lock files to prevent index bloat and recursive loops. Create dedicated ignore files at the project root.

.graphifyignore

node_modules/
dist/
build/
.pnpm-store/
coverage/
*.min.js
*.min.css
*.map
pnpm-lock.yaml
yarn.lock
*.lock
*.log
.env*
graphify-out/
.code-review-graph/
*.example.*

.code-review-graphignore

node_modules/
dist/
build/
.pnpm-store/
coverage/
*.min.js
*.min.css
*.map
pnpm-lock.yaml
yarn.lock
*.lock
*.log
.env*
graphify-out/
.code-review-graph/
*.example.*
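
Before building, it can help to confirm that both ignore files actually list the output directories. The following is a minimal sketch, not part of either tool; missing_patterns is a hypothetical helper, and it assumes each pattern sits on its own line exactly as written above:

```shell
#!/usr/bin/env bash
# missing_patterns FILE PATTERN...: print each PATTERN that is not present
# as an exact line in FILE; return non-zero if any pattern is missing
# (a missing or unreadable FILE counts as missing every pattern).
missing_patterns() {
  local file="$1"; shift
  local pattern status=0
  for pattern in "$@"; do
    # -F: literal match, -x: whole line, -q: quiet
    if ! grep -Fxq -- "$pattern" "$file" 2>/dev/null; then
      echo "$file: missing $pattern"
      status=1
    fi
  done
  return "$status"
}

# Usage (run from the project root):
#   missing_patterns .graphifyignore graphify-out/ .code-review-graph/
#   missing_patterns .code-review-graphignore graphify-out/ .code-review-graph/
```

A non-zero exit status makes the check easy to wire into a build script or CI step.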

Phase 2: Graph Construction

Initialize the graphs using CLI commands. For Graphify, the AST-only mode requires no API keys. For code-review-graph, the build process creates a SQLite database with FTS5 indexing.

Build Script (scripts/build-graphs.sh)

#!/usr/bin/env bash
set -euo pipefail

PROJECT_ROOT="$(git rev-parse --show-toplevel)"
cd "$PROJECT_ROOT"

echo "🔨 Building code-review-graph..."
if command -v code-review-graph >/dev/null 2>&1; then
  code-review-graph build
  echo "✅ code-review-graph complete."
else
  echo "⚠️  code-review-graph not found. Skipping."
fi

echo "🔨 Building Graphify..."
if command -v graphify >/dev/null 2>&1; then
  graphify update .
  echo "✅ Graphify complete."
else
  echo "⚠️  Graphify not found. Skipping."
fi

Phase 3: Agent Integration via MCP

Expose the graphs to AI agents using the Model Context Protocol (MCP). This allows agents to invoke tools like semantic_search_nodes or get_impact_radius directly.

MCP Configuration (mcp-config.json)

{
  "mcpServers": {
    "code-review-graph": {
      "command": "code-review-graph",
      "args": ["mcp-server"],
      "env": {
        "CRG_TOOLS": "semantic_search_nodes,query_graph,get_impact_radius,list_communities,get_review_context"
      },
      "disabled": false
    }
  }
}

Note: The CRG_TOOLS environment variable allow-lists specific tools, reducing the context footprint of the tool definitions sent to the agent.
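
To make the allow-list idea concrete, the sketch below shows how membership in a comma-separated list such as CRG_TOOLS can be tested. It is an illustration only: tool_allowed is a hypothetical helper, and how the server itself parses the variable is an assumption, not something confirmed here.

```shell
#!/usr/bin/env bash
# tool_allowed TOOL LIST: succeed if TOOL appears as a whole entry in the
# comma-separated LIST (partial names such as "query" do not match).
tool_allowed() {
  local tool="$1" list="$2"
  case ",$list," in
    *",$tool,"*) return 0 ;;
    *)           return 1 ;;
  esac
}

CRG_TOOLS="semantic_search_nodes,query_graph,get_impact_radius,list_communities,get_review_context"

tool_allowed get_impact_radius "$CRG_TOOLS" && echo "get_impact_radius: exposed"
tool_allowed some_other_tool "$CRG_TOOLS" || echo "some_other_tool: filtered out"
```

Every tool left off the list is one less tool definition in the agent's context.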

Phase 4: Automated Lifecycle Hooks

To keep the graph synchronized with code changes, implement hooks that trigger incremental updates. Use a PID file to prevent race conditions when multiple updates fire simultaneously.

Post-Tool Hook (hooks/post-tool-update.sh)

#!/usr/bin/env bash
# Triggered after agent tool usage to refresh graph state.
# Uses a PID lock file to prevent concurrent updates.

LOCK_FILE="/tmp/crg-agent-update.pid"

# Only run when the CLI is installed and the graph has already been built.
if command -v code-review-graph >/dev/null 2>&1 && [ -d ".code-review-graph" ]; then
  # Skip this run if the PID recorded in the lock file is still alive.
  if [ -f "$LOCK_FILE" ] && kill -0 "$(cat "$LOCK_FILE")" 2>/dev/null; then
    exit 0
  fi

  # Claim the lock with this script's PID, then hand it to the updater.
  echo "$$" > "$LOCK_FILE"
  code-review-graph update --skip-flows 2>/dev/null &
  UPDATE_PID=$!
  echo "$UPDATE_PID" > "$LOCK_FILE"

  # Clean up the lock after the background process finishes.
  wait "$UPDATE_PID"
  rm -f "$LOCK_FILE"
fi
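
The hook above checks the lock file and then writes to it in two separate steps, so two hooks starting at the same instant could both pass the check. One way to narrow that window is to create the lock atomically. This is a minimal sketch using the shell's noclobber option; acquire_lock and release_lock are hypothetical helpers, not part of code-review-graph:

```shell
#!/usr/bin/env bash
# acquire_lock FILE: atomically create FILE containing this shell's PID.
# Returns 0 if the lock was taken, non-zero if a live process holds it.
acquire_lock() {
  local file="$1" owner
  # With noclobber set, '>' fails if FILE already exists, so the existence
  # check and the write happen as a single step.
  if ( set -o noclobber; echo "$$" > "$file" ) 2>/dev/null; then
    return 0
  fi
  # The file exists: treat it as stale if its owner is no longer running.
  owner="$(cat "$file" 2>/dev/null)"
  if [ -n "$owner" ] && kill -0 "$owner" 2>/dev/null; then
    return 1
  fi
  # Best-effort stale recovery: remove the dead lock and try once more.
  rm -f "$file"
  ( set -o noclobber; echo "$$" > "$file" ) 2>/dev/null
}

# release_lock FILE: drop the lock.
release_lock() {
  rm -f "$1"
}

# Usage:
#   acquire_lock /tmp/crg-agent-update.pid || exit 0
#   trap 'release_lock /tmp/crg-agent-update.pid' EXIT
```

The stale-lock branch is still not fully race-free, but the common case of two concurrent hooks is covered by the atomic create.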

Agent Settings Integration

Configure the agent to check the graph status on session start and to run the update hook after each tool use.

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [{
          "type": "command",
          "command": "code-review-graph status 2>/dev/null || true",
          "timeout": 10
        }]
      }
    ],
    "PostToolUse": [
      {
        "hooks": [{
          "type": "command",
          "command": "bash hooks/post-tool-update.sh",
          "timeout": 15
        }]
      }
    ]
  }
}

Pitfall Guide

PyPI Naming Quirk
- Mistake: Running pip install graphify.
- Explanation: The package name on PyPI is graphifyy (two y's), but the CLI command is graphify (one y).
- Fix: Install via pip install graphifyy. Verify with graphify --help.

Recursive Indexing Loops
- Mistake: Forgetting to exclude output directories in ignore files.
- Explanation: If graphify-out/ or .code-review-graph/ is not ignored, the tools will index their own output, causing runaway index growth and infinite loops.
- Fix: Ensure graphify-out/ and .code-review-graph/ are listed in both .graphifyignore and .code-review-graphignore.

Hook Race Conditions
- Mistake: Multiple hooks triggering updates simultaneously.
- Explanation: Rapid agent actions can spawn multiple update processes, corrupting the SQLite database or wasting resources.
- Fix: Implement PID file locking as shown in the post-tool hook example.

LLM Cost Surprise
- Mistake: Running graphify extract without budgeting.
- Explanation: The extract command uses LLM subagents to analyze PDFs, images, and markdown, incurring API costs.
- Fix: Use graphify update for AST-only mode (zero cost). Reserve extract for specific documentation needs and monitor API usage.

Config Sprawl
- Mistake: Committing all generated config files.
- Explanation: code-review-graph install writes configs for multiple IDEs and platforms, cluttering the repository.
- Fix: Add generated files like .cursorrules, .opencode.json, and AGENTS.md to .gitignore. Commit only the essential .mcp.json or use a selective installation approach.

Stale Graphs on Git Operations
- Mistake: Relying solely on agent hooks for updates.
- Explanation: Human edits via git commits may not trigger agent hooks, leaving the graph stale.
- Fix: Install a git pre-commit hook via code-review-graph install to ensure updates occur on every commit.

Over-Indexing Assets
- Mistake: Including minified files and source maps.
- Explanation: These files add noise to the graph without providing meaningful semantic relationships.
- Fix: Exclude *.min.js, *.min.css, and *.map in ignore files.
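
For the stale-graph pitfall, a hand-written git hook is one alternative when the installer's hook is not wanted. The sketch below is an assumption-laden example: install_post_commit_hook is a hypothetical helper, it uses a post-commit rather than a pre-commit hook, and it overwrites any existing post-commit hook. The update command is the same one used in the post-tool hook.

```shell
#!/usr/bin/env bash
# install_post_commit_hook REPO_DIR: write a post-commit hook that refreshes
# the graph in the background after every commit.
# NOTE: overwrites REPO_DIR/.git/hooks/post-commit if it already exists.
install_post_commit_hook() {
  local hook="$1/.git/hooks/post-commit"
  mkdir -p "$(dirname "$hook")"
  cat > "$hook" <<'EOF'
#!/usr/bin/env bash
# Refresh the code graph after each commit; never block or fail the commit.
if command -v code-review-graph >/dev/null 2>&1; then
  code-review-graph update --skip-flows >/dev/null 2>&1 &
fi
exit 0
EOF
  chmod +x "$hook"
}

# Usage: install_post_commit_hook "$(git rev-parse --show-toplevel)"
```

Because the hook backgrounds the update and always exits 0, a missing CLI or a failed update does not interfere with committing.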

Production Bundle

Action Checklist

Install tools: pip install graphifyy code-review-graph (or use uv/pipx).
Create .graphifyignore and .code-review-graphignore with noise patterns.
Run initial build: graphify update . (the trailing dot is the current directory) and code-review-graph build.
Configure MCP server with mcp-config.json and tool allow-listing.
Implement PID-locked hooks for incremental updates.
Add output directories to .gitignore.
Verify agent integration by querying graph tools in a session.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Real-time PR Review	`code-review-graph`	Sub-second updates and blast-radius analysis enable immediate impact assessment.	Zero LLM cost.
Team Onboarding	`Graphify`	Community detection and Obsidian reports provide high-level architectural context.	Zero LLM cost (AST mode).
Semantic Search	`code-review-graph`	Embedding-based search allows natural language queries over the codebase.	Zero LLM cost.
Documentation Extraction	`Graphify` (Extract)	LLM subagents can parse PDFs and markdown for semantic relationships.	Incurs API cost.
Hybrid Workflow	Both	Use `code-review-graph` for speed/density and `Graphify` for communities/fallback.	Zero LLM cost (AST mode).

Configuration Template

mcp-config.json

{
  "mcpServers": {
    "code-review-graph": {
      "command": "code-review-graph",
      "args": ["mcp-server"],
      "env": {
        "CRG_TOOLS": "semantic_search_nodes,query_graph,get_impact_radius,list_communities,get_review_context"
      },
      "disabled": false
    }
  }
}

.gitignore Additions

# Graph outputs
graphify-out/
.code-review-graph/

# Generated agent configs (keep only essential ones)
.cursorrules
.opencode.json
AGENTS.md
GEMINI.md
.windsurfrules

Quick Start Guide

Install: Run uv tool install graphifyy, then uv tool install code-review-graph (uv tool install takes one package per invocation).
Ignore: Create .graphifyignore and .code-review-graphignore with standard noise patterns.
Build: Execute graphify update . (the trailing dot is the current directory) and code-review-graph build.
Connect: Add mcp-config.json to your project and configure agent hooks.
Verify: Open an AI coding session and query the graph using tools like semantic_search_nodes.
