nce, implementations). Unlike regex-based grep, AST parsing guarantees syntactic accuracy, eliminating false positives from string literals, comments, or dynamically generated paths.
2. Local Graph Storage
Extracted symbols and edges are persisted in a local SQLite database. SQLite is chosen for its ACID compliance, zero-configuration deployment, and native FTS5 full-text search extension. FTS5 enables fast symbol lookup by token and by prefix (and by substring when the trigram tokenizer is enabled), as well as cross-referencing, without external dependencies. The database schema maps directly to code topology: nodes store symbol metadata (name, type, file path, line range), while edges encode directional relationships (calls, imports, extends).
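As a rough illustration of the node/edge layout described above, the schema might look something like the following. The table and column names (`nodes`, `edges`, `symbols_fts`) and the `prefixQuery` helper are assumptions made for this sketch, not the tool's actual schema.

```typescript
// Hypothetical schema sketch: table and column names are illustrative only.
const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS nodes (
  id          INTEGER PRIMARY KEY,
  name        TEXT NOT NULL,      -- symbol name
  symbol_type TEXT NOT NULL,      -- e.g. function, class, interface
  file_path   TEXT NOT NULL,
  start_line  INTEGER NOT NULL,
  end_line    INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS edges (
  source_id INTEGER NOT NULL REFERENCES nodes(id),
  target_id INTEGER NOT NULL REFERENCES nodes(id),
  edge_type TEXT NOT NULL         -- calls | imports | extends
);

-- External-content FTS5 index over symbol names.
CREATE VIRTUAL TABLE IF NOT EXISTS symbols_fts
  USING fts5(name, content='nodes', content_rowid='id');
`;

// Builds a parameterized FTS5 prefix query (a quoted string followed by *).
function prefixQuery(prefix: string): { sql: string; params: string[] } {
  // Quote the term so FTS5 operators in user input are not interpreted;
  // embedded double quotes are escaped by doubling them.
  const term = `"${prefix.replace(/"/g, '""')}"*`;
  return {
    sql: "SELECT rowid, name FROM symbols_fts WHERE symbols_fts MATCH ?",
    params: [term],
  };
}
```

The statement and parameters would be handed to whichever SQLite driver the project already uses; the sketch deliberately stops short of choosing one.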
3. Reference Linking & Dependency Tracing
After initial ingestion, a resolution pass links cross-references. Function calls are mapped to their definitions, import statements are resolved to source files, and class hierarchies are flattened into inheritance chains. This step transforms raw AST data into a navigable graph. The agent no longer needs to guess where a symbol lives; the graph provides direct pointers and transitive dependency paths.
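The resolution pass and the transitive lookup it enables can be sketched in a few lines. This is a simplified model under stated assumptions: symbols are matched by bare name (a real resolver would account for scopes, modules, and overloads), and all type and function names here are hypothetical.

```typescript
type EdgeKind = "calls" | "imports" | "extends";

interface SymbolDef { name: string; filePath: string; startLine: number; }
interface RawRef { fromSymbol: string; toName: string; kind: EdgeKind; }
interface ResolvedEdge { from: string; to: string; kind: EdgeKind; targetFile: string; }

// Links each raw reference to a known definition. Names that cannot be
// resolved (e.g. third-party symbols) are returned separately.
function linkReferences(
  defs: SymbolDef[],
  refs: RawRef[],
): { edges: ResolvedEdge[]; unresolved: RawRef[] } {
  const byName = new Map<string, SymbolDef>();
  for (const def of defs) byName.set(def.name, def);

  const edges: ResolvedEdge[] = [];
  const unresolved: RawRef[] = [];
  for (const ref of refs) {
    const target = byName.get(ref.toName);
    if (target) {
      edges.push({ from: ref.fromSymbol, to: target.name, kind: ref.kind, targetFile: target.filePath });
    } else {
      unresolved.push(ref);
    }
  }
  return { edges, unresolved };
}

// Breadth-first walk over resolved edges, bounded by maxDepth.
// The visited set keeps cycles from looping forever.
function transitiveDeps(edges: ResolvedEdge[], start: string, maxDepth: number): string[] {
  const adjacency = new Map<string, string[]>();
  for (const edge of edges) {
    const targets = adjacency.get(edge.from) ?? [];
    targets.push(edge.to);
    adjacency.set(edge.from, targets);
  }

  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const symbol of frontier) {
      for (const dep of adjacency.get(symbol) ?? []) {
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return Array.from(seen);
}
```

Keeping unresolved references in a separate list, rather than dropping them, makes it easier to audit what the graph could not link.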
4. Auto-Sync & Incremental Updates
Codebases evolve. A native OS file watcher monitors source directories for changes. When files are modified, added, or deleted, the watcher triggers an incremental re-parse. Changes are debounced using a short quiet window to prevent index thrashing during rapid edits. The graph updates in-place, maintaining consistency without full re-indexing. No manual synchronization or configuration is required.
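The quiet-window debounce described above can be sketched as a small batcher that collects change events and flushes them once no new event has arrived for the configured interval. `ChangeBatcher` and `TimerApi` are hypothetical names; the timer functions are injectable only so the behavior can be exercised without real delays, and the "latest event per path wins" rule is a simplifying assumption.

```typescript
type ChangeKind = "added" | "modified" | "deleted";

// Minimal timer abstraction so the batcher can be driven by fake timers.
interface TimerApi {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: TimerApi = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class ChangeBatcher {
  private pending = new Map<string, ChangeKind>();
  private timer: unknown = undefined;
  private readonly quietMs: number;
  private readonly onFlush: (batch: Map<string, ChangeKind>) => void;
  private readonly timers: TimerApi;

  constructor(
    quietMs: number,
    onFlush: (batch: Map<string, ChangeKind>) => void,
    timers: TimerApi = realTimers,
  ) {
    this.quietMs = quietMs;
    this.onFlush = onFlush;
    this.timers = timers;
  }

  // Record an event and restart the quiet window.
  notify(path: string, kind: ChangeKind): void {
    this.pending.set(path, kind); // latest event per path wins
    if (this.timer !== undefined) this.timers.clear(this.timer);
    this.timer = this.timers.set(() => this.flush(), this.quietMs);
  }

  // Hand the accumulated batch to the re-parse callback exactly once.
  private flush(): void {
    this.timer = undefined;
    const batch = this.pending;
    this.pending = new Map();
    if (batch.size > 0) this.onFlush(batch);
  }
}
```

In use, the file watcher's event handler would call `notify` for every raw event, and `onFlush` would trigger the incremental re-parse for the paths in the batch.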
Architecture Rationale
- Local Execution: All processing occurs on the developer machine. No API keys, no network latency, no data exfiltration. This aligns with security policies and eliminates third-party rate limits.
- MCP Integration: The graph exposes tools via the Model Context Protocol (MCP). This standardizes tool calling across agents, replacing ad-hoc shell commands with structured JSON-RPC interfaces.
- FTS5 Over Raw SQL: Full-text search handles symbol aliases and partial (prefix) matches efficiently, and can approximate fuzzy lookup when paired with the trigram tokenizer. Plain SQL would require LIKE scans, complex joins, and regex workarounds for equivalent functionality.
- Debounced Watchers: Native OS events (inotify, FSEvents, ReadDirectoryChangesW) are batched to avoid redundant parsing during save storms or IDE formatting passes.
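To make the MCP point concrete: MCP tool invocations are JSON-RPC 2.0 requests that use the `tools/call` method with a tool `name` and an `arguments` object. The sketch below builds such a request; the tool name `codegraph.resolve` and the argument names are taken from the router example in this article and are assumptions about how the graph server exposes its tools.

```typescript
// Shape of an MCP tool invocation as a JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Builds the request envelope; transport (stdio, HTTP) is out of scope here.
function buildToolCall(toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Example: the structured equivalent of several grep/glob shell commands.
const exampleRequest = buildToolCall("codegraph.resolve", {
  symbol: "AgentRouter",
  relation: "calls",
  maxDepth: 2,
});
```

Because every call has the same envelope, an agent can validate, log, and permission-check tool use uniformly instead of parsing ad-hoc shell output.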
Agent Orchestration Example (TypeScript)
The following example demonstrates how an agent router replaces blind traversal with graph queries. Notice the structural shift: instead of spawning multiple shell processes, the agent issues a single structured lookup and receives resolved entry points.
import { MCPClient } from '@anthropic/mcp-client';

interface GraphQuery {
  symbol: string;
  relation: 'calls' | 'imports' | 'extends' | 'implements';
  depth: number;
}

interface GraphResponse {
  entryPoints: Array<{ path: string; line: number; type: string }>;
  relatedSymbols: Array<{ name: string; relation: string; path: string }>;
  contextSnippets: Array<{ file: string; content: string }>;
}

class AgentRouter {
  private mcp: MCPClient;

  constructor(mcpEndpoint: string) {
    this.mcp = new MCPClient(mcpEndpoint);
  }

  /**
   * Replaces sequential grep/glob/read loops with a single graph lookup.
   */
  async resolveSymbolArchitecture(query: GraphQuery): Promise<GraphResponse> {
    // Blind approach (removed):
    // const files = await glob('**/*.{ts,js,tsx}');
    // const matches = await Promise.all(files.map(f => grep(f, query.symbol)));
    // const contexts = await Promise.all(matches.map(m => readFile(m.path)));

    // Graph-indexed approach:
    const toolCall = {
      method: 'codegraph.resolve',
      params: {
        symbol: query.symbol,
        relation: query.relation,
        maxDepth: query.depth
      }
    };

    const rawResult = await this.mcp.callTool(toolCall);

    return {
      entryPoints: rawResult.nodes.map(n => ({
        path: n.file_path,
        line: n.start_line,
        type: n.symbol_type
      })),
      relatedSymbols: rawResult.edges.map(e => ({
        name: e.target_symbol,
        relation: e.edge_type,
        path: e.target_file
      })),
      contextSnippets: rawResult.snippets.map(s => ({
        file: s.file,
        content: s.text
      }))
    };
  }
}
This router eliminates iterative I/O. The agent receives exact file paths, line numbers, and surrounding context in one response. Downstream reasoning steps operate on verified entry points instead of probabilistic search results.
Pitfall Guide
1. Index Thrashing from Unbounded File Watchers
Explanation: Native OS watchers trigger on every filesystem event. IDEs, linters, and build tools generate rapid save cycles. Without debouncing, the graph re-parses continuously, consuming CPU and locking the database.
Fix: Implement a quiet-window debounce (typically 300-500ms). Batch events and trigger a single incremental parse after the write storm settles. Monitor watcher queue depth in production.
2. Over-Indexing Third-Party Dependencies
Explanation: Including node_modules, vendor/, or target/ directories bloats the SQLite database, slows query resolution, and introduces noise from external APIs that rarely change.
Fix: Configure exclusion patterns at initialization. Index only source directories (src/, lib/, app/). Third-party symbols should be resolved via type definitions or package manifests, not full AST ingestion.
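A minimal sketch of such a scope check follows. It matches on whole path segments rather than full glob patterns to stay dependency-free; `IndexScope` and `shouldIndex` are hypothetical names, and a real setup would more likely reuse the glob patterns from the project's configuration.

```typescript
interface IndexScope {
  sourceDirs: string[];        // top-level directories to index
  excludedSegments: string[];  // directory names to skip at any depth
}

// Returns true when a repo-relative path (forward slashes) should be parsed.
function shouldIndex(relativePath: string, scope: IndexScope): boolean {
  const segments = relativePath.split("/").filter((s) => s.length > 0 && s !== ".");
  if (segments.length === 0) return false;

  // Must live under one of the configured source directories.
  const inSourceDir = scope.sourceDirs.some((dir) => dir === segments[0]);
  if (!inSourceDir) return false;

  // Reject the path if any segment is an excluded directory name.
  return !segments.some((segment) => scope.excludedSegments.indexOf(segment) !== -1);
}

const defaultScope: IndexScope = {
  sourceDirs: ["src", "lib", "app", "packages"],
  excludedSegments: ["node_modules", "vendor", "target", "dist", "build"],
};
```

Checking segments at any depth matters in monorepos, where dependency folders can be nested inside individual packages.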
3. Treating the Graph as a Context Window Replacement
Explanation: The graph returns entry points and structural relationships, not full file contents. Agents that expect complete source code from graph queries will fail or hallucinate missing logic.
Fix: Use the graph for routing and discovery. Follow up with targeted Read operations only on resolved entry points. Maintain a two-phase workflow: graph lookup → selective file read → reasoning.
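The "selective file read" phase can be sketched as a planning step that turns resolved entry points into a small set of line ranges to read. The function name, the default radius of 20 lines, and the cap of 5 reads are illustrative assumptions, not values prescribed by any tool.

```typescript
interface EntryPoint { path: string; line: number; }
interface ReadRange { path: string; fromLine: number; toLine: number; }

// Converts graph entry points into bounded read requests, merging
// overlapping or adjacent windows within the same file.
function planReads(entryPoints: EntryPoint[], radius = 20, maxReads = 5): ReadRange[] {
  // Sort by path, then by line, so neighbouring windows end up adjacent.
  const sorted = entryPoints.slice().sort((a, b) => {
    if (a.path === b.path) return a.line - b.line;
    return a.path < b.path ? -1 : 1;
  });

  const ranges: ReadRange[] = [];
  for (const entry of sorted) {
    const fromLine = Math.max(1, entry.line - radius);
    const toLine = entry.line + radius;
    const last = ranges[ranges.length - 1];
    if (last && last.path === entry.path && fromLine <= last.toLine + 1) {
      last.toLine = Math.max(last.toLine, toLine); // extend the previous window
    } else {
      ranges.push({ path: entry.path, fromLine, toLine });
    }
  }
  // Cap the number of reads; which ranges to keep is a policy choice,
  // here simply the first ones in path order.
  return ranges.slice(0, maxReads);
}
```

The agent would then issue one Read per returned range and pass only that text to the reasoning step.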
4. CI Pipeline Index Rebuilds
Explanation: Running graph initialization on every CI run wastes time and breaks caching strategies. The index is meant for local development, not ephemeral build environments.
Fix: Cache the .codegraph/ directory in CI using hash-based keys. Only rebuild if source files change. Alternatively, disable graph tools in CI and rely on static analysis or pre-built type graphs.
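One way to derive a hash-based cache key is to fold every source path and its contents into a single digest, in sorted order so the key does not depend on enumeration order. The sketch below uses a small FNV-1a-style hash over UTF-16 code units purely to stay dependency-free; that is a stand-in, and a real pipeline would more likely use a cryptographic digest or the CI system's built-in file-hashing helper. The `codegraph` key prefix is an assumption.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units (non-cryptographic stand-in).
function fnv1a(text: string, seed = 0x811c9dc5): number {
  let hash = seed >>> 0;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Builds a deterministic cache key from a path -> content mapping.
function cacheKey(files: Record<string, string>, prefix = "codegraph"): string {
  const paths = Object.keys(files).sort(); // order-independent result
  let hash = 0x811c9dc5;
  for (const path of paths) {
    // NUL separators keep "ab" + "c" distinct from "a" + "bc".
    hash = fnv1a(path + "\0" + files[path] + "\0", hash);
  }
  const hex = ("0000000" + hash.toString(16)).slice(-8);
  return prefix + "-" + hex;
}
```

Any change to a source file produces a different input stream and therefore, barring hash collisions, a different key, which is what invalidates the cached index.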
5. Monorepo Boundary Violations
Explanation: Cross-package imports often use workspace aliases or relative paths that break when indexed in isolation. The graph may fail to resolve edges between packages.
Fix: Initialize the graph at the monorepo root. Configure workspace resolution rules to map aliases to physical paths. Verify cross-package edges after initialization using a dependency audit command.
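Alias-to-path mapping can be sketched as a longest-prefix match over a rule table. The rule shape and the `@acme/...` package names are hypothetical; in practice the rules would be derived from the workspace's own configuration rather than written by hand.

```typescript
interface AliasRule {
  prefix: string; // import specifier prefix, e.g. a workspace package name
  target: string; // physical directory the prefix maps to
}

// Resolves an import specifier to a physical path, or undefined when no
// workspace rule applies (e.g. a third-party package).
function resolveWorkspaceImport(specifier: string, rules: AliasRule[]): string | undefined {
  let best: AliasRule | undefined;
  for (const rule of rules) {
    // Match only on a whole-segment boundary, so "@acme/ui" does not
    // capture "@acme/uikit".
    const matches = specifier === rule.prefix || specifier.indexOf(rule.prefix + "/") === 0;
    if (matches && (!best || rule.prefix.length > best.prefix.length)) {
      best = rule; // the most specific (longest) prefix wins
    }
  }
  if (!best) return undefined;
  return best.target + specifier.slice(best.prefix.length);
}
```

Returning `undefined` for unmatched specifiers lets the caller record them as external edges instead of silently guessing a path.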
6. Unscoped MCP Tool Permissions
Explanation: Auto-allowing all graph tools without scope restrictions can expose sensitive paths or allow unintended file system traversal.
Fix: Restrict MCP tool permissions to read-only graph queries. Disable write operations unless explicitly required for index maintenance. Audit tool schemas periodically for privilege escalation risks.
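A read-only restriction can be sketched as a filter applied before tools are exposed to the agent. The `readOnly` flag on each descriptor is a hypothetical field introduced for this sketch; how a given server actually marks side-effecting tools will differ.

```typescript
interface ToolDescriptor {
  name: string;
  readOnly: boolean; // hypothetical flag: true when the tool never mutates state
}

// Exposes only tools that are both explicitly allow-listed and read-only.
function selectAllowedTools(available: ToolDescriptor[], allowList: string[]): ToolDescriptor[] {
  const allowed = new Set(allowList);
  return available.filter((tool) => tool.readOnly && allowed.has(tool.name));
}
```

Requiring both conditions means that adding a write-capable tool to the allow-list by mistake still does not expose it.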
7. Assuming Graph Accuracy Equals Semantic Understanding
Explanation: AST parsing captures syntax, not intent. A function call edge does not guarantee runtime execution. Dynamic imports, reflection, and conditional routing break static graphs.
Fix: Treat the graph as a structural baseline, not a runtime trace. Combine with runtime instrumentation or test coverage data for execution-critical workflows. Document known limitations in team runbooks.
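Combining the static graph with runtime data can be as simple as diffing two edge sets. The sketch below assumes call edges observed at runtime (from tracing or coverage tooling) are available in the same `from`/`to` form as static edges; the names are hypothetical.

```typescript
interface CallEdge { from: string; to: string; }

interface EdgeComparison {
  confirmed: CallEdge[];   // in the static graph and seen at runtime
  unobserved: CallEdge[];  // in the static graph, never seen at runtime
  dynamicOnly: CallEdge[]; // seen at runtime, missing from the static graph
}

function edgeKey(edge: CallEdge): string {
  return edge.from + "->" + edge.to;
}

// Classifies edges by whether static analysis and runtime traces agree.
function compareEdges(staticEdges: CallEdge[], observedEdges: CallEdge[]): EdgeComparison {
  const staticKeys = new Set(staticEdges.map(edgeKey));
  const observedKeys = new Set(observedEdges.map(edgeKey));
  return {
    confirmed: staticEdges.filter((e) => observedKeys.has(edgeKey(e))),
    unobserved: staticEdges.filter((e) => !observedKeys.has(edgeKey(e))),
    dynamicOnly: observedEdges.filter((e) => !staticKeys.has(edgeKey(e))),
  };
}
```

The `dynamicOnly` bucket is the interesting one: it surfaces calls made through dynamic imports or reflection that the AST pass could not see.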
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small repo (<50 files) | Blind traversal or lightweight grep | Index overhead outweighs benefits; agent discovers quickly | Little token savings to gain; extra latency is negligible |
| Large monorepo (>500 files) | Graph-indexed routing | Discovery cost grows rapidly with repo size; graph reduces calls by 90%+ | High token savings, 70%+ latency reduction |
| CI/CD pipeline | Cached graph or disabled tools | Ephemeral runners lack persistent state; rebuilds waste time | Zero rebuild cost, stable cache hits |
| Security-sensitive environment | Local SQLite + read-only MCP | No data exfiltration, deterministic execution, audit-friendly | Compliance cost offset, zero network risk |
| Dynamic/reflection-heavy codebase | Graph + runtime tracing | Static AST misses dynamic edges; combine for coverage | Moderate setup cost, high accuracy gain |
Configuration Template
Copy this into your project root to standardize graph initialization and MCP tool routing. Adjust paths and exclusion patterns to match your stack.
{
  "codegraph": {
    "root": ".",
    "sourceDirs": ["src", "lib", "app", "packages"],
    "excludePatterns": [
      "**/node_modules/**",
      "**/dist/**",
      "**/build/**",
      "**/*.test.*",
      "**/*.spec.*"
    ],
    "watcher": {
      "enabled": true,
      "debounceMs": 400,
      "maxQueueSize": 50
    },
    "mcp": {
      "server": "codegraph-mcp",
      "tools": ["resolve", "list_symbols", "trace_deps"],
      "permissions": ["read_only"]
    },
    "storage": {
      "engine": "sqlite",
      "fts5": true,
      "path": ".codegraph/index.db"
    }
  }
}
Quick Start Guide
- Initialize the index: Navigate to your project root and run the setup command. This generates the .codegraph/ directory, configures the MCP server, and applies default permissions.
- Verify exclusion patterns: Open the generated config and confirm that build artifacts, third-party dependencies, and test files are excluded. Adjust sourceDirs if your project uses non-standard layouts.
- Restart the agent: Reload your coding agent or IDE extension. The MCP server will register graph tools automatically. No manual tool registration is required.
- Run a discovery query: Ask the agent to trace a symbol, map imports, or locate a function definition. Observe the reduction in tool calls and latency compared to previous blind traversal attempts.
- Cache for CI: Add .codegraph/ to your pipeline cache strategy. Use a hash of source files as the cache key to ensure incremental updates without full rebuilds.