e official TypeScript SDK, we define the tool with explicit input validation and structured output.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "pre-commit-validator",
  version: "1.0.0"
});

server.tool(
  "validate_uncommitted_changes",
  "Runs Codex review against working tree changes and returns structured findings.",
  {
    diff_path: z.string().describe("Path to the git diff file or working tree root"),
    max_iterations: z.number().int().min(1).max(5).default(3).describe("Maximum review-fix cycles"),
    strict_mode: z.boolean().default(false).describe("Fail on any warning or error")
  },
  async ({ diff_path, max_iterations, strict_mode }) => {
    // Implementation follows in Step 2
    return { content: [{ type: "text", text: "Validation initiated" }] };
  }
);

// The SDK exports its stdio transport as StdioServerTransport
const transport = new StdioServerTransport();
await server.connect(transport);
Architecture Rationale: We isolate the review tool to a single endpoint to minimize surface area. The max_iterations parameter prevents infinite loops, while strict_mode allows teams to tune sensitivity based on environment (dev vs staging). Using Zod ensures runtime validation before spawning external processes.
Step 2: Implement Background Process Isolation
The core value of this loop is contextual separation. The reviewing instance must not share memory, prompt history, or state with the writing agent. We achieve this by spawning a child process that runs the Codex CLI in review mode, capturing stdout/stderr, and parsing the output.
import { spawn } from "child_process";

interface ReviewFinding {
  file: string;
  line: number;
  severity: "error" | "warning" | "info";
  message: string;
  suggestion?: string;
}

async function spawnReviewProcess(diffContent: string): Promise<ReviewFinding[]> {
  return new Promise((resolve, reject) => {
    const codex = spawn("codex", ["review", "--format", "json", "--stdin"], {
      stdio: ["pipe", "pipe", "pipe"]
    });
    let stdout = "";
    let stderr = "";
    codex.stdout.on("data", (chunk) => (stdout += chunk.toString()));
    codex.stderr.on("data", (chunk) => (stderr += chunk.toString()));
    // A missing or non-executable binary emits "error" instead of "close"
    codex.on("error", (err) => reject(new Error(`Failed to start Codex: ${err.message}`)));
    codex.on("close", (code) => {
      if (code !== 0) {
        reject(new Error(`Codex exited with code ${code}: ${stderr}`));
        return;
      }
      try {
        const parsed = JSON.parse(stdout);
        resolve(parsed.findings || []);
      } catch {
        reject(new Error("Failed to parse Codex review output"));
      }
    });
    // Feed the diff through stdin and close it so the reviewer sees end of input
    codex.stdin.write(diffContent);
    codex.stdin.end();
  });
}
Architecture Rationale: Spawning a separate process keeps the reviewer's memory and prompt history apart from the writer's, which limits context leakage between the two. It does not by itself neutralize prompt injection carried inside the diff, so the review output should still be treated as untrusted input. Using --stdin avoids temporary file race conditions. The JSON output format enables deterministic parsing and structured error reporting.
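The parsing step can be made stricter than parsed.findings || []. What follows is a minimal sketch, assuming the CLI emits an object with a findings array shaped like ReviewFinding. The helper names isReviewFinding and parseFindings are hypothetical. Malformed entries are dropped, and a missing array is treated as a failure rather than as a clean review.

```typescript
type Severity = "error" | "warning" | "info";

interface ReviewFinding {
  file: string;
  line: number;
  severity: Severity;
  message: string;
  suggestion?: string;
}

// Runtime check for one entry of the assumed findings array.
function isReviewFinding(value: unknown): value is ReviewFinding {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.file === "string" &&
    typeof v.line === "number" &&
    Number.isInteger(v.line) &&
    (v.severity === "error" || v.severity === "warning" || v.severity === "info") &&
    typeof v.message === "string" &&
    (v.suggestion === undefined || typeof v.suggestion === "string")
  );
}

// Parses raw stdout; throws on invalid JSON or when no findings array is present.
function parseFindings(stdout: string): ReviewFinding[] {
  const parsed: unknown = JSON.parse(stdout);
  const findings = (parsed as { findings?: unknown } | null)?.findings;
  if (!Array.isArray(findings)) {
    throw new Error("Review output has no findings array");
  }
  return findings.filter(isReviewFinding);
}
```

Throwing on a missing array is a deliberate choice here: a reviewer that printed unexpected JSON would otherwise look identical to a reviewer that found nothing.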
Step 3: Build the Iteration Controller
The loop must handle findings, apply fixes, and re-validate until clean or until the iteration limit is reached. This controller lives in the agent orchestration layer, not the MCP server itself, to maintain separation of concerns.
async function runReviewLoop(
  diffExtractor: () => string,
  fixer: (findings: ReviewFinding[]) => Promise<void>,
  options: { maxIterations: number; strict: boolean }
): Promise<{ clean: boolean; findings: ReviewFinding[]; cycles: number }> {
  const hasErrors = (findings: ReviewFinding[]) =>
    findings.some(f => f.severity === "error");
  let cycle = 0;
  let currentFindings: ReviewFinding[] = [];

  while (cycle < options.maxIterations) {
    currentFindings = await spawnReviewProcess(diffExtractor());
    if (!hasErrors(currentFindings)) break;
    await fixer(currentFindings);
    cycle++;
  }

  // If the loop ended right after a fix, review once more so the
  // returned findings describe the final state, not the pre-fix state.
  if (cycle === options.maxIterations) {
    currentFindings = await spawnReviewProcess(diffExtractor());
  }

  return {
    clean: !hasErrors(currentFindings),
    findings: currentFindings,
    cycles: cycle
  };
}
Architecture Rationale: The controller delegates diff extraction and fixing to external functions, making it framework-agnostic. It breaks early on clean passes, minimizing unnecessary model calls. The strict flag can be wired to reject commits if warnings exceed a threshold.
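The strict wiring mentioned above could look something like the following. This is a sketch under assumptions: shouldBlock and warningThreshold are hypothetical names, and strict is taken to mean that any warning blocks, which matches the strict_mode description in Step 1.

```typescript
interface Finding {
  severity: "error" | "warning" | "info";
}

interface GateOptions {
  strict: boolean;
  // Hypothetical knob: warnings tolerated when strict is off.
  warningThreshold: number;
}

// Decides whether a set of findings should reject the commit.
function shouldBlock(findings: Finding[], options: GateOptions): boolean {
  const errors = findings.filter((f) => f.severity === "error").length;
  const warnings = findings.filter((f) => f.severity === "warning").length;
  if (errors > 0) return true;             // errors always block
  if (options.strict) return warnings > 0; // strict: any warning blocks
  return warnings > options.warningThreshold;
}
```

Keeping this decision in a separate function leaves the loop itself unchanged when a team tightens or relaxes its policy.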
Step 4: Integrate with Agentic Client
In an MCP-capable editor or agent runtime, the tool is invoked automatically after file modifications. The client receives structured findings and can either apply automated patches or surface them to the developer.
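import { execFileSync } from "child_process";

// Agent orchestration hook (example); agent, patcher, and logger come from the host runtime
agent.on("filesModified", async (changedFiles) => {
  const result = await runReviewLoop(
    // Recompute the uncommitted diff on every cycle so each review sees the previous fixes
    () => execFileSync("git", ["diff", "HEAD", "--", ...changedFiles], { encoding: "utf8" }),
    async (findings) => {
      for (const f of findings) {
        const target = changedFiles.find(file => file.includes(f.file));
        // Skip findings without a suggestion or without a matching changed file
        if (f.suggestion && target) {
          await patcher.apply(target, f);
        }
      }
    },
    { maxIterations: 3, strict: false }
  );

  if (!result.clean) {
    logger.warn(`Review loop exhausted after ${result.cycles} cycles. Manual review required.`);
    await agent.humanHandoff(result.findings);
  }
});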
Architecture Rationale: Hooking into filesModified ensures validation runs at logical boundaries, not on every keystroke. The fallback to humanHandoff preserves safety nets for complex or ambiguous findings. This pattern scales across Cursor, VS Code with MCP extensions, or custom agent runtimes.
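For the client to receive structured findings, the Step 1 handler has to return them in an MCP tool result rather than the placeholder string. One possible shape is sketched below; toToolResult is a hypothetical helper, and serializing the findings as JSON inside a text content item is an assumption about what the consuming client expects.

```typescript
interface ReviewFinding {
  file: string;
  line: number;
  severity: "error" | "warning" | "info";
  message: string;
  suggestion?: string;
}

interface TextToolResult {
  content: { type: "text"; text: string }[];
}

// Wraps the loop outcome in a single text content item the client can JSON.parse.
function toToolResult(
  outcome: { clean: boolean; findings: ReviewFinding[]; cycles: number }
): TextToolResult {
  return {
    content: [{ type: "text", text: JSON.stringify(outcome, null, 2) }]
  };
}
```

A client that only displays text still gets something readable, while an agent can parse the same payload back into findings.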
Pitfall Guide
1. Continuous Save Triggers
Explanation: Wiring the review loop to every file save causes excessive model calls, inflating costs and slowing the editor. Half-finished code triggers false positives that frustrate developers.
Fix: Trigger validation on explicit agent signals, logical commit boundaries, or after a debounce window (e.g., 30 seconds of inactivity). Use git diff --stat to skip trivial changes.
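The debounce window can be sketched as a small gate that only opens after a quiet period. QuietWindow is a hypothetical name, and timestamps are passed in explicitly (rather than read from a timer) so the logic stays easy to test.

```typescript
// Tracks the most recent edit and reports when the quiet window has elapsed.
class QuietWindow {
  private lastEditMs: number | null = null;
  private readonly quietMs: number;

  constructor(quietMs: number) {
    this.quietMs = quietMs;
  }

  // Call on every file modification.
  recordEdit(nowMs: number): void {
    this.lastEditMs = nowMs;
  }

  // True once at least quietMs have passed since the last recorded edit.
  isReady(nowMs: number): boolean {
    return this.lastEditMs !== null && nowMs - this.lastEditMs >= this.quietMs;
  }

  // Call when a review starts so the same edits are not reviewed twice.
  reset(): void {
    this.lastEditMs = null;
  }
}
```

In practice the edit handler would call recordEdit(Date.now()), and a periodic check would start the review loop when isReady(Date.now()) returns true, then call reset().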
2. Model Homogeneity Blind Spots
Explanation: When the same model family writes and reviews code, shared training biases can miss subtle architectural flaws, incorrect locking patterns, or off-by-one errors. Context separation reduces author bias but not reasoning bias.
Fix: Rotate review models (e.g., GPT-5.5 for writing, Claude 4 or Gemini 2.5 for review) or supplement with deterministic linters (ESLint, Semgrep, OCLint). Never rely solely on AI for security-critical paths.
3. Context Window Overflow
Explanation: Large diffs exceed token limits, causing truncation or degraded review quality. The model may ignore files at the end of the payload or hallucinate line numbers.
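Fix: Chunk diffs by file or module. Pass only changed hunks: git diff already limits output to hunks with three lines of context (-U3 is the default), and -U1 or -U0 shrinks the payload further. Implement a size threshold (e.g., skip review if diff > 500 lines) and route to CI instead.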
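Chunking and the size threshold can be combined into one routing step. The sketch below assumes unified diff text as produced by git diff. splitDiffByFile, changedLineCount, and planReview are hypothetical helpers, and the changed-line count is a heuristic based on line prefixes.

```typescript
// Splits a unified diff into one chunk per file, using the "diff --git" headers.
function splitDiffByFile(diff: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of diff.split("\n")) {
    if (line.startsWith("diff --git ") && current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.some((l) => l.length > 0)) chunks.push(current.join("\n"));
  return chunks;
}

// Counts added and removed lines, skipping the +++ and --- file headers.
function changedLineCount(diff: string): number {
  return diff.split("\n").filter(
    (l) =>
      (l.startsWith("+") && !l.startsWith("+++")) ||
      (l.startsWith("-") && !l.startsWith("---"))
  ).length;
}

type ReviewPlan = { route: "review"; chunks: string[] } | { route: "ci" };

// Oversized diffs go to CI; everything else is reviewed file by file.
function planReview(diff: string, maxChangedLines: number): ReviewPlan {
  if (changedLineCount(diff) > maxChangedLines) return { route: "ci" };
  return { route: "review", chunks: splitDiffByFile(diff) };
}
```

Each chunk can then be passed to the reviewer separately, which keeps any single payload small and makes reported line numbers easier to trust.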
4. Cost Spirals from Unbounded Loops
Explanation: Without iteration limits, the agent may cycle through fixes and reviews indefinitely, especially when the model struggles with complex refactors. Each pass consumes tokens and API quota.
Fix: Enforce max_iterations (typically 2 to 3). Log cycle counts and alert when thresholds are hit. Implement exponential backoff or fallback to human review after exhaustion.
5. False Security on Security and Architecture
Explanation: AI reviewers excel at mechanical validation but lack systemic understanding. They rarely catch race conditions, privilege escalation paths, or design anti-patterns that require domain knowledge.
Fix: Treat AI output as a first-pass filter. Implement hard gates for security-sensitive directories (/auth, /crypto, /payments). Require human sign-off for public API changes or database schema modifications.
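A hard gate for sensitive directories can be a plain path check that runs before any model call. The sketch below uses the directory names from the fix above; requiresHumanReview and partitionByGate are hypothetical helpers, and matching on directory segments (not substrings) is a design assumption.

```typescript
// Directory names from the fix above; adjust to the repository layout.
const SENSITIVE_DIRS = ["auth", "crypto", "payments"];

// True when any directory segment of the path is in the sensitive list.
function requiresHumanReview(
  filePath: string,
  sensitiveDirs: string[] = SENSITIVE_DIRS
): boolean {
  const segments = filePath.replace(/\\/g, "/").split("/");
  // Drop the final segment (the file name) so only directories are matched.
  return segments.slice(0, -1).some((segment) => sensitiveDirs.includes(segment));
}

// Splits changed files into those the AI loop may handle and those needing sign-off.
function partitionByGate(files: string[]): { ai: string[]; human: string[] } {
  return {
    ai: files.filter((f) => !requiresHumanReview(f)),
    human: files.filter((f) => requiresHumanReview(f))
  };
}
```

Segment matching avoids false hits such as src/author.ts, which a naive substring test for "auth" would flag.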
6. MCP Transport Misconfiguration
Explanation: Mixing stdio and SSE transports between server and client causes silent failures or connection timeouts. Debugging transport mismatches wastes significant time.
Fix: Explicitly declare transport type in both server and client configs. Use stdio for local editor integration and SSE for remote agent orchestration. Validate with mcp-cli ping before deployment.
7. Ignoring Process Exit Codes and Stderr
Explanation: Assuming success when Codex fails to parse input or encounters network issues leads to silent validation skips. Unhandled stderr streams can also leak sensitive data.
Fix: Always check exitCode. Parse stderr separately and sanitize logs. Implement retry logic with circuit breakers for transient failures. Never swallow process errors.
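Checking the exit code and sanitizing stderr can happen in one place. The sketch below is illustrative: ProcessResult, redactSecrets, and assertProcessSucceeded are hypothetical names, and redaction works only for secret values the caller already knows (for example, the API key read from the environment).

```typescript
interface ProcessResult {
  exitCode: number | null; // null when the process was killed by a signal
  stdout: string;
  stderr: string;
}

// Replaces each known secret value found in the text with a placeholder.
function redactSecrets(text: string, secrets: string[]): string {
  let result = text;
  for (const secret of secrets) {
    if (secret.length === 0) continue; // splitting on "" would corrupt the text
    result = result.split(secret).join("[REDACTED]");
  }
  return result;
}

// Returns stdout on success; otherwise throws an error with sanitized stderr.
function assertProcessSucceeded(result: ProcessResult, secrets: string[]): string {
  if (result.exitCode !== 0) {
    const safeStderr = redactSecrets(result.stderr, secrets);
    throw new Error(`Review process failed (exit ${result.exitCode}): ${safeStderr}`);
  }
  return result.stdout;
}
```

Treating a null exit code as failure means a reviewer killed by a timeout or signal is never mistaken for a clean pass.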
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Solo AI prototyping | Pre-commit MCP loop (max 3 cycles) | Fast feedback, low friction, catches mechanical errors early | Low-Medium |
| Team PR workflow | CI pipeline + human review | Ensures consistency, enforces standards, scales across contributors | Medium-High |
| Security-critical module | Human review + static analysis | AI lacks threat modeling depth; deterministic tools catch known CVEs | High (justified) |
| High-frequency iteration | MCP loop + debounce + chunking | Prevents cost blowouts while maintaining rapid validation | Low |
| Legacy codebase refactor | CI + manual sign-off | Context drift and architectural debt require human oversight | High |
Configuration Template
{
  "mcp": {
    "servers": {
      "pre-commit-validator": {
        "command": "node",
        "args": ["./dist/server.js"],
        "env": {
          "CODEX_API_KEY": "${CODEX_API_KEY}",
          "CODEX_MODEL": "gpt-5.5",
          "MAX_REVIEW_CYCLES": "3",
          "DIFF_SIZE_LIMIT": "500",
          "STRICT_MODE": "false"
        },
        "transport": "stdio",
        "timeout": 15000
      }
    }
  },
  "validation": {
    "skip_patterns": ["*.md", "*.json", "test/fixtures/**"],
    "critical_paths": ["src/auth/**", "src/crypto/**", "api/public/**"],
    "fallback": "human_review"
  }
}
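Because the env block passes every value as a string, the server has to parse and bound them on startup. A minimal sketch follows, assuming the variable names from the template. loadSettings and parseBoundedInt are hypothetical helpers, and the 1 to 5 bound on cycles mirrors the Zod schema in Step 1.

```typescript
interface ValidatorSettings {
  maxReviewCycles: number;
  diffSizeLimit: number;
  strictMode: boolean;
}

// Parses an integer, falling back when unset or invalid, then clamps to [min, max].
function parseBoundedInt(
  raw: string | undefined,
  fallback: number,
  min: number,
  max: number
): number {
  const value = Number.parseInt(raw ?? "", 10);
  if (Number.isNaN(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Reads the template's environment variables into typed settings.
function loadSettings(env: Record<string, string | undefined>): ValidatorSettings {
  return {
    maxReviewCycles: parseBoundedInt(env.MAX_REVIEW_CYCLES, 3, 1, 5),
    diffSizeLimit: parseBoundedInt(env.DIFF_SIZE_LIMIT, 500, 1, Number.MAX_SAFE_INTEGER),
    strictMode: env.STRICT_MODE === "true"
  };
}
```

The server would call loadSettings(process.env) once at startup; passing the environment in as an argument keeps the function easy to test.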
Quick Start Guide
- Initialize the MCP server: Run npm init @modelcontextprotocol/server pre-commit-validator and install dependencies. Replace the default tool with the validate_uncommitted_changes schema from Step 1.
- Configure environment variables: Export CODEX_API_KEY and set CODEX_MODEL to your preferred review model. Adjust MAX_REVIEW_CYCLES and DIFF_SIZE_LIMIT based on your budget.
- Connect to your editor: In Cursor or VS Code, add the server configuration to your MCP settings file. Restart the editor to register the tool.
- Test the loop: Modify a file, trigger the agent's filesModified hook, and verify that validate_uncommitted_changes returns structured findings. Check logs for cycle counts and cost metrics.
- Enforce fallback gates: Add path-based rules to skip AI review for security directories and route them to human review. Monitor the first week of usage to tune thresholds.