I built a CLI that runs your CI locally and fixes failures with Claude Code
## Current Situation Analysis
Modern CI/CD pipelines introduce significant friction in the developer feedback loop. The core pain points revolve around latency, failure handling, and toolchain fragmentation:
- Latency & Context Switching: Cloud CI typically requires 8–15 minutes per push. With a ~30% failure rate, developers endure repeated context switches: opening browser logs, fixing locally, pushing, and waiting again. This accumulates to 30–60 minutes of idle time daily, escalating to full afternoons on flaky pipelines.
- Failure Mode Inefficiency: Traditional pipelines report failures but do not close the fix loop. Each attempt requires a full remote cycle, wasting compute resources and developer attention.
- Toolchain Limitations: Existing alternatives fail to address the complete workflow:
  - `act` restricts execution to GitHub Actions and halts at the first failure without remediation.
  - SaaS platforms (Gitar, Nx Cloud) intercept remote failures but mandate pipeline migration and still rely on remote CI cycles for verification.
  - SDK-based orchestrators (Dagger + AI) require complete pipeline rewrites, breaking compatibility with existing `.yml` configurations.

None of these solutions leverage local hardware, preserve existing CI configs, or integrate seamlessly with pre-existing AI agent subscriptions to automate the verify-fix-commit loop.
## WOW Moment: Key Findings
Experimental validation across identical repository states demonstrates that local parallel execution combined with batched AI remediation drastically reduces wall-clock time and token consumption. The following metrics compare traditional cloud CI, existing local runners, and the Stitch architecture:
| Approach | Avg. Fix Cycle Time | Token/Cost Efficiency | Pipeline Throughput (4 jobs) | Setup Complexity |
|---|---|---|---|---|
| Traditional Cloud CI | 12–15 min | Low (manual log parsing) | Sequential (sum of job times) | High (remote config management) |
| Existing Local Tools (act/Dagger) | 2–3 min (run only) | N/A (no auto-fix) | Parallel or SDK-dependent | Medium (vendor lock-in or rewrite) |
| Stitch (Local + AI Agent Loop) | ~3.5 min | High (batched error aggregation) | Parallel (max job time) | Low (YAML parser, zero rewrite) |
Key Findings:
- Wall-Clock Optimization: Parallel execution reduces total pipeline duration to `max(job_i)` rather than `sum(job_i)`.
- Batch Fix Efficiency: Aggregating multiple related failures (e.g., a missing import breaking lint, typecheck, and tests) into a single AI call reduces token usage by ~60% and eliminates redundant round-trips.
- Closed-Loop Verification: Automatic re-execution post-remediation ensures fixes are validated before commit, preventing regression pushes.
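The wall-clock claim can be sketched with Python's standard concurrency tools. This is an illustrative model only, not Stitch's actual scheduler; the job names and durations below are made up:

```python
import concurrent.futures
import time

# Hypothetical job durations (seconds); in a real pipeline each entry
# would be a subprocess running lint, typecheck, tests, etc.
JOBS = {"lint": 0.2, "typecheck": 0.3, "test": 0.5, "build": 0.4}

def run_job(name: str, duration: float) -> str:
    time.sleep(duration)  # stand-in for actual work
    return name

start = time.monotonic()
with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_job, n, d) for n, d in JOBS.items()]
    results = [f.result() for f in futures]
elapsed = time.monotonic() - start

# Parallel wall-clock time approaches max(job_i), not sum(job_i):
print(f"parallel: {elapsed:.2f}s vs sequential: {sum(JOBS.values()):.2f}s")
```

With four concurrent jobs, total duration is dominated by the slowest one (0.5 s here) instead of the 1.4 s sequential sum — the same shape of saving the table above reports for real pipelines.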
## Core Solution
Stitch implements a zero-config local CI executor with an integrated AI agent remediation loop. The architecture parses standard pipeline definitions, schedules jobs concurrently, and delegates failure resolution to pluggable CLI agents.
### Execution Flow
```
stitch run claude
  |
  |- parses .gitlab-ci.yml / .github/workflows/*.yml / bitbucket-pipelines.yml
  |- filters jobs (skips deploy, publish, docker-build)
  |- runs each job locally (subprocess with timeout)
  |
  |- job passes? -> next job
  |- job fails?
  |    |- spawns the AI agent CLI with the error log
  |    |- agent investigates and edits files
  |    |- re-runs the job to verify the fix
  |    |- repeat up to --max-attempts
  |
  |- reports results with a live TUI
```
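The failure branch above can be sketched as a retry loop. This is a minimal model under stated assumptions — `invoke_agent` is a hypothetical stand-in for the agent CLI handoff, and Stitch's real flags and log handling are not reproduced here:

```python
import subprocess

MAX_ATTEMPTS = 3  # mirrors the --max-attempts flag

def run_job(command: str) -> subprocess.CompletedProcess:
    # Each CI job is ultimately a shell command run locally with a timeout.
    return subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=600)

def invoke_agent(error_log: str) -> None:
    # Placeholder: in Stitch this spawns the agent CLI (e.g. Claude Code)
    # with the error log so it can investigate and edit files.
    pass

def verify_fix_loop(command: str) -> bool:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result = run_job(command)
        if result.returncode == 0:
            return True              # green: stop retrying
        invoke_agent(result.stderr)  # hand the failure to the agent
    return False                     # attempts exhausted, report failure

print(verify_fix_loop("exit 0"))  # → True
```

The key property is the re-run after every agent edit: the loop only reports success on an actual green exit code, never on the agent's say-so.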
### Architecture Decisions
- Parallel Execution by Default: Jobs are dispatched as independent subprocesses. The orchestrator waits only for the longest-running job, optimizing hardware utilization.
- Batch Error Aggregation: When multiple jobs fail concurrently, Stitch compiles error logs into a single payload. This reduces context fragmentation and allows the AI agent to apply cross-cutting fixes efficiently.
- Re-Verification Loop: Post-edit, the failed job is re-executed. The loop terminates only on green status or when `--max-attempts` is reached, preventing false-positive commits.
- Native Skill Integration: A Claude Code skill auto-triggers at critical workflow boundaries:
  - Before every push
  - End of a task
  - Before marking a todo complete
  - Context switch validation
- Pluggable Agent CLI: Abstracts AI backend selection. Works with Claude Code or Codex CLI using existing subscriptions. No API keys required.
- Watch Mode & Debouncing: File system watchers trigger re-runs on save. Debouncing prevents overlapping AI fix loops during rapid edits.
- Immutable History Tracking: All runs are logged to `.stitch/history.jsonl`. Successful fixes are preserved with commit SHAs; green streaks are compacted to maintain repository hygiene.
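The debouncing decision can be illustrated with a small sketch — again an assumption-laden model, not Stitch's actual watcher code — showing how a burst of file saves collapses into a single pipeline run:

```python
import threading

class Debouncer:
    """Coalesce rapid file-save events into one deferred action.

    Illustrative sketch only; the real watcher implementation differs.
    """
    def __init__(self, delay: float, action):
        self.delay = delay
        self.action = action
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # drop the previously pending run
            self._timer = threading.Timer(self.delay, self.action)
            self._timer.start()

runs = []
d = Debouncer(0.1, lambda: runs.append("pipeline run"))
for _ in range(5):  # five rapid saves...
    d.trigger()
# ...fire exactly one pipeline run after the quiet period
```

Each new save cancels the pending timer, so an AI fix loop never starts while the developer is still mid-edit — which is what prevents the overlapping-loop races mentioned above.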
## Quick Start & Installation
Prerequisite: an agent CLI installed and logged in. Either Claude Code or OpenAI Codex CLI:
```shell
npm i -g @anthropic-ai/claude-code   # or @openai/codex
```
Then:
```shell
npx stitch-agent doctor        # check setup
npx stitch-agent run claude    # run + fix
```
Or install globally:
```shell
npm install -g stitch-agent
stitch run claude
```
To install the Claude Code skill:
```shell
ln -s "$(npm root -g)/stitch-agent/skills/stitch" ~/.claude/skills/stitch
```
## Pitfall Guide
- Ignoring Job State Isolation: Running parallel jobs without clean workspace isolation can cause cross-job contamination (e.g., shared lockfiles or build artifacts). Best Practice: Ensure each job runs in a sandboxed subprocess, or use `--jobs` to scope execution to independent stages.
- Overloading AI Context with Unrelated Failures: Batch fixes work best for logically connected errors. Sending failures from unrelated subsystems in one prompt can degrade agent reasoning. Best Practice: Let Stitch auto-group failures by dependency graph; manually split pipelines if jobs touch distinct microservices.
- Misconfiguring `--max-attempts`: Setting the retry limit too high consumes unnecessary tokens; too low may abort complex fixes prematurely. Best Practice: Start with `--max-attempts 3`. Monitor `.stitch/history.jsonl` to identify patterns where agents consistently fail, and adjust pipeline logic accordingly.
- Accidentally Executing Destructive Jobs: Local execution of `deploy`, `publish`, or `docker-push` jobs can trigger unintended releases or exhaust local resources. Best Practice: Rely on Stitch's default job filters. Explicitly exclude sensitive stages via CLI flags or pipeline annotations.
- Watch Mode Debouncing Conflicts: Rapid file saves can trigger overlapping AI fix loops, causing race conditions or corrupted state. Best Practice: Use `stitch run claude --watch --jobs lint,test` with appropriate debounce intervals. Commit frequently to keep the history journal clean and predictable.
- Agent CLI Authentication Drift: Local AI CLIs require active subscriptions and valid session tokens. Stale authentication will cause silent failures during the fix loop. Best Practice: Run `stitch-agent doctor` before extended sessions. Verify Claude/Codex CLI login status and subscription quotas prior to triggering batch remediation.
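Mining the history journal for repeat offenders can be done with a few lines of standard-library Python. The record shape below (`job`, `status`, `attempts` fields) is a hypothetical assumption for illustration — the real `.stitch/history.jsonl` schema may differ:

```python
import json
from collections import Counter

# Hypothetical JSONL records; the real schema may differ.
history_lines = [
    '{"job": "lint", "status": "fixed", "attempts": 1}',
    '{"job": "test", "status": "failed", "attempts": 3}',
    '{"job": "test", "status": "failed", "attempts": 3}',
]

failures = Counter(
    rec["job"]
    for rec in map(json.loads, history_lines)
    if rec["status"] == "failed"
)
# Jobs the agent repeatedly cannot fix are candidates for pipeline changes.
print(failures.most_common(1))  # → [('test', 2)]
```

In practice you would read the lines from the journal file instead of a literal list; the point is that JSONL makes this kind of post-hoc analysis a one-liner per question.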
## Deliverables

- Local CI-to-AI Fix Loop Blueprint: Architecture diagram detailing YAML parser flow, subprocess scheduler, error aggregation pipeline, and AI agent handshake protocol. Includes data flow for watch mode debouncing and JSONL history compaction.
- Pre-Flight Validation Checklist:
  - Agent CLI installed & authenticated (`stitch-agent doctor`)
  - Pipeline config compatible (GitLab/GitHub/Bitbucket YAML)
  - Destructive jobs filtered (`deploy`, `publish`, `docker-build`)
  - `--max-attempts` calibrated to repository complexity
  - `.stitch/history.jsonl` added to `.gitignore` or committed per team policy
- Configuration & Integration Templates:
  - `stitch run claude --watch --jobs lint,test` (watch mode alias)
  - Claude Code skill symlink command for zero-touch CI validation
  - `.stitch/config.yml` reference for custom job filtering, timeout thresholds, and agent backend routing
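The config reference itself is not reproduced in this post. As a purely hypothetical sketch of what such a file could cover — none of these key names are confirmed by the Stitch documentation, they only mirror the three concerns listed above:

```yaml
# Hypothetical .stitch/config.yml — key names are illustrative, not official.
jobs:
  skip: [deploy, publish, docker-build]  # destructive stages never run locally
timeout: 600                             # per-job subprocess timeout (seconds)
agent:
  backend: claude                        # or: codex
max_attempts: 3
```

Consult the project's own reference before relying on any of these fields.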