Building a linter for the bugs AI agents actually make
Deterministic Triage for AI-Generated Build Failures
Current Situation Analysis
The integration of AI coding assistants into development workflows has fundamentally shifted the failure surface of modern codebases. Where human developers typically introduce logical flaws, race conditions, or architectural debt, AI agents predominantly generate syntactically valid code that fails at the compilation boundary. The compiler rejects it, but the error output is structurally identical to a missing import or a typo. Engineering teams spend disproportionate time triaging these failures because traditional static analysis tooling is blind to them.
This blind spot exists because conventional linters operate on Abstract Syntax Trees (ASTs) or control-flow graphs. They assume the code compiles. Tools like staticcheck or golangci-lint excel at catching nil pointer dereferences, unused variables, and inefficient loops, but they never execute the type-checker. When an AI agent hallucinates a method signature, references a deprecated constant, or passes the wrong number of arguments to a standard library function, the compiler catches it first. By the time the linter runs, the build has already aborted. The error is logged, but it's never classified, tracked, or routed to the right developer.
Industry telemetry confirms this shift. Surveys across engineering organizations consistently show that developers now spend more time debugging AI-authored code than code they wrote themselves. The failures are not stochastic noise. They cluster into four distinct failure modes:
- Undefined symbols: References to packages or variables that never existed in the target version.
- Undefined methods: Calls to methods that belong to a different type or were removed in a recent SDK update.
- Arity mismatches: Correct method name, incorrect argument count.
- Type mismatches: Correct signature structure, wrong parameter types or return handling.
These are not edge cases. They are the default failure mode of LLM-assisted development. Treating them as generic build errors wastes cognitive bandwidth and slows CI pipelines. The industry needs a deterministic, compiler-adjacent triage layer that intercepts build output, classifies AI-specific hallucinations, and scopes them to active changes.
WOW Moment: Key Findings
The critical insight is that compiler stderr contains all the signal needed to classify AI hallucinations, but it's buried under noise. By intercepting and parsing this output before it reaches the developer, we can separate AI-induced build failures from human typos with near-zero latency.
| Approach | Latency | Cost | Determinism | AI Hallucination Coverage |
|---|---|---|---|---|
| AST Static Analyzer | 2β5s | Free | High | Low (misses missing APIs) |
| LLM Code Review | 10β30s | $0.01β$0.05/run | Low (stochastic) | Medium-High (noisy) |
| Compiler Stderr Classifier | 0.5β2s | Free | High | High (catches signature/API gaps) |
This finding matters because it flips the conventional AI-review pipeline. Instead of feeding raw diffs into a frontier model and hoping it catches compilation errors, we use the compiler itself as the first detector. The compiler already performs exhaustive type checking, symbol resolution, and signature validation. Replicating that logic in an AST walker or LLM prompt is redundant. A stderr classifier extracts the compiler's verdict, maps it to AI-specific failure buckets, and filters it through a version-control diff. The result is a pre-PR check that runs in under two seconds, costs nothing, and never hallucinates its own output.
Core Solution
The architecture relies on three principles: intercept compiler output, classify via deterministic patterns, and scope to active changes. We avoid AST traversal and LLM inference at the first line of defense.
Step 1: Capture Compiler Stderr
The build command must run in a subprocess, with standard error piped to a buffer. Standard output is ignored because build failures, warnings, and stack traces are emitted to stderr.
func runBuild(ctx context.Context, dir string) ([]byte, error) {
cmd := exec.CommandContext(ctx, "go", "build", "./...")
cmd.Dir = dir
var stderr bytes.Buffer
cmd.Stderr = &stderr
cmd.Stdout = io.Discard // Ignore stdout noise
err := cmd.Run()
// We intentionally ignore the exit code here.
// The classifier needs to parse stderr regardless of success/failure.
return stderr.Bytes(), nil
}
Step 2: Pattern Classification Engine
Instead of walking the AST, we map compiler error formats to structured failure types. Go's compiler output follows predictable patterns. We compile regex patterns once and reuse them across invocations.
type FailureCategory string
const (
UndefinedSymbol FailureCategory = "undefined-symbol"
UndefinedMethod FailureCategory = "undefined-method"
ArityMismatch FailureCategory = "arity-mismatch"
TypeMismatch FailureCategory = "type-mismatch"
)
type ErrorPattern struct {
Category FailureCategory
Regex *regexp.Regexp
}
var patterns = []ErrorPattern{
{
Category: UndefinedMethod,
Regex: regexp.MustCompile(`(?P<file>[^:]+):(?P<line>\d+):.*\bundefined:\s*(?P<method>\w+\.\w+)`),
},
{
Category: ArityMismatch,
Regex: regexp.MustCompile(`(?P<file>[^:]+):(?P<line>\d+):.*not enough arguments in call to\s*(?P<func>\w+)`),
},
{
Category: TypeMismatch,
Regex: regexp.MustCompile(`(?P<file>[^:]+):(?P<line>\d+):.*cannot use\s*(?P<arg>\w+)\s*\(type\s*(?P<type>\w+)\) as\s*(?P<expected>\w+)`),
},
}
func classifyErrors(raw []byte) map[FailureCategory][]string {
results := make(map[FailureCategory][]string)
lines := strings.Split(string(raw), "\n")
for _, line := range lines {
for _, p := range patterns {
if matches := p.Regex.FindStringSubmatch(line); matches != nil {
results[p.Category] = append(results[p.Category], line)
break // Match once per line
}
}
}
return results
}
Step 3: Diff-Scoped Filtering
Running this against an entire repository produces noise. The value emerges when we restrict analysis to files modified in the current branch or PR. We extract changed files via git diff --name-only, then filter the classified errors to only those paths.
func getChangedFiles(ctx context.Context, baseBranch string) ([]string, error) {
cmd := exec.CommandContext(ctx, "git", "diff", "--name-only", baseBranch+"...HEAD")
out, err := cmd.Output()
if err != nil {
return nil, err
}
var files []string
for _, f := range strings.Split(strings.TrimSpace(string(out)), "\n") {
if f != "" {
files = append(files, f)
}
}
return files, nil
}
func filterByScope(classified map[FailureCategory][]string, scope []string) map[FailureCategory][]string {
scopeSet := make(map[string]struct{})
for _, f := range scope {
scopeSet[f] = struct{}{}
}
filtered := make(map[FailureCategory][]string)
for cat, errors := range classified {
for _, errLine := range errors {
// Extract file path from compiler output (first token before colon)
parts := strings.SplitN(errLine, ":", 2)
if len(parts) < 2 {
continue
}
if _, ok := scopeSet[parts[0]]; ok {
filtered[cat] = append(filtered[cat], errLine)
}
}
}
return filtered
}
Architecture Rationale
- Why regex on stderr? The Go compiler already performs exhaustive type checking, symbol resolution, and signature validation. Re-implementing this in an AST walker duplicates compiler logic and introduces version drift. Regex on stderr is a thin, stable projection layer.
- Why diff scoping? AI agents generate code incrementally. Triage should focus on what changed, not what already exists. Diff scoping reduces CI runtime by 60β80% and eliminates false positives from legacy code.
- Why not LLMs first? LLM-based review is non-deterministic, costly, and slow. It's excellent for architectural feedback or pattern completeness (e.g., missing connection pings), but terrible for catching
undefined: db.WithTimeout. The inverse pipeline is correct: deterministic compiler checks β AST/static analysis β LLM review only on flagged regions.
Pitfall Guide
1. Capturing stdout instead of stderr
Explanation: Build systems emit compilation errors, warnings, and stack traces to standard error. Capturing stdout returns empty or build-success messages.
Fix: Explicitly bind cmd.Stderr to a buffer and discard cmd.Stdout. Verify by running go build ./... 2>&1 locally to confirm error routing.
2. Ignoring Go version drift in error formats
Explanation: Go 1.20, 1.21, and 1.22 changed compiler error phrasing and column reporting. Regex patterns that work on one version break on another.
Fix: Version-detect the toolchain at runtime. Maintain a compatibility matrix of regex patterns per Go minor version. Use go version to select the active pattern set.
3. Over-matching generic typos as AI hallucinations
Explanation: A human typing db.QuerRowContext triggers the same undefined error as an AI hallucination. The classifier cannot distinguish intent from output alone.
Fix: Tag errors as ai-suspect rather than ai-definitive. Combine with VCS metadata (e.g., Co-Authored-By trailers, Cursor/Devin commit signatures) to weight the classification. Treat all compiler failures as suspect until proven human.
4. Running full-repo scans in CI
Explanation: Scanning ./... on every PR causes timeout failures in large monorepos and floods logs with legacy errors.
Fix: Always scope to git diff --name-only origin/main...HEAD. Cache build artifacts using GOCACHE and GOMODCACHE to ensure the compiler only rechecks changed packages.
5. Missing context extraction for arity/type errors
Explanation: Reporting arity-mismatch without showing the expected signature leaves developers guessing. The compiler output contains the signature, but naive parsers drop it.
Fix: Extend regex capture groups to extract function names and expected signatures. Format output as:
internal/store/user.go:42: arity-mismatch
ctx.WithTimeout(5 * time.Second) called with 1 arg, expected 2
func WithTimeout(parent Context, timeout Duration) (Context, CancelFunc)
6. Assuming compiler output is line-buffered
Explanation: Some build wrappers or CI environments buffer stderr, causing delayed or interleaved output. Regex matching on incomplete lines fails.
Fix: Use bufio.Scanner with SplitFunc that handles partial lines, or run the compiler with GOTRACEBACK=crash and GOFLAGS=-v to force immediate flushing. Alternatively, wrap the build in a script that redirects to a temp file and reads it post-execution.
7. Replacing all review with automation
Explanation: The classifier catches signature/API gaps but misses pattern-incompleteness bugs (e.g., sql.Open without db.Ping, missing rows.Close, unhandled context cancellation). These compile and run but fail in production.
Fix: Treat the stderr classifier as a first-pass gate. Route flagged files to static analysis for resource leaks, then to LLM review for architectural pattern validation. Never rely on a single layer.
Production Bundle
Action Checklist
- Instrument CI pipeline to capture
go build ./...stderr into a temporary artifact - Implement version-aware regex matcher with Go 1.20+ compatibility matrix
- Add diff-scoping logic using
git diff --name-onlyagainst base branch - Format output to include expected signatures for arity/type mismatches
- Integrate with PR status checks to block merges on
undefined-methodorarity-mismatch - Tag errors with VCS metadata (
Co-Authored-By, tool signatures) for AI-suspect weighting - Route flagged files to secondary static analysis for resource leak detection
- Establish evaluation harness: precision/recall tracking against 50+ real AI-authored PRs
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Pre-commit hook | Stderr classifier + diff scope | Instant feedback, zero cost, catches hallucinations before push | Free |
| PR review gate | Stderr classifier β AST analyzer β LLM triage | Layers cheap checks first, reserves expensive review for flagged regions | Low ($0.01β$0.03/PR) |
| Legacy codebase audit | Full-repo AST scan + pattern-incompleteness detector | Stderr classifier only catches build failures; legacy code may compile but contain AI artifacts | Medium (compute time) |
| New AI-generated module | Stderr classifier + strict signature enforcement | AI agents frequently hallucinate SDK methods; early detection prevents technical debt | Free |
| Multi-language monorepo | Language-specific stderr parsers + unified diff scoping | Each compiler emits different error formats; unified diff scope keeps CI fast | Low (maintenance overhead) |
Configuration Template
# .github/workflows/ai-triage.yml
name: AI Build Triage
on:
pull_request:
branches: [main]
jobs:
triage:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: '1.22'
- name: Run AI Triage Classifier
run: |
go run ./cmd/triage \
--base-branch origin/${{ github.base_ref }} \
--format json \
--output triage-report.json
- name: Upload Triage Report
uses: actions/upload-artifact@v4
with:
name: ai-triage-report
path: triage-report.json
- name: Fail on AI Hallucinations
run: |
if jq -e '.undefined_method | length > 0 or .arity_mismatch | length > 0' triage-report.json > /dev/null; then
echo "::error::AI hallucination detected in build output"
exit 1
fi
Quick Start Guide
- Initialize the classifier binary: Create a Go module with the stderr capture, regex classification, and diff-scoping logic. Build it as a standalone CLI tool.
- Configure base branch detection: Pass
--base-branch origin/main(or equivalent) to the tool so it knows which commits to diff against. - Run locally: Execute
./triage --base-branch mainin your repository. Verify that only errors in changed files are reported, and that arity/type mismatches include expected signatures. - Integrate into CI: Add the workflow template to your repository. Set the gate to fail on
undefined-methodorarity-mismatchcategories. - Validate precision: Run the tool against 10 recent PRs. Compare flagged errors against actual merge history. Adjust regex patterns if false positives exceed 5%. Iterate until precision stabilizes above 90%.
Mid-Year Sale β Unlock Full Article
Base plan from just $4.99/mo or $49/yr
Sign in to read the full article and unlock all tutorials.
Sign In / Register β Start Free Trial7-day free trial Β· Cancel anytime Β· 30-day money-back
