Securing AI Coding Workflows: File Integrity Monitoring for CLI Agents

Current Situation Analysis

The rapid adoption of terminal-based AI coding agents has fundamentally changed how developers interact with codebases. Tools like Claude Code, Codex, and Copilot CLI now operate with broad filesystem access, executing multi-step refactors, dependency updates, and configuration changes autonomously. While this accelerates development velocity, it introduces a critical blind spot: unattended file mutation.

When a developer steps away from their machine, an AI agent can silently overwrite environment files such as .env, modify CI/CD pipelines, or alter persistent instruction files. Traditional development workflows assume human oversight for every file change. AI agents break that assumption by operating at machine speed, often without explicit confirmation dialogs.

This problem is frequently overlooked because the industry has focused heavily on agent capabilities rather than safety boundaries. Most teams treat CLI agents as extensions of their own terminal session, assuming that if a command runs successfully, the output is correct. In reality, LLMs can misinterpret context, apply changes to the wrong directory, or chain operations that cascade into unintended modifications. File integrity monitoring exists in enterprise environments, but it lacks the contextual awareness needed for AI-driven workflows. Generic watchers like chokidar or fsnotify trigger on every write, creating noise without distinguishing between a developer's intentional edit and an agent's autonomous operation.

The gap becomes critical when considering persistent agent memory files. Modern CLI agents rely on markdown-based instruction files (e.g., CLAUDE.md, .cursorrules, .hermes/, Aider configs) to maintain context across sessions. If an agent accidentally modifies these files, it can poison its own future behavior, leading to compounding errors that are difficult to trace. Without a dedicated monitoring layer, developers are left reacting to broken builds or misconfigured environments long after the damage occurs.

WOW Moment: Key Findings

Transitioning from passive file watching to AI-agent-aware monitoring reveals a stark difference in operational safety. The table below compares traditional filesystem monitoring against an agent-governance approach:

Approach	Context Awareness	Rollback Capability	Memory File Protection	Notification Granularity	Agent Compatibility
Traditional File Watcher	None (blind to source)	Manual only	Not tracked	All events (high noise)	Universal but unfiltered
AI-Agent Monitor	Agent-aware routing	Automated/Approved	Dedicated tracking	Severity-based (HIGH/CRITICAL)	Optimized for CLI agents

This finding matters because it shifts file monitoring from a logging utility to an active governance layer. By classifying files by sensitivity and routing events through approval workflows, developers can maintain autonomy while establishing a safety net. The ability to approve or roll back changes via external channels (like Telegram) means protection extends beyond the terminal, covering periods when developers are away from their machines. This transforms reactive debugging into proactive risk mitigation.

Core Solution

Building an AI-agent file integrity monitor requires a daemon-based architecture that decouples file observation from event handling. The system must track writes, classify them by sensitivity, route them through notification channels, and provide controlled rollback mechanisms.

Architecture Decisions

File Watcher Over Process Interception: Attempting to intercept commands at the process level requires hooking into TUI frameworks or Rust binaries, which breaks frequently with agent updates. A filesystem watcher operates at the OS level, remains stable across agent versions, and captures the actual state change regardless of how it was triggered.
Severity-Based Event Routing: Not all file changes require intervention. Grouping files into LOW, MEDIUM, HIGH, and CRITICAL tiers allows the system to suppress noise while escalating sensitive modifications.
External Approval Channels: Relying solely on terminal prompts fails when developers are away from their desks. Integrating with messaging platforms enables asynchronous governance without blocking the agent's workflow.
Memory File Isolation: Agent instruction files require separate tracking because they influence future behavior. Modifying them without oversight can corrupt the agent's contextual baseline.

Implementation Walkthrough

1. Configuration Schema

The monitor relies on a structured configuration that defines watch paths, sensitivity tiers, and notification routing.

interface AgentGuardConfig {
  watchPaths: string[];
  sensitivityRules: SensitivityRule[];
  notifications: NotificationConfig;
  daemon: DaemonSettings;
}

interface SensitivityRule {
  pattern: string;
  severity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
  requiresApproval: boolean;
  rollbackEnabled: boolean;
}

interface NotificationConfig {
  telegram: {
    botToken: string;
    chatId: string;
    webhookPort: number;
  };
  macOS: {
    notifyHigh: boolean;
    notifyCritical: boolean;
  };
}

interface DaemonSettings {
  logLevel: 'info' | 'debug' | 'warn';
  reportInterval: number; // days
  maxEventHistory: number;
}

2. Telegram Webhook Handler

When a sensitive file changes, the daemon posts a message with inline buttons. The webhook processes approval or rollback requests.

import express, { Router } from 'express';
import { TelegramBotAPI } from './telegram-client';
// Local modules assumed by this walkthrough; adjust the paths to your project layout.
import { EventStore } from './event-store';
import { FileRollbackService } from './rollback-service';

const router = Router();
router.use(express.json()); // Telegram delivers updates as JSON bodies
const bot = new TelegramBotAPI(process.env.TELEGRAM_BOT_TOKEN!);

router.post('/webhook/agent-guard', async (req, res) => {
  // Acknowledge first so a slow rollback never holds the webhook request open.
  res.status(200).send('OK');

  const { callback_query } = req.body ?? {};
  if (!callback_query?.data) return;

  const { data, message } = callback_query;
  const [action, eventId] = data.split(':');

  try {
    if (action === 'approve') {
      await bot.answerCallbackQuery(callback_query.id, 'Change approved');
      await EventStore.markApproved(eventId);
    } else if (action === 'rollback') {
      await bot.answerCallbackQuery(callback_query.id, 'Rolling back...');
      await FileRollbackService.restore(eventId);
      await bot.editMessageText('✅ Rollback completed', {
        chat_id: message.chat.id,
        message_id: message.message_id,
      });
    }
  } catch (err) {
    console.error('Webhook processing failed:', err);
    // The query may already be answered, so a failure here must not escape the handler.
    await bot.answerCallbackQuery(callback_query.id, 'Action failed').catch(() => {});
  }
});

export default router;
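
The handler above expects callback data shaped as action:eventId. As a rough sketch of the sending side, the message could be built as a Bot API sendMessage body with an inline keyboard. The helper names buildApprovalPayload and parseCallbackData, the ApprovalEvent shape, and the message wording are illustrative assumptions, not part of AgentGuard.

```typescript
// Hypothetical helpers; names and message wording are illustrative assumptions.
type Action = 'approve' | 'rollback';

interface ApprovalEvent {
  id: string;
  path: string;
  severity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
}

// Builds a sendMessage body with two inline buttons.
// Telegram limits callback_data to 64 bytes, so only the action and event ID are encoded.
function buildApprovalPayload(chatId: string, event: ApprovalEvent) {
  const button = (text: string, action: Action) => ({
    text,
    callback_data: `${action}:${event.id}`,
  });
  return {
    chat_id: chatId,
    text: `[${event.severity}] ${event.path} was modified`,
    reply_markup: {
      inline_keyboard: [[button('Approve', 'approve'), button('Rollback', 'rollback')]],
    },
  };
}

// Parses callback data defensively; returns null for anything unexpected.
function parseCallbackData(data: string): { action: Action; eventId: string } | null {
  const sep = data.indexOf(':');
  if (sep < 0) return null;
  const action = data.slice(0, sep);
  const eventId = data.slice(sep + 1);
  if (action !== 'approve' && action !== 'rollback') return null;
  if (eventId.length === 0) return null;
  return { action, eventId };
}
```

A UUID plus the longest action prefix stays well under the 64-byte limit, which is one reason to keep file paths out of the callback data and look them up by event ID instead.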

3. Daemon Lifecycle Manager

The daemon orchestrates the watcher, event queue, and reporting subsystem.

import { randomUUID } from 'node:crypto';
import { EventEmitter } from 'node:events';
import { FSWatcher } from 'chokidar';

interface FileEvent {
  id: string;
  path: string;
  severity: SensitivityRule['severity'];
  timestamp: number;
  requiresApproval: boolean;
  status: 'pending' | 'approved' | 'rolled_back';
}

interface ReportData {
  period: string;
  totalChanges: number;
  approved: number;
  rolledBack: number;
  criticalEvents: number;
}

class AgentDaemon extends EventEmitter {
  private watcher: FSWatcher;
  private eventQueue: Map<string, FileEvent>;

  constructor(config: AgentGuardConfig) {
    super();
    this.watcher = new FSWatcher({
      ignored: /node_modules|\.git/,
      persistent: true,
      ignoreInitial: true,
    });
    this.eventQueue = new Map();
    this.loadConfig(config);
  }

  // matchSensitivityRule, sendMacOSNotification, and sendTelegramApproval are omitted here for brevity.

  private loadConfig(config: AgentGuardConfig): void {
    config.watchPaths.forEach((dir) => this.watcher.add(dir));

    // Log watcher failures instead of letting them pass silently.
    this.watcher.on('error', (err) => console.error('Watcher error:', err));

    this.watcher.on('change', async (filePath) => {
      const rule = this.matchSensitivityRule(filePath);
      if (!rule) return;

      const event: FileEvent = {
        id: randomUUID(),
        path: filePath,
        severity: rule.severity,
        timestamp: Date.now(),
        requiresApproval: rule.requiresApproval,
        status: 'pending', // updated when the change is approved or rolled back
      };

      this.eventQueue.set(event.id, event);
      await this.routeEvent(event);
    });
  }

  private async routeEvent(event: FileEvent): Promise<void> {
    if (event.severity === 'HIGH' || event.severity === 'CRITICAL') {
      await this.sendMacOSNotification(event);
    }
    if (event.requiresApproval) {
      await this.sendTelegramApproval(event);
    }
  }

  public async generateReport(days: number): Promise<ReportData> {
    const cutoff = Date.now() - (days * 24 * 60 * 60 * 1000);
    const recentEvents = Array.from(this.eventQueue.values())
      .filter((e) => e.timestamp >= cutoff);

    return {
      period: `${days} days`,
      totalChanges: recentEvents.length,
      approved: recentEvents.filter((e) => e.status === 'approved').length,
      rolledBack: recentEvents.filter((e) => e.status === 'rolled_back').length,
      criticalEvents: recentEvents.filter((e) => e.severity === 'CRITICAL').length,
    };
  }
}
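
The daemon calls matchSensitivityRule without showing it. One possible shape is sketched below, under two assumptions that are not confirmed by the source: when several rules match, the highest severity wins, and only a small glob subset ("**/", "**", and "*") needs to be supported. A production setup would more likely use a dedicated glob library. Unlike the method in the class, this standalone version takes the rule list as a parameter.

```typescript
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

interface Rule {
  pattern: string;
  severity: Severity;
  requiresApproval: boolean;
}

// Converts a small glob subset to a RegExp: "**/" (any directory depth, including none),
// "**" (anything), and "*" (anything except "/"). Other glob syntax is NOT supported.
function globToRegExp(glob: string): RegExp {
  let out = '';
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith('**/', i)) {
      out += '(?:.*/)?';
      i += 3;
    } else if (glob.startsWith('**', i)) {
      out += '.*';
      i += 2;
    } else if (glob.charAt(i) === '*') {
      out += '[^/]*';
      i += 1;
    } else {
      // Escape regex metacharacters so "." in ".env" matches literally.
      out += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp(`^${out}$`);
}

const SEVERITY_ORDER: Severity[] = ['LOW', 'MEDIUM', 'HIGH', 'CRITICAL'];

// Returns the highest-severity rule matching the path, or undefined if none match.
function matchSensitivityRule(filePath: string, rules: Rule[]): Rule | undefined {
  // Normalize Windows separators and a leading "./" before matching.
  const normalized = filePath.replace(/\\/g, '/').replace(/^\.\//, '');
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!globToRegExp(rule.pattern).test(normalized)) continue;
    if (!best || SEVERITY_ORDER.indexOf(rule.severity) > SEVERITY_ORDER.indexOf(best.severity)) {
      best = rule;
    }
  }
  return best;
}
```

Picking the highest severity rather than the first match means a broad low-severity rule cannot accidentally shadow a narrower critical one.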

Rationale Behind Choices

Chokidar over native fs.watch: Provides a consistent cross-platform API, supports an ignored option (used above to skip node_modules and .git), and normalizes the duplicate events that raw fs.watch can emit for a single write.
UUID-based event tracking: Gives every change a unique ID, so an approval or rollback targets exactly one event even when multiple files change simultaneously.
Severity routing: Separates noise from actionable events. LOW/MEDIUM changes log silently, while HIGH/CRITICAL changes trigger immediate alerts.
External approval: Decouples governance from the terminal, ensuring protection continues during breaks or off-hours.

Pitfall Guide

1. Relying on Process Interception

Explanation: Attempting to hook into CLI agent processes (especially Rust binaries or TUI frameworks) to intercept commands before execution is fragile. Agent updates frequently change internal APIs, breaking hooks and leaving the system blind. Fix: Use filesystem-level monitoring. It captures the actual state change regardless of how the agent triggers it, and remains stable across version updates.

2. Ignoring Agent Memory Files

Explanation: Files like CLAUDE.md, .cursorrules, .hermes/, and Aider configs store persistent instructions. If an agent modifies these unintentionally, it corrupts the instructions loaded into every future session, leading to compounding errors. Fix: Explicitly track memory files in a dedicated sensitivity tier. Require approval for any modification and maintain versioned backups for quick restoration.
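
The versioned backups mentioned in the fix could look roughly like the following sketch, which names each backup after a hash of the file content. The function names, the backup directory layout, and the 12-character hash prefix are assumptions made for illustration.

```typescript
import { createHash } from 'node:crypto';
import { copyFileSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { basename, join } from 'node:path';

// Copies the file into backupDir under a name derived from its content hash.
// Identical content maps to the same backup name, so repeated snapshots do not pile up.
function snapshotFile(filePath: string, backupDir: string): string {
  const digest = createHash('sha256').update(readFileSync(filePath)).digest('hex');
  mkdirSync(backupDir, { recursive: true });
  const backupPath = join(backupDir, `${basename(filePath)}.${digest.slice(0, 12)}`);
  copyFileSync(filePath, backupPath);
  return backupPath;
}

// Restores a previously taken snapshot over the live file.
function restoreFile(backupPath: string, filePath: string): void {
  copyFileSync(backupPath, filePath);
}

// Usage: snapshot a memory file, simulate an unwanted edit, then restore it.
const workDir = mkdtempSync(join(tmpdir(), 'agentguard-demo-'));
const memoryFile = join(workDir, 'CLAUDE.md');
writeFileSync(memoryFile, 'Always run tests before committing.\n');
const backup = snapshotFile(memoryFile, join(workDir, '.backups'));
writeFileSync(memoryFile, 'corrupted instructions\n');
restoreFile(backup, memoryFile);
console.log(readFileSync(memoryFile, 'utf8'));
```

The snapshot has to be taken while the file is still in a known-good state, for example at daemon startup and after each approved change, since a watcher only sees the file after it has been modified.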

3. Webhook Timeout Misconfiguration

Explanation: Telegram's webhook API expects a response within 30 seconds. If the rollback or approval logic performs heavy I/O or waits for external services, the connection drops, causing duplicate messages or lost events. Fix: Acknowledge the webhook immediately with a 200 OK, then process the action asynchronously. Use a message queue or background worker for rollback operations.
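
The acknowledge-then-process pattern from the fix can be sketched independently of any web framework. BackgroundQueue and handleCallback are hypothetical names, and the in-memory queue is a deliberate simplification: it loses pending jobs if the process crashes, so a durable queue would be needed for stronger guarantees.

```typescript
type Job = () => Promise<void>;

// Minimal in-memory queue: jobs run one at a time, in order, off the request path.
class BackgroundQueue {
  private tail: Promise<void> = Promise.resolve();
  public failures: unknown[] = [];

  enqueue(job: Job): void {
    this.tail = this.tail.then(job).catch((err) => {
      // Record the failure and continue so one bad job does not stall the queue.
      this.failures.push(err);
    });
  }

  // Resolves once every job enqueued so far has settled (useful in tests and on shutdown).
  idle(): Promise<void> {
    return this.tail;
  }
}

// Handler shape: respond first, then hand the slow work to the queue.
function handleCallback(respondOk: () => void, queue: BackgroundQueue, slowAction: Job): void {
  respondOk(); // acknowledge the webhook immediately
  queue.enqueue(slowAction); // the rollback or approval runs afterwards
}
```

In an Express handler, respondOk would wrap res.status(200).send('OK') and slowAction would wrap the rollback call.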

4. Over-Filtering Critical Paths

Explanation: Aggressively ignoring directories to reduce noise can accidentally exclude sensitive configuration files. A misplaced glob pattern might skip .env.production or CI pipeline definitions. Fix: Use explicit allowlists for sensitive patterns rather than broad ignores. Validate watch rules against a test suite that includes edge-case file paths.

5. Race Conditions During Rollback

Explanation: If an agent continues writing while a rollback is in progress, the restored file can be immediately overwritten, creating a loop of conflicting states. Fix: Implement a temporary write lock on the target file during rollback. Pause the watcher for that specific path until the restoration completes, then resume monitoring.
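
Pausing the watcher for a single path can be approximated with a small gate that the change handler consults before creating an event. This is a sketch under assumptions: RollbackGate is a hypothetical class, and it only silences the monitor's own reaction to the restore. It does not stop the agent from writing, and because watcher events can arrive slightly after a write completes, a short grace period before lifting the suppression may be needed in practice.

```typescript
// Tracks paths whose change events should be ignored while a rollback rewrites them.
class RollbackGate {
  private readonly active = new Map<string, number>();

  isSuppressed(path: string): boolean {
    return (this.active.get(path) ?? 0) > 0;
  }

  // Runs `restore` with events for `path` suppressed, and always lifts the suppression afterwards.
  async during(path: string, restore: () => Promise<void>): Promise<void> {
    this.active.set(path, (this.active.get(path) ?? 0) + 1);
    try {
      await restore();
    } finally {
      const remaining = (this.active.get(path) ?? 1) - 1;
      if (remaining > 0) this.active.set(path, remaining);
      else this.active.delete(path);
    }
  }
}
```

The change handler would then start with a check such as `if (gate.isSuppressed(filePath)) return;` so the restored file does not raise a fresh approval request.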

6. Platform-Specific Watcher Limitations

Explanation: macOS uses FSEvents, Linux uses inotify, and Windows uses ReadDirectoryChangesW. Each has different limits on watch descriptors and event batching. On Linux, for example, exceeding fs.inotify.max_user_watches makes new watches fail with ENOSPC, and the failure goes unnoticed if watcher errors are not handled. Fix: Monitor watcher health metrics. Alert when descriptor limits approach thresholds, and implement recursive directory splitting for large projects.
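
On Linux, the per-user inotify watch limit is exposed under /proc, so a basic health check can compare it with the number of watched paths. The sketch below assumes hypothetical function names and an arbitrary 80% alert threshold. The watched count would have to come from the watcher itself, for example by counting the entries returned by chokidar's getWatched().

```typescript
import { readFileSync } from 'node:fs';

// Reads the per-user inotify watch limit on Linux; returns null elsewhere or if unreadable.
function readInotifyWatchLimit(): number | null {
  if (process.platform !== 'linux') return null;
  try {
    const raw = readFileSync('/proc/sys/fs/inotify/max_user_watches', 'utf8');
    const limit = Number.parseInt(raw.trim(), 10);
    return Number.isNaN(limit) ? null : limit;
  } catch {
    return null;
  }
}

// Flags when the watched count reaches a fraction of the limit (the threshold is an arbitrary example).
function isNearWatchLimit(watchedCount: number, limit: number | null, threshold = 0.8): boolean {
  return limit !== null && watchedCount >= limit * threshold;
}
```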

7. Silent Daemon Failures

Explanation: If the monitoring daemon crashes or loses network connectivity, developers remain unaware until a critical change goes unmonitored. Fix: Implement a watchdog process that verifies daemon heartbeat every 60 seconds. Use macOS launchd or systemd to auto-restart on failure, and route health checks to a dedicated alert channel.
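
A heartbeat check can be as small as a timestamp file that the daemon refreshes and a watchdog reads. The following is a minimal sketch: the file location, the function names, and the two-missed-beats tolerance are assumptions for illustration, not AgentGuard behavior.

```typescript
import { readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Daemon side: record the current time in a heartbeat file.
function writeHeartbeat(file: string, now: number = Date.now()): void {
  writeFileSync(file, String(now));
}

// Watchdog side: the daemon counts as alive only if the last beat is recent enough.
function isDaemonAlive(file: string, maxAgeMs: number, now: number = Date.now()): boolean {
  try {
    const last = Number(readFileSync(file, 'utf8'));
    return Number.isFinite(last) && now - last <= maxAgeMs;
  } catch {
    return false; // a missing or unreadable heartbeat file is treated as a dead daemon
  }
}

// Usage: with a 60-second beat, tolerate up to two missed beats before alerting.
const heartbeatFile = join(tmpdir(), `agentguard-heartbeat-${process.pid}`);
writeHeartbeat(heartbeatFile);
console.log(isDaemonAlive(heartbeatFile, 120_000));
```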

Production Bundle

Action Checklist

Define sensitivity tiers: Map project files to LOW, MEDIUM, HIGH, and CRITICAL based on impact scope.
Configure memory file tracking: Add CLAUDE.md, .cursorrules, .hermes/, and Aider configs to the approval-required list.
Set up Telegram bot: Create a bot via BotFather, configure webhook URL, and store tokens securely in environment variables.
Implement write locking: Add temporary file locks during rollback to prevent agent overwrite conflicts.
Enable macOS notifications: Register for HIGH/CRITICAL alerts and verify notification center permissions.
Schedule periodic reports: Configure agentguard daemon report --days=7 to run weekly via cron or launchd.
Test rollback paths: Simulate sensitive file changes and verify approval/rollback flows before production deployment.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Solo developer, local only	File watcher + macOS notifications	Low overhead, immediate feedback, no external dependencies	Free
Team with CI/CD pipelines	File watcher + Telegram approval	Asynchronous governance, covers away-from-desk periods, audit trail	Telegram API free, minimal infra
Enterprise compliance	File watcher + Slack/Teams + SIEM integration	Centralized logging, role-based approval, audit retention	Higher infra cost, requires SSO/SCIM
High-frequency refactoring	File watcher + memory file isolation + write locks	Prevents context poisoning, handles rapid changes safely	Moderate dev time for lock management

Configuration Template

# agentguard.config.yaml
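watch_paths:
  # Watch the project root so the root-level files matched below
  # (.env*, CLAUDE.md, .cursorrules, .github/workflows) are actually observed.
  - .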

sensitivity_rules:
  - pattern: "**/.env*"
    severity: CRITICAL
    requires_approval: true
    rollback_enabled: true
  - pattern: "**/.github/workflows/*.yml"
    severity: HIGH
    requires_approval: true
    rollback_enabled: true
  - pattern: "**/CLAUDE.md"
    severity: HIGH
    requires_approval: true
    rollback_enabled: true
  - pattern: "**/.cursorrules"
    severity: HIGH
    requires_approval: true
    rollback_enabled: true
  - pattern: "**/.hermes/**"
    severity: HIGH
    requires_approval: true
    rollback_enabled: true
  - pattern: "**/.aider*"
    severity: MEDIUM
    requires_approval: false
    rollback_enabled: true

notifications:
  telegram:
    bot_token: "${TELEGRAM_BOT_TOKEN}"
    chat_id: "${TELEGRAM_CHAT_ID}"
    webhook_port: 3001
  macos:
    notify_high: true
    notify_critical: true

daemon:
  log_level: info
  report_interval: 7
  max_event_history: 500
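
The template references secrets through ${VAR} placeholders. How AgentGuard resolves them is not shown here, so the following is only a sketch of one way a config loader could expand such placeholders after parsing the YAML. interpolateEnv is a hypothetical helper.

```typescript
// Replaces ${NAME} placeholders with values from `env`. Throws on unset variables
// so a missing token fails at startup instead of at the first notification.
function interpolateEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) throw new Error(`Missing environment variable: ${name}`);
    return resolved;
  });
}
```

A loader would call it as interpolateEnv(rawValue, process.env) for each string field, which keeps tokens out of the committed config file.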

Quick Start Guide

Install the package globally:
```
npm install -g agentguard-dev
```
Initialize configuration:
```
agentguard init
```
This generates agentguard.config.yaml in your project root with default sensitivity rules.
Configure Telegram (optional): Create a bot via BotFather, set TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID in your environment, and update the config file.
Start the daemon:
```
agentguard daemon start
```
Verify operation with agentguard daemon status.
Launch the menu bar app (macOS):
```
cd "$(npm root -g)/agentguard-dev/tray" && npm install
agentguard tray
```
Click the shield icon to monitor daemon status, watched directories, and recent events. Use the popup to start/stop the watcher or generate a 7-day report.
