5% (pre/post capture) | ~90% (isolated drill) | High (scoped tool calls) |
Key Findings:
- Diff-centric reporting reduces cognitive load by surfacing only deviations from baseline, eliminating noise from static state dashboards.
- Event-driven crash capture preserves pre-death and post-restart logs, solving the transient failure visibility gap.
- Isolated backup drills validate restore pathways without risking production workloads, increasing confidence from ~30% to ~90%.
- MCP/CLI scoping ensures AI agents operate within explicit, JSON-structured boundaries, preventing lateral movement or destructive commands.
Core Solution
HomeButler is engineered as a single Go binary with zero dependencies: no daemon, no database, no always-on web service. It implements a layered architecture that decouples the interface from the core logic, enabling identical functionality across CLI, MCP (stdio), and embedded web surfaces.
┌──────────────────────────────────────────────────┐
│ Layer 3 - Chat Interface                         │
│ Telegram · Slack · Discord · Terminal · Browser  │
└──────────────────────┬───────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────┐
│ Layer 2 - AI Agent                               │
│ Claude · LangChain · n8n · OpenClaw              │
└──────────────────────┬───────────────────────────┘
                       │ CLI exec or MCP (stdio)
┌──────────────────────▼───────────────────────────┐
│ Layer 1 - Tool (homebutler)       ← YOU ARE HERE │
│                                                  │
│ CLI · MCP · Web → same internal/ core            │
│ system · docker · ports · backup · watch         │
└──────────────────────────────────────────────────┘
The core implements three production-grade operational primitives:
1. report - Baseline Diffing & Change Detection
Instead of rendering static metrics, report captures a system snapshot and diffs it against the previous run. It surfaces anomalies, stopped containers, and port changes, returning structured JSON for programmatic consumption.
Homebutler Report - mac-mini
── Current Status ──
CPU: 5.0% · Memory: 8.3/16.0 GB · Disk: 4%
Containers: 1 running, 1 stopped · Public ports: 5
── Needs Attention ──
⚠️ 1 container(s) stopped
── Notable Changes ──
No significant changes since last report.
── Suggested Actions ──
→ Address items in 'Needs attention' above.
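The baseline-diffing idea behind report can be sketched in a few lines of Go. This is a minimal illustration, not HomeButler's actual implementation: the `Snapshot` and `Change` types, their field names, and the change kinds (`container_state`, `port_opened`, and so on) are all hypothetical names chosen for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Snapshot is a hypothetical, simplified view of one report run.
type Snapshot struct {
	Containers map[string]string `json:"containers"` // name -> state, e.g. "running"
	Ports      []int             `json:"ports"`      // publicly reachable ports
}

// Change describes one deviation from the previous snapshot.
type Change struct {
	Kind   string `json:"kind"`
	Detail string `json:"detail"`
}

// sortedKeys returns the map keys in sorted order so output is deterministic.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// portChanges reports every port present in a but absent from b.
func portChanges(kind string, a, b []int) []Change {
	inB := make(map[int]bool, len(b))
	for _, p := range b {
		inB[p] = true
	}
	sorted := append([]int(nil), a...)
	sort.Ints(sorted)
	out := []Change{}
	for _, p := range sorted {
		if !inB[p] {
			out = append(out, Change{Kind: kind, Detail: fmt.Sprint(p)})
		}
	}
	return out
}

// Diff returns only what differs between prev and curr.
func Diff(prev, curr Snapshot) []Change {
	changes := []Change{}
	for _, name := range sortedKeys(curr.Containers) {
		before, known := prev.Containers[name]
		after := curr.Containers[name]
		if !known {
			changes = append(changes, Change{Kind: "container_added", Detail: name + " (" + after + ")"})
		} else if before != after {
			changes = append(changes, Change{Kind: "container_state", Detail: name + ": " + before + " -> " + after})
		}
	}
	for _, name := range sortedKeys(prev.Containers) {
		if _, ok := curr.Containers[name]; !ok {
			changes = append(changes, Change{Kind: "container_removed", Detail: name})
		}
	}
	changes = append(changes, portChanges("port_opened", curr.Ports, prev.Ports)...)
	changes = append(changes, portChanges("port_closed", prev.Ports, curr.Ports)...)
	return changes
}

func main() {
	prev := Snapshot{Containers: map[string]string{"nginx": "running"}, Ports: []int{80}}
	curr := Snapshot{Containers: map[string]string{"nginx": "exited"}, Ports: []int{80, 8080}}
	out, err := json.MarshalIndent(Diff(prev, curr), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

An unchanged system yields an empty list, which is what keeps a diff-centric report quiet when nothing needs attention.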
2. watch - Transient Failure Capture & Flapping Detection
The watch subsystem hooks into Docker event streams, systemd journalctl, and PM2 processes. It maps exit codes to failure modes (137 = OOM/SIGKILL, 139 = Segfault, 143 = SIGTERM) and correlates them with log pattern matching (panic:, Out of memory, FATAL). Flapping detection triggers acute (3+ restarts in 10m) or chronic (5+ in 24h) alerts.
[03:14:22] INCIDENT: nginx (incident nginx-20260410-031422-7a2124)
Crash: OOM - process killed by SIGKILL (oom, confidence: high)
⚠ FLAPPING: acute (3 restarts in short window)
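The exit-code mapping and log correlation described above could look roughly like the following Go sketch. The exit codes and log patterns come from the text; the `Diagnosis` type, the `Classify` function, and the specific confidence rules are assumptions made for illustration and may not match how HomeButler scores incidents.

```go
package main

import (
	"fmt"
	"strings"
)

// Diagnosis is a hypothetical result type for one crash.
type Diagnosis struct {
	Cause      string
	Confidence string
}

// Classify combines the exit code with patterns found in the last log lines.
// The confidence levels are illustrative assumptions.
func Classify(exitCode int, logTail string) Diagnosis {
	switch exitCode {
	case 137: // 128 + 9 (SIGKILL), commonly the kernel OOM killer
		if strings.Contains(logTail, "Out of memory") {
			return Diagnosis{Cause: "oom", Confidence: "high"}
		}
		return Diagnosis{Cause: "sigkill", Confidence: "medium"}
	case 139: // 128 + 11 (SIGSEGV)
		return Diagnosis{Cause: "segfault", Confidence: "high"}
	case 143: // 128 + 15 (SIGTERM)
		return Diagnosis{Cause: "sigterm", Confidence: "high"}
	}
	// No recognized signal: fall back to log patterns alone.
	switch {
	case strings.Contains(logTail, "panic:"):
		return Diagnosis{Cause: "panic", Confidence: "medium"}
	case strings.Contains(logTail, "FATAL"):
		return Diagnosis{Cause: "fatal", Confidence: "medium"}
	}
	return Diagnosis{Cause: "unknown", Confidence: "low"}
}

func main() {
	d := Classify(137, "kernel: Out of memory: Killed process 4242 (nginx)")
	fmt.Printf("%s (confidence: %s)\n", d.Cause, d.Confidence)
}
```

Treating a bare 137 as a generic SIGKILL until the logs confirm an OOM kill avoids over-claiming a root cause when a container was simply killed by an operator or orchestrator.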
homebutler watch add nginx # interactive: pick docker / systemd / pm2
homebutler watch start # foreground monitoring
homebutler watch history # list past incidents
homebutler watch show <incident-id> # full crash report
3. backup drill - Isolated Restore Validation
The backup drill command extracts a backup archive, boots it in an isolated Docker network with a randomized ephemeral port, executes an HTTP health check against the restored service, and tears down the environment. This validates backup integrity and application compatibility without touching production.
Backup Drill - uptime-kuma
📦 Backup: backup_2026-04-04_1711.tar.gz (18.6 MB)
Integrity: ✅ tar valid (8 files)
Boot: ✅ container started in 0s
Health: ✅ HTTP 200 on port 58574
⏱️ Total: 2s
✅ DRILL PASSED
The architecture prioritizes cron-friendly execution, explicit exit codes, and native MCP server support, enabling safe AI agent integration without daemon overhead or external dependencies.
Pitfall Guide
- Unbounded Shell Access for AI Agents: Granting AI agents full SSH/root privileges creates catastrophic failure modes. Always restrict agents to narrow, JSON-returning tools with explicit allowlists to bound the blast radius.
- State-Only Monitoring vs. Change Detection: Dashboards that only display current metrics fail to highlight drift. Implement baseline diffing to surface only what changed, reducing operator cognitive load and alert fatigue.
- Ignoring Log Rotation in Crash Analysis: Containers and services often restart automatically, wiping transient logs. Capture pre-death and post-restart logs immediately upon exit code detection to preserve root causes.
- Assuming Backup Existence Equals Restore Capability: Backups can be corrupted, version-incompatible, or missing dependencies. Validate restore pathways regularly in isolated, ephemeral environments instead of assuming that a backup which exists can also be restored.
- Misconfiguring Flapping Detection Thresholds: Setting thresholds too low causes noise; too high misses acute failures. Tune acute (e.g., 3+ restarts in 10m) and chronic (e.g., 5+ in 24h) windows based on workload stability and restart policies.
- Overcomplicating the Agent Interface: AI agents perform best with structured, deterministic outputs. Avoid verbose CLI text or interactive prompts in automation pipelines; use --json flags and explicit exit codes for reliable programmatic consumption.
- Running Always-On Daemons on Resource-Constrained Servers: Homelab environments often lack spare CPU/RAM. Prefer single-binary, event-driven, or cron-scheduled architectures over persistent daemons to minimize overhead and attack surface.
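The first pitfall, bounding what an agent may invoke, can be illustrated with a small allowlist dispatcher that always answers in JSON. This is a sketch under stated assumptions: the tool names in `allowedTools`, the `ToolResult` shape, and the `Dispatch` function are hypothetical and do not describe HomeButler's MCP tool set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allowedTools is a hypothetical allowlist of read-only or sandboxed tools.
var allowedTools = map[string]bool{
	"report":        true,
	"watch_history": true,
	"backup_drill":  true,
}

// ToolResult is the structured response returned for every request,
// whether it was accepted or rejected.
type ToolResult struct {
	OK    bool   `json:"ok"`
	Tool  string `json:"tool"`
	Error string `json:"error,omitempty"`
}

// Dispatch rejects anything not explicitly allowlisted. A real dispatcher
// would run the tool here; this sketch only shows the gating step.
func Dispatch(tool string) ToolResult {
	if !allowedTools[tool] {
		return ToolResult{OK: false, Tool: tool, Error: "tool not in allowlist"}
	}
	return ToolResult{OK: true, Tool: tool}
}

func main() {
	for _, tool := range []string{"report", "shell"} {
		out, err := json.Marshal(Dispatch(tool))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Denying by default means a newly added capability stays unreachable to agents until someone deliberately lists it, which keeps the blast radius explicit.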
Deliverables
- Blueprint: Complete layered architecture diagram, MCP stdio integration guide, and Docker/systemd/PM2 event stream mapping. Includes network isolation topology for backup drills and AI agent tool scoping matrix.
- Checklist: Step-by-step deployment validation covering initial baseline creation, watch rule configuration, flapping threshold tuning, cron scheduling for report --json, and periodic isolated backup drill execution.
- Configuration Templates: Ready-to-use claude_desktop_config.json MCP snippet, systemd/cron job definitions for automated reporting, homebutler watch YAML/CLI presets, and Docker network isolation parameters for zero-risk backup validation.