e Overview
+------------------+ SSH (key-based) +-------------------+
| |------------------------->| VoIP Server 1 |
| VPS / Jump Box |------------------------->| VoIP Server 2 |
| (Claude Code) |------------------------->| VoIP Server 3 |
| |------------------------->| Replica DB |
| ~/.claude/ | +-------------------+
| skills/ |
| health/ | Docker (local) +-------------------+
| calls/ |------------------------->| Grafana |
| agents/ |------------------------->| Prometheus |
| ... |------------------------->| Loki |
| hooks/ |------------------------->| Homer (SIP/RTCP) |
| settings.json |------------------------->| Smokeping |
+------------------+ +-------------------+
|
| MCP (Model Context Protocol)
v
+------------------+
| Grafana MCP |
| (mcp-grafana) |
| - Dashboards |
| - PromQL queries |
| - Loki log search|
+------------------+
Every skill is a single Markdown file named SKILL.md inside its own directory under ~/.claude/skills/. The file has two parts: a YAML frontmatter header and a Markdown body.
Frontmatter (Required)
---
name: skill-name
description: One-line description shown in skill listings and used for matching.
user-invocable: true
allowed-tools: Bash(ssh *), Bash(docker *), Bash(curl *)
---
| Field | Purpose |
|---|
name | The slash command name. Users type /name to invoke. |
description | Shown in help listings. Also used by Claude to decide when to suggest the skill. Be specific -- mention the problem types this skill addresses. |
user-invocable | Set to true so users can trigger it directly with /name. |
allowed-tools | Whitelist of tools the skill can use. Uses glob patterns. Bash(ssh *) means "allow any Bash command starting with ssh". |
Allowed-Tools Patterns
# SSH to any server
allowed-tools: Bash(ssh *)
# SSH + Docker + curl + ping
allowed-tools: Bash(ssh *), Bash(docker *), Bash(curl *), Bash(ping *)
# SSH + local audio tools
allowed-tools: Bash(ssh *), Bash(curl *), Bash(sox *), Bash(soxi *), Bash(ffprobe *)
The tool patterns act as a security boundary. A skill that only needs SSH cannot accidentally execute Docker commands or write files. Design skills with the minimum tools they need.
Body (The Investigation Procedure)
The Markdown body is the actual instruction set. Claude reads this as its playbook when the skill is invoked. It should contain:
- What to do -- step-by-step procedures
- How to access resources -- SSH commands, SQL queries, API calls
- How to interpret results -- reference tables, thresholds, known patterns
- Server-specific variations -- different credentials, paths, or versions per server
- Output formatting -- how to present results to the user
The body supports a special variable: $ARGUMENTS -- whatever the user typed after the slash command. For example, if the user types /health server-a, then $ARGUMENTS is server-a.
Directory Structure
~/.claude/
settings.json # Global settings (permissions, hooks, env)
settings.local.json # Per-machine permission overrides
hooks/
protect-production.sh # Safety hook: blocks dangerous commands
skills/
health/
SKILL.md # /health skill
calls/
SKILL.md # /calls skill
agents/
SKILL.md # /agents skill
replication/
SKILL.md # /replication skill
audit-server/
SKILL.md # /audit-server skill
trunk-status/
SKILL.md # /trunk-status skill
audio-quality/
SKILL.md # /audio-quality skill
call-investigate/
SKILL.md # /call-investigate skill
call-drops/
SKILL.md # /call-drops skill
lagged/
SKILL.md # /lagged skill
network-check/
SKILL.md # /network-check skill
agent-ranks/
SKILL.md # /agent-ranks skill
did-lookup/
SKILL.md # /did-lookup skill
reports/
SKILL.md # /reports skill
listen-recording/
SKILL.md # /listen-recording skill
Skill Categories & Implementation
- Operations Skills (6): Answer "What is happening right now?" (
/health, /calls, /agents, /replication, /audit-server, /trunk-status). Use single-SSH aggregation patterns to minimize round-trips. Output structured tables with WARNING/CRITICAL flags.
- Investigation Skills (5): Answer "Why did this happen?" (
/audio-quality, /call-investigate, /call-drops, /lagged, /network-check). Chain cross-system queries: Homer RTCP β Asterisk logs β SIP peer stats β Smokeping β Codec verification.
- Lookup Skills (4): Answer "Where is this data?" (
/agent-ranks, /did-lookup, /reports, /listen-recording). Query ViciDial databases, DID routing tables, and recording storage paths with formatted output.
- MCP Grafana Integration: Connect Claude to
mcp-grafana for real-time PromQL queries, Loki log search, and dashboard rendering without leaving the CLI.
- Production Safety Hook:
hooks/protect-production.sh intercepts dangerous commands (rm -rf, DROP TABLE, iptables -F) and requires explicit confirmation or blocks execution entirely.
- Permission Management:
settings.json and settings.local.json enforce role-based tool access. Skills inherit permissions from the global config, ensuring least-privilege execution.
Pitfall Guide
- Over-Permissive Tool Whitelisting: Using
Bash(*) or broad patterns like Bash(ssh *) without scoping allows Claude to execute unintended commands. Always define the minimum required tools per skill (e.g., Bash(ssh voip-*), Bash(mysql -e *)).
- Hardcoding Server-Specific Credentials: Embedding passwords or IP addresses directly in
SKILL.md breaks portability and violates security practices. Use SSH key-based routing, environment variables, or dynamic credential lookup scripts.
- Ignoring Output Formatting Standards: Claude defaults to verbose, unstructured text. Explicitly define output templates in the skill body (e.g., "Present results as a Markdown table with columns: Server, Status, Metric, Threshold").
- Skipping Production Safety Hooks: Failing to implement
protect-production.sh risks accidental destructive commands during automated investigations. Always validate hooks in staging before deploying to production.
- Static Runbook Mentality: Treating skills as static text files instead of dynamic playbooks. Leverage
$ARGUMENTS for context-aware execution and conditional branching based on real-time server responses.
- Neglecting MCP Integration: Relying solely on SSH/CLI tools misses real-time metric correlation. Integrate
mcp-grafana for PromQL, Loki, and dashboard queries to close the monitoring-investigation loop.
- Inconsistent Skill Naming & Descriptions: Vague
name or description fields cause Claude to misroute invocations or fail to suggest relevant skills. Use precise, problem-oriented descriptions (e.g., "Diagnose SIP trunk registration failures and carrier connectivity").
Deliverables
- Architecture Blueprint: Complete directory tree (
~/.claude/skills/, hooks/, settings.json) with skill categorization, MCP integration points, and SSH/Docker routing paths.
- Skill Creation Checklist: Step-by-step validation for frontmatter accuracy, allowed-tools scoping,
$ARGUMENTS usage, output formatting rules, and safety hook verification before deployment.
- Configuration Templates: Ready-to-use
SKILL.md frontmatter blocks, settings.json permission overrides, protect-production.sh hook script, and mcp-grafana connection config for Prometheus/Loki/Grafana integration.