t configuration explicitly declares the execution backend, workspace context, and security boundaries. Unlike standard agents, it does not use a tools array. Instead, it relies on the CLI's native permission model.
import { Agent } from 'kaibanjs';

const infrastructureAuditor = new Agent({
  type: 'ExternalCodingAgent',
  name: 'Cloud Audit Specialist',
  role: 'Infrastructure Cost & Compliance Analyzer',
  goal: 'Execute CLI-based audits against cloud provider APIs and generate structured cost reports',
  background: 'Operates in a Node.js environment with direct filesystem and network access. Uses Bash and cloud CLIs to fetch live resource metadata.',
  codingBackend: 'claude-code', // Options: 'claude-code' | 'opencode' | 'mock'
  workspaceRoot: process.env.AUDIT_WORKSPACE || '/opt/infra-audit',
  timeoutMs: 300_000, // 5 minutes per task
  claude: {
    useBare: true, // Enables scripted JSON output mode
    allowedTools: 'Bash,Read', // Narrow allowlist for security
    permissionMode: undefined,
    maxTurns: undefined,
    maxBudgetUsd: undefined,
    extraArgs: [],
  },
});
Why these choices?
- codingBackend: 'mock' is the backend to use in CI pipelines. It bypasses subprocess spawning and returns deterministic stub data, so you can validate task chaining and HITL gates without live credentials.
- useBare: true forces the CLI into a non-interactive, JSON-friendly mode. This is essential for programmatic parsing.
- allowedTools restricts the subprocess to explicitly allowlisted capabilities. Starting narrow prevents accidental filesystem mutations or unrestricted network calls.
- timeoutMs guards against hung subprocesses. Execution-heavy tasks often stall on network retries or interactive prompts; a hard timeout lets the workflow fail fast or retry.
Step 2: Wire Tasks with Context Interpolation
Task descriptions can reference outputs from previous steps using interpolation syntax. This keeps prompts focused and prevents monolithic context windows.
import { Task, Team } from 'kaibanjs';

const auditTask = new Task({
  id: 'infrastructureAudit',
  title: 'Run Cloud Resource Audit',
  description: 'Analyze the current cloud environment for idle resources and pricing anomalies. Output a structured JSON report.',
  expectedOutput: 'JSON object containing resource_id, status, estimated_monthly_cost, and compliance_flags',
  agent: infrastructureAuditor,
});

const reviewTask = new Task({
  id: 'costReview',
  title: 'Generate Executive Summary',
  description: 'Translate the audit findings into a stakeholder-ready report. Focus on cost-saving opportunities and risk factors. Reference: {taskResult:infrastructureAudit}',
  expectedOutput: 'Markdown summary with actionable recommendations',
  agent: standardReviewAgent, // Standard LLM agent
  externalValidationRequired: true, // Pauses workflow for human approval
});
Architecture Rationale:
- Interpolation ({taskResult:infrastructureAudit}) ensures the review agent receives only the relevant structured output, not the entire execution log.
- externalValidationRequired: true enforces a HITL gate. The team state freezes until an external signal (e.g., API call from a frontend dashboard) resumes execution. This prevents irreversible actions from being triggered by AI confidence alone.
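To make the substitution concrete, here is a minimal sketch of how a {taskResult:...} placeholder could be resolved against earlier results. It is an illustration only, not KaibanJS's actual implementation: the function name interpolateTaskResults and the TaskResults map are hypothetical.

```typescript
// Hypothetical stand-in for the framework's interpolation step.
type TaskResults = Record<string, unknown>;

function interpolateTaskResults(template: string, results: TaskResults): string {
  return template.replace(
    /\{taskResult:([A-Za-z0-9_-]+)\}/g,
    (placeholder: string, taskId: string): string => {
      // Leave references to unknown tasks intact so the gap stays visible.
      if (!Object.prototype.hasOwnProperty.call(results, taskId)) {
        return placeholder;
      }
      const value = results[taskId];
      // Strings pass through; structured results are serialized as JSON.
      return typeof value === 'string' ? value : JSON.stringify(value);
    },
  );
}

const prompt = interpolateTaskResults(
  'Reference: {taskResult:infrastructureAudit}',
  { infrastructureAudit: { status: 'idle' } },
);
```

Only the referenced result is embedded in the prompt, which is the property the rationale above relies on.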
Step 3: Deploy in a Server-Side Runtime
ExternalCodingAgent requires Node.js. Browser environments cannot spawn subprocesses or manage CLI permissions. The execution must live behind an API boundary.
// Next.js API Route Example
import { Team } from 'kaibanjs';
import { infrastructureAuditor, standardReviewAgent } from './agents';
// Assumed module exporting the tasks defined in Step 2
import { auditTask, reviewTask } from './tasks';

export async function POST(req: Request) {
  const team = new Team({
    name: 'Cloud Optimization Squad',
    agents: [infrastructureAuditor, standardReviewAgent],
    tasks: [auditTask, reviewTask],
    inputs: await req.json(),
    env: { ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY },
  });
  const result = await team.start();
  return Response.json(result);
}
This pattern keeps API keys off the client and satisfies the Node.js subprocess requirement. The route above returns only the final result; to give a dashboard real-time feedback, the same server-side boundary can stream task updates via Server-Sent Events (SSE).
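For the SSE part, the wire format itself is simple. The sketch below only shows how one update could be serialized as an SSE frame; how updates are obtained from the team is left out, and the TaskUpdate shape, the event name, and the status value are assumptions made for illustration.

```typescript
// Hypothetical shape of a task update pushed to the dashboard.
interface TaskUpdate {
  taskId: string;
  status: string;
}

// Serialize one update as a Server-Sent Events frame:
// an "event:" line, a "data:" line, and a blank line as terminator.
function formatSseEvent(event: string, data: TaskUpdate): string {
  // JSON.stringify emits a single line, so one "data:" field is enough.
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

const frame = formatSseEvent('task-update', {
  taskId: 'infrastructureAudit',
  status: 'DONE',
});
```

A route handler would write frames like this to a response with the Content-Type text/event-stream.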
Pitfall Guide
1. Interpolating Unsanitized Input
Explanation: Task descriptions are interpolated into CLI prompts. If user-controlled strings contain shell metacharacters or CLI flags, they can alter execution behavior or escape the allowlist.
Fix: Validate all external inputs against a strict schema before interpolation, and never pass raw user strings to extraArgs or other CLI flags.
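A minimal sketch of such a validation step, assuming inputs are short identifiers such as region or account names. The length limit, the character allowlist, and the name validatePromptInput are illustrative choices, not part of KaibanJS.

```typescript
interface ValidationResult {
  ok: boolean;
  value?: string;
  reason?: string;
}

const MAX_INPUT_LENGTH = 200; // hypothetical limit; tune per field
const SAFE_INPUT = /^[A-Za-z0-9 _.,:\/-]+$/; // no quotes, pipes, semicolons, backticks, or $

function validatePromptInput(raw: string): ValidationResult {
  const value = raw.trim();
  if (value.length === 0) return { ok: false, reason: 'empty input' };
  if (value.length > MAX_INPUT_LENGTH) return { ok: false, reason: 'input too long' };
  if (!SAFE_INPUT.test(value)) return { ok: false, reason: 'disallowed characters' };
  // Reject tokens that start with "-" or "--" followed by a letter (CLI flags).
  if (/(^|\s)-{1,2}[A-Za-z]/.test(value)) return { ok: false, reason: 'looks like a CLI flag' };
  return { ok: true, value };
}

const accepted = validatePromptInput('us-east-1');
const injected = validatePromptInput('prod; rm -rf /');
const flagLike = validatePromptInput('--dangerously-skip-permissions');
```

An allowlist of characters is deliberately stricter than a blocklist of known metacharacters: anything not explicitly permitted is rejected.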
2. Overly Broad Tool Permissions
Explanation: Defaulting to broad permissions (e.g., allowing Write or unrestricted Bash) defeats the security boundary, because the CLI can then modify workspace files or execute arbitrary commands.
Fix: Start with Read and Bash only, and restrict Bash to the specific commands the task needs. Audit stdout logs to verify that tool usage matches expectations.
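One way to keep the allowlist from drifting is to build the allowedTools string from a policy constant rather than typing it by hand. The sketch below assumes a project-level policy of Read and Bash; PERMITTED_TOOLS and buildAllowedTools are hypothetical names.

```typescript
// Project policy (assumption): only these tools may ever be requested.
const PERMITTED_TOOLS: readonly string[] = ['Read', 'Bash'];

// Build the comma-separated allowedTools value, rejecting anything off-policy.
function buildAllowedTools(requested: string[]): string {
  const unique = Array.from(new Set(requested.map((tool) => tool.trim())));
  const rejected = unique.filter((tool) => PERMITTED_TOOLS.indexOf(tool) === -1);
  if (rejected.length > 0) {
    throw new Error(`Tools not permitted by policy: ${rejected.join(', ')}`);
  }
  return unique.join(',');
}

const allowedTools = buildAllowedTools(['Bash', 'Read', 'Bash']);
```

Failing loudly at configuration time means a widened allowlist shows up in review as a change to the policy constant, not as a silent edit to a string.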
3. Ignoring Subprocess Timeouts
Explanation: Execution-heavy tasks often hang on network retries, interactive prompts, or large file parsing. Without a timeout, the workflow blocks indefinitely.
Fix: Set timeoutMs based on empirical task duration. Implement retry logic at the team level, not inside the CLI prompt. Monitor stderr for timeout signals.
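As a rough sketch of "empirical", the timeout can be derived from observed run durations: take a high percentile and multiply by a safety factor. The 95th percentile, the factor of 2, and the 30-second floor are illustrative assumptions, and suggestTimeoutMs is a hypothetical helper.

```typescript
// Suggest a timeout from observed task durations (milliseconds).
function suggestTimeoutMs(
  observedMs: number[],
  safetyFactor = 2,
  floorMs = 30_000,
): number {
  if (observedMs.length === 0) return floorMs;
  const sorted = observedMs.slice().sort((a, b) => a - b);
  // Nearest-rank 95th percentile.
  const index = Math.min(sorted.length - 1, Math.ceil(sorted.length * 0.95) - 1);
  return Math.max(floorMs, Math.round(sorted[index] * safetyFactor));
}

const timeoutMs = suggestTimeoutMs([40_000, 55_000, 60_000, 90_000]);
```

With the four sample durations above, the slowest run (90 seconds) is the 95th-percentile value, so the suggestion is 180_000 ms.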
4. Assuming Structured Output Always Parses
Explanation: The framework checks for a structured_output field in the CLI response. If the CLI returns plain text or malformed JSON, the task result falls back to raw string data.
Fix: Validate the output format in the expectedOutput contract. Implement a fallback parser in the consuming agent. Use mock backend to test parsing paths.
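A fallback parser in the consuming code might look like the following sketch. It assumes the task result arrives either as an already-parsed object or as a string that may or may not contain JSON; ParsedResult and parseTaskResult are hypothetical names.

```typescript
type ParsedResult =
  | { kind: 'structured'; data: Record<string, unknown> }
  | { kind: 'raw'; text: string };

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Accept structured data when available, otherwise fall back to raw text.
function parseTaskResult(result: unknown): ParsedResult {
  if (isPlainObject(result)) return { kind: 'structured', data: result };
  const text = String(result);
  try {
    const parsed: unknown = JSON.parse(text);
    if (isPlainObject(parsed)) return { kind: 'structured', data: parsed };
  } catch {
    // Not JSON: fall through to the raw branch.
  }
  return { kind: 'raw', text };
}

const fromJson = parseTaskResult('{"status":"ok"}');
const fromText = parseTaskResult('All resources look fine');
```

Returning a tagged union forces downstream code to handle the raw branch explicitly instead of assuming the JSON path always succeeds.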
5. Running in Browser or Edge Runtimes
Explanation: ExternalCodingAgent spawns OS processes. Edge runtimes and browsers lack subprocess APIs and filesystem access.
Fix: Keep execution strictly server-side. Use API routes, serverless functions with Node.js runtimes, or containerized workers. Never instantiate the agent in client bundles.
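A defensive guard can make a misplaced import fail early with a clear message. The sketch below checks for process.versions.node, which is a heuristic rather than a guarantee, since some edge runtimes expose partial Node.js compatibility layers; isNodeRuntime and assertNodeRuntime are hypothetical helpers.

```typescript
// Heuristic: a Node.js runtime exposes process.versions.node as a string.
function isNodeRuntime(env: unknown = globalThis): boolean {
  const candidate = env as { process?: { versions?: { node?: unknown } } };
  return typeof candidate.process?.versions?.node === 'string';
}

// Call this before constructing the agent to fail fast outside Node.js.
function assertNodeRuntime(env?: unknown): void {
  if (!isNodeRuntime(env)) {
    throw new Error('ExternalCodingAgent requires a Node.js runtime');
  }
}

const browserLike = isNodeRuntime({});
const nodeLike = isNodeRuntime({ process: { versions: { node: '20.0.0' } } });
```

The env parameter exists only so the check can be exercised with fake globals; production code would call assertNodeRuntime() with no arguments.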
6. Memory Exhaustion from Large Stdout
Explanation: The framework captures full stdout/stderr. Long-running tasks with verbose logging can consume gigabytes of memory, crashing the host process.
Fix: Limit task scope to essential operations. Configure the CLI to suppress debug logs. Stream output to disk instead of holding it in memory. Implement log rotation for CI pipelines.
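Where you control the capture yourself (for example, in a wrapper that tails a log file), a bounded buffer keeps memory use constant by retaining only the most recent output. This is a generic sketch and does not change how the framework captures stdout; BoundedOutputBuffer is a hypothetical class.

```typescript
// Keeps only the most recent maxChars characters of appended output.
class BoundedOutputBuffer {
  private readonly maxChars: number;
  private chunks: string[] = [];
  private size = 0;
  private dropped = 0;

  constructor(maxChars: number) {
    this.maxChars = maxChars;
  }

  append(chunk: string): void {
    this.chunks.push(chunk);
    this.size += chunk.length;
    // Trim from the oldest end until the buffer fits again.
    while (this.size > this.maxChars && this.chunks.length > 0) {
      const overflow = this.size - this.maxChars;
      const oldest = this.chunks[0];
      const removed = Math.min(oldest.length, overflow);
      if (removed === oldest.length) this.chunks.shift();
      else this.chunks[0] = oldest.slice(removed);
      this.size -= removed;
      this.dropped += removed;
    }
  }

  get droppedChars(): number {
    return this.dropped;
  }

  toString(): string {
    return this.chunks.join('');
  }
}

const tail = new BoundedOutputBuffer(5);
tail.append('abc');
tail.append('defg');
```

Tracking the dropped count lets the final report say that output was truncated rather than silently presenting a partial log.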
7. Skipping Mock Backend Validation
Explanation: Deploying directly to claude-code or opencode without testing wiring leads to hidden failures in task chaining, HITL gates, and error handling.
Fix: Always run the team with codingBackend: 'mock' first. Verify interpolation, state transitions, and validation gates. Swap to the real backend only after deterministic tests pass.
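One way to make "mock first" the default is to resolve the backend from the environment and fall back to mock whenever the setting is missing or unrecognized. In this sketch the CODING_BACKEND variable and the resolveCodingBackend helper are assumptions; CI=true is a convention many CI providers follow, not something the framework defines.

```typescript
type CodingBackend = 'claude-code' | 'opencode' | 'mock';

// Resolve the backend from environment variables, defaulting to 'mock'.
function resolveCodingBackend(env: Record<string, string | undefined>): CodingBackend {
  // Never spawn a real CLI in CI, regardless of other settings.
  if (env['CI'] === 'true') return 'mock';
  const requested = env['CODING_BACKEND']; // hypothetical variable name
  if (requested === 'claude-code' || requested === 'opencode' || requested === 'mock') {
    return requested;
  }
  return 'mock'; // unknown or missing value: stay on the safe backend
}

const ciBackend = resolveCodingBackend({ CI: 'true', CODING_BACKEND: 'claude-code' });
const prodBackend = resolveCodingBackend({ CODING_BACKEND: 'opencode' });
```

The result can be passed directly as codingBackend, so a typo in the variable degrades to the mock backend instead of an unexpected live run.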
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Simple reasoning or text generation | Standard LLM Agent | Lower latency, no subprocess overhead | Low |
| Framework-managed tool calls (APIs, DB queries) | Tool-Calling Agent | Built-in retry, schema validation, shared context | Medium |
| CLI pipelines, filesystem ops, complex scraping | ExternalCodingAgent | Process isolation, explicit allowlists, deterministic CI testing | Medium-High (compute + CLI tokens) |
| CI/CD pipeline validation | ExternalCodingAgent with mock backend | Zero API cost, deterministic state transitions, fast feedback | Near-zero |
Configuration Template
import { Agent, Task, Team } from 'kaibanjs';

// Production-ready ExternalCodingAgent configuration
const executionAgent = new Agent({
  type: 'ExternalCodingAgent',
  name: 'Pipeline Executor',
  role: 'CLI Workflow Specialist',
  goal: 'Execute environment-specific tasks with strict security boundaries',
  background: 'Runs in isolated Node.js subprocess. Respects allowlists and timeout constraints.',
  codingBackend: process.env.NODE_ENV === 'production' ? 'claude-code' : 'mock',
  workspaceRoot: process.env.WORKSPACE_PATH || '/tmp/agent-workspace',
  timeoutMs: 180_000,
  claude: {
    useBare: true,
    allowedTools: 'Bash,Read',
    permissionMode: undefined,
    maxTurns: undefined,
    maxBudgetUsd: undefined,
    extraArgs: [],
  },
});
// Task with interpolation and HITL gate
const executionTask = new Task({
  id: 'pipelineExecution',
  title: 'Run Environment Audit',
  // previousStep is a placeholder id: replace it with a task that runs earlier in the same team
  description: 'Execute the audit script and return structured results. Context: {taskResult:previousStep}',
  expectedOutput: 'JSON payload with status, metrics, and recommendations',
  agent: executionAgent,
  externalValidationRequired: true, // HITL gate: pauses workflow for human approval
});

const team = new Team({
  name: 'Execution Squad',
  agents: [executionAgent],
  tasks: [executionTask],
  inputs: {},
  env: { ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY },
});
Quick Start Guide
- Initialize with Mock Backend: Set codingBackend: 'mock' and run the team locally. Verify task interpolation, state transitions, and HITL gates without consuming API credits.
- Configure Security Boundaries: Define allowedTools with the minimum required capabilities. Set timeoutMs to prevent runaway processes. Ensure workspaceRoot points to an isolated directory.
- Swap to Production Backend: Replace 'mock' with 'claude-code' or 'opencode'. Inject required environment variables (e.g., ANTHROPIC_API_KEY). Run a single task to validate subprocess spawning and structured output parsing.
- Deploy Behind API Boundary: Wrap the team initialization in a server-side route. Stream task updates via SSE for real-time monitoring. Implement error handling for subprocess timeouts and allowlist violations.
- Validate in CI: Add a pipeline step that runs the team with codingBackend: 'mock'. Assert task chaining, validation gates, and output schemas. Block merges if deterministic tests fail.
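For the output-schema assertion in the CI step, a small type guard is often enough. The sketch below checks the four fields named in the audit task's expected output; the field types (a numeric cost and string flags) and the name isAuditEntry are assumptions, since the contract only lists field names.

```typescript
// Assumed shape of one audit report entry; only the field names come from the task contract.
interface AuditEntry {
  resource_id: string;
  status: string;
  estimated_monthly_cost: number;
  compliance_flags: string[];
}

function isAuditEntry(value: unknown): value is AuditEntry {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  const cost = record['estimated_monthly_cost'];
  const flags = record['compliance_flags'];
  return (
    typeof record['resource_id'] === 'string' &&
    typeof record['status'] === 'string' &&
    typeof cost === 'number' &&
    isFinite(cost) &&
    Array.isArray(flags) &&
    flags.every((flag: unknown) => typeof flag === 'string')
  );
}

const validEntry = isAuditEntry({
  resource_id: 'vm-123',
  status: 'idle',
  estimated_monthly_cost: 42.5,
  compliance_flags: ['untagged'],
});
const invalidEntry = isAuditEntry({ resource_id: 'vm-123', status: 'idle' });
```

A CI test can run the team on the mock backend, pass the audit result through this guard, and fail the build when it returns false.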