Build Your Own CLI Agent: A Step-by-Step Guide
Architecting Event-Driven Terminal Agents with React and Mozaik
Current Situation Analysis
Building interactive AI agents in the terminal has historically been a friction-heavy process. Traditional CLI applications rely on synchronous, line-by-line input loops (readline, inquirer, or raw process.stdin). When you introduce a large language model, you immediately collide with asynchronous inference, stateful conversation memory, and external tool execution. Forcing a procedural CLI pattern onto an agentic workflow creates race conditions, unresponsive interfaces, and tightly coupled code where UI rendering logic bleeds into API orchestration.
This problem is frequently overlooked because developers treat terminal AI tools as simple wrappers around curl or SDK calls. They assume a linear flow: user types → API returns → print output. Modern agentic frameworks, however, operate on event-driven state machines. The model doesn't just reply; it reasons, requests tool execution, waits for results, and iterates until a final answer emerges. Managing this lifecycle manually requires building a custom pub-sub system, handling concurrent tool calls, and synchronizing terminal rendering with background inference.
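The reason, act, observe cycle described above can be sketched as a plain async loop with no framework involved. This is a minimal illustration: `runAgentLoop`, `ModelTurn`, and `scriptedModel` are hypothetical names, and the "model" is a scripted stub rather than a real LLM client.

```typescript
// Hypothetical types for illustration; these are not @mozaik-ai/core APIs.
type ModelTurn =
  | { kind: "tool_call"; tool: string; args: string }
  | { kind: "final"; text: string };

interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

// A "model" here is any async function from the conversation so far to its next turn.
type Model = (history: readonly ChatMessage[]) => Promise<ModelTurn>;
type ToolFn = (args: string) => Promise<string>;

// Reason -> request a tool -> wait for its result -> iterate, until a final answer.
export async function runAgentLoop(
  model: Model,
  tools: Record<string, ToolFn>,
  userInput: string,
  maxSteps = 8,
): Promise<string> {
  const history: ChatMessage[] = [{ role: "user", content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.kind === "final") {
      history.push({ role: "assistant", content: turn.text });
      return turn.text;
    }
    // The loop cannot continue until the tool result is back in the history.
    const tool = tools[turn.tool];
    const result = tool ? await tool(turn.args) : `Error: unknown tool "${turn.tool}"`;
    history.push({ role: "tool", content: result });
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}

// Usage: a scripted stand-in for an LLM that requests one tool call, then answers.
export const scriptedModel: Model = async (history) => {
  const last = history[history.length - 1];
  return last.role === "user"
    ? { kind: "tool_call", tool: "echo", args: last.content }
    : { kind: "final", text: `Tool said: ${last.content}` };
};
```

The `maxSteps` cap is worth keeping in any real loop: a model that keeps requesting tools would otherwise never terminate.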
Two complementary layers address this problem by separating rendering from orchestration:
- Declarative Terminal UI: Libraries like `ink` map React's component model to ANSI terminal rendering, allowing developers to manage terminal state with familiar hooks and virtual DOM diffing instead of manual cursor control and escape sequences.
- Orchestration Abstraction: Frameworks like `@mozaik-ai/core` provide an `AgenticEnvironment` that acts as a shared event bus. It decouples the inference loop, tool execution, and context management from the display layer, enabling clean separation of concerns.
By treating the terminal as a reactive UI and the agent as an event-driven participant, you eliminate the coupling that causes most CLI AI projects to collapse under their own complexity.
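The decoupling can be illustrated without any framework: the agent publishes events, the display subscribes, and neither holds a reference to the other. The sketch below is a hypothetical typed pub-sub bus (`EventBus` and `AgentEvents` are illustrative names); it is not the `AgenticEnvironment` API.

```typescript
// A minimal typed pub-sub bus illustrating the decoupling idea only.
// NOT the AgenticEnvironment API; all names here are hypothetical.
type AgentEvents = {
  assistantText: { text: string };
  toolCall: { callId: string; name: string };
  toolResult: { callId: string; output: string };
};

type Handler<T> = (payload: T) => void;

export class EventBus<E extends Record<string, unknown>> {
  private readonly handlers = new Map<keyof E, Set<Handler<any>>>();

  on<K extends keyof E>(event: K, handler: Handler<E[K]>): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler<any>>();
    this.handlers.set(event, set);
    set.add(handler);
    // Return an unsubscribe function so observers can clean up on dispose.
    return () => {
      set.delete(handler);
    };
  }

  emit<K extends keyof E>(event: K, payload: E[K]): void {
    for (const handler of this.handlers.get(event) ?? []) handler(payload);
  }
}
```

Usage: `const bus = new EventBus<AgentEvents>(); const off = bus.on("assistantText", ({ text }) => render(text));`, where `render` stands in for whatever the display layer does. The agent side only ever calls `bus.emit(...)`.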
WOW Moment: Key Findings
The architectural shift from monolithic CLI scripts to event-driven agent environments fundamentally changes how terminal applications scale. The table below contrasts a traditional procedural approach with the decoupled Mozaik + Ink architecture.
| Approach | UI Responsiveness | State Management Overhead | Tool Integration Complexity | Concurrency Support |
|---|---|---|---|---|
| Monolithic CLI Script | Low (blocks on API/tool calls) | High (manual state machines) | High (inline try/catch & formatting) | None (sequential only) |
| Decoupled Agent Architecture | High (non-blocking event loop) | Low (framework-managed context) | Low (declarative tool registry) | Native (parallel tool execution) |
Why this matters: The decoupled approach transforms the terminal from a static output stream into a reactive application. The UI never blocks on network I/O or shell execution. Tool calls are registered declaratively, and the orchestration layer handles context routing, retry logic, and state synchronization. This enables streaming responses, concurrent tool execution, and clean unit testing of the agent loop without mocking terminal rendering.
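Declarative registration means the orchestration layer, not each tool, owns lookup, argument validation, and error formatting. The sketch below shows that idea in isolation; `ToolSpec`, `ToolCallRequest`, and `dispatchToolCall` are hypothetical names, not the Mozaik API.

```typescript
// Hypothetical shapes for illustration; the real @mozaik-ai/core `Tool` type may differ.
interface ToolSpec {
  name: string;
  required: readonly string[];
  invoke: (args: Record<string, string>) => Promise<string>;
}

interface ToolCallRequest {
  name: string;
  argumentsJson: string; // OpenAI-style function calls carry arguments as a JSON string
}

// Look the tool up by name, validate its arguments, and always resolve to a string
// the model can read, so a bad call never throws into the orchestration loop.
export async function dispatchToolCall(
  registry: readonly ToolSpec[],
  call: ToolCallRequest,
): Promise<string> {
  const spec = registry.find((tool) => tool.name === call.name);
  if (!spec) return `Error: unknown tool "${call.name}"`;

  let parsed: unknown;
  try {
    parsed = JSON.parse(call.argumentsJson);
  } catch {
    return `Error: arguments for "${call.name}" are not valid JSON`;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return `Error: arguments for "${call.name}" must be a JSON object`;
  }
  const args = parsed as Record<string, string>;

  const missing = spec.required.filter((key) => !(key in args));
  if (missing.length > 0) return `Error: missing arguments: ${missing.join(", ")}`;

  try {
    return await spec.invoke(args);
  } catch (err) {
    return `Error: ${(err as Error).message}`;
  }
}
```

Returning error strings instead of throwing keeps every failure visible to the model as ordinary tool output, which it can react to on the next inference step.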
Core Solution
The implementation follows a strict separation of concerns: the UI layer handles rendering and input capture, the orchestration layer manages conversation state and inference, and the tool layer defines executable capabilities. We use TypeScript throughout to enforce type safety across the agent boundary.
Step 1: Session Composition & Environment Bootstrap
The entry point should never contain agent logic. Its sole responsibility is environment initialization, dependency injection, and handing control to the UI renderer. We create a session factory that wires the inference runner, function executor, and shared environment.
```typescript
// src/session-factory.ts
import {
  AgenticEnvironment,
  Gpt54,
  ModelContext,
  OpenAIInferenceRunner,
  DefaultFunctionCallRunner,
} from "@mozaik-ai/core";
import { shellToolRegistry } from "./tools/shell-tools.js";
import { ShellAgent } from "./agents/shell-agent.js";
import { DisplayObserver } from "./observers/display-observer.js";

export interface TerminalSession {
  transmit: (input: string) => void;
  dispose: () => void;
}

export interface DisplayCallbacks {
  onAssistantChunk: (chunk: string) => void;
  onToolInvocation: (toolName: string) => void;
}

export function initializeSession(callbacks: DisplayCallbacks): TerminalSession {
  const toolExecutor = new DefaultFunctionCallRunner(shellToolRegistry);
  const inferenceEngine = new OpenAIInferenceRunner();
  const conversationMemory = ModelContext.create("terminal-agent-v1");
  const targetModel = new Gpt54();
  targetModel.setTools(shellToolRegistry);

  const eventBus = new AgenticEnvironment();
  const agent = new ShellAgent(inferenceEngine, toolExecutor, eventBus, conversationMemory, targetModel);
  const displayBridge = new DisplayObserver(callbacks);
  agent.join(eventBus);
  displayBridge.join(eventBus);
  eventBus.start();

  return {
    transmit: (input: string) => agent.handleUserInput(input),
    dispose: () => eventBus.stop(),
  };
}
```
Architecture Rationale: We isolate the AgenticEnvironment as a central message bus. This prevents the UI from directly calling inference methods, which would couple rendering to network latency. The session factory returns a clean interface (transmit/dispose) that the React layer consumes, making the agent lifecycle predictable and testable.
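One payoff of the `transmit`/`dispose` boundary is that the UI can be exercised against a fake session. A minimal sketch, assuming you mirror the two interfaces from `session-factory.ts`; `initializeFakeSession` is a hypothetical test double that never calls OpenAI.

```typescript
// Mirrors TerminalSession and DisplayCallbacks from session-factory.ts.
interface TerminalSession {
  transmit: (input: string) => void;
  dispose: () => void;
}

interface DisplayCallbacks {
  onAssistantChunk: (chunk: string) => void;
  onToolInvocation: (toolName: string) => void;
}

// Hypothetical drop-in replacement for initializeSession() with no network access:
// it reports a fake tool invocation, then echoes the input back in two chunks.
export function initializeFakeSession(callbacks: DisplayCallbacks): TerminalSession {
  let disposed = false;
  return {
    transmit: (input) => {
      if (disposed) throw new Error("Session already disposed");
      callbacks.onToolInvocation("execute_shell");
      callbacks.onAssistantChunk("You said: ");
      callbacks.onAssistantChunk(input);
    },
    dispose: () => {
      disposed = true;
    },
  };
}
```

It invokes the callbacks synchronously, which is simpler than the real event flow but is enough to drive the rendering code (chunk merging, tool notices) in tests.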
Step 2: The Agent Loop Implementation
The agent is responsible for conversation memory, inference routing, and tool execution coordination. It extends BaseAgentParticipant and implements a state machine that tracks pending tool calls. Crucially, it contains zero UI logic.
```typescript
// src/agents/shell-agent.ts
import {
  BaseAgentParticipant,
  UserMessageItem,
  FunctionCallItem,
  AgenticEnvironment,
  ModelContext,
  GenerativeModel,
  InferenceRunner,
  FunctionCallRunner,
  FunctionCallOutputItem,
  DeveloperMessageItem,
  InputStream,
} from "@mozaik-ai/core";

// User input arrives through handleUserInput(), so the participant's own stream is empty.
const emptyStream: InputStream = {
  async *stream() {
    // No external input stream
  },
};

const SYSTEM_PROMPT =
  "You are a terminal assistant. Execute shell commands when necessary to fulfill user requests.";

export class ShellAgent extends BaseAgentParticipant {
  // Call IDs of tool invocations that have been dispatched but not yet answered.
  private readonly activeToolCalls = new Set<string>();

  constructor(
    inferenceRunner: InferenceRunner,
    toolRunner: FunctionCallRunner,
    private readonly environment: AgenticEnvironment,
    private readonly memory: ModelContext,
    private readonly model: GenerativeModel,
  ) {
    super(emptyStream, inferenceRunner, toolRunner);
    // Add the system prompt once; adding it on every turn would duplicate it in the context.
    this.memory.addContextItem(DeveloperMessageItem.create(SYSTEM_PROMPT));
  }

  public handleUserInput(rawInput: string): void {
    this.memory.addContextItem(UserMessageItem.create(rawInput));
    this.triggerInference(this.environment, this.memory, this.model);
  }

  override onFunctionCall(call: FunctionCallItem): void {
    this.activeToolCalls.add(call.callId);
    this.memory.addContextItem(call);
    this.executeFunctionCall(this.environment, call);
  }

  override onFunctionCallOutput(output: FunctionCallOutputItem): void {
    this.memory.addContextItem(output);
    this.activeToolCalls.delete(output.callId);
    // Resume reasoning only after every pending tool call has reported back.
    if (this.activeToolCalls.size === 0) {
      this.triggerInference(this.environment, this.memory, this.model);
    }
  }
}
```
Architecture Rationale: The activeToolCalls set prevents premature inference triggers when multiple tools are requested simultaneously. The agent only resumes reasoning when all pending executions complete. This pattern eliminates race conditions common in naive agent implementations.
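The pending-call rule can be isolated into a small framework-agnostic class, which makes it easy to unit test. A sketch with a hypothetical `PendingCallTracker`; the `wasPending` check, which ignores duplicate or unknown results, is a refinement beyond the article's `ShellAgent`.

```typescript
// Framework-agnostic sketch of "resume inference only when every tool call has answered".
export class PendingCallTracker {
  private readonly pending = new Set<string>();

  // Record a dispatched tool call.
  start(callId: string): void {
    this.pending.add(callId);
  }

  // Record a tool result. Returns true only when this result was the last outstanding one;
  // duplicate or unknown call IDs never trigger a new inference cycle.
  complete(callId: string): boolean {
    const wasPending = this.pending.delete(callId);
    return wasPending && this.pending.size === 0;
  }

  get outstanding(): number {
    return this.pending.size;
  }
}
```

Like the `ShellAgent` version, this assumes all calls from one model turn are registered (`start`) before any result arrives; if results can race ahead of later call registrations, the turn would need to be closed explicitly instead.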
Step 3: Tool Definition & Execution
Tools are declarative contracts. Each tool specifies its schema, description, and execution handler. We wrap Node's child_process in a promise-based executor to maintain async compatibility.
```typescript
// src/tools/shell-tools.ts
import { Tool } from "@mozaik-ai/core";
import { exec } from "node:child_process";
import { promisify } from "node:util";

const execAsync = promisify(exec);

export const shellToolRegistry: Tool[] = [
  {
    name: "execute_shell",
    description: "Runs a shell command and returns stdout/stderr output.",
    parameters: {
      type: "object",
      properties: {
        command: { type: "string", description: "Shell command to execute" },
        workingDir: { type: "string", description: "Target directory path" },
      },
      // OpenAI strict mode requires every property to be listed in `required`
      // and additional properties to be disallowed.
      required: ["command", "workingDir"],
      additionalProperties: false,
    },
    strict: true,
    type: "function",
    invoke: async (args: { command: string; workingDir: string }) => {
      try {
        const { stdout, stderr } = await execAsync(args.command, {
          cwd: args.workingDir,
          timeout: 15000, // the child is killed (SIGTERM by default) after 15 s
        });
        // Non-empty stderr does not mean failure (many tools print warnings there),
        // so it is labeled rather than reported as an error.
        return stderr ? `stdout:\n${stdout}\nstderr:\n${stderr}` : stdout;
      } catch (err) {
        // Non-zero exit codes, timeouts, and spawn failures all reject and land here.
        return `Error: execution failed: ${(err as Error).message}`;
      }
    },
  },
];
```
Architecture Rationale: We enforce a 15-second timeout so a runaway process cannot leave a tool call pending indefinitely; because the agent waits for every pending call before resuming inference, one hung command would otherwise stall the whole conversation. Error output is prefixed explicitly so the model can distinguish between successful execution and failures. The `strict: true` flag ensures the model adheres to the JSON schema, reducing parsing errors.
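Treating any stderr output as an error misreports commands that succeed but print warnings. A sketch of an alternative, assuming you want success decided by exit status: the hypothetical `runShell` and `formatForModel` helpers use the callback form of Node's `exec` to read the exit code and the `killed` flag that Node sets when the `timeout` fires.

```typescript
import { exec } from "node:child_process";

// Hypothetical result shape for illustration.
export interface ShellResult {
  ok: boolean; // true only when the process exited with code 0
  exitCode: number | null; // null when killed by a signal or when spawning failed
  timedOut: boolean;
  stdout: string;
  stderr: string;
}

// Decide success from the exit status rather than from whether stderr is empty.
export function runShell(command: string, cwd: string, timeoutMs = 15_000): Promise<ShellResult> {
  return new Promise((resolve) => {
    exec(command, { cwd, timeout: timeoutMs, encoding: "utf8" }, (error, stdout, stderr) => {
      if (!error) {
        resolve({ ok: true, exitCode: 0, timedOut: false, stdout, stderr });
        return;
      }
      resolve({
        ok: false,
        // For a non-zero exit, `code` is the exit status; for spawn errors it is a string like "ENOENT".
        exitCode: typeof error.code === "number" ? error.code : null,
        // When `timeout` elapses, Node kills the child and sets `killed` on the error.
        timedOut: error.killed === true,
        stdout,
        stderr: stderr || error.message,
      });
    });
  });
}

// Format a result for the model with an unambiguous "Error:" prefix on failure.
export function formatForModel(result: ShellResult): string {
  if (result.ok) {
    return result.stderr ? `${result.stdout}\nstderr:\n${result.stderr}` : result.stdout;
  }
  const reason = result.timedOut ? "timed out" : `exited with code ${result.exitCode ?? "unknown"}`;
  return `Error: command ${reason}\n${result.stderr}\n${result.stdout}`.trim();
}
```

Inside the tool's `invoke`, this would become `return formatForModel(await runShell(args.command, args.workingDir));`.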
Step 4: Display Bridge & React Rendering
The observer listens to the AgenticEnvironment and forwards events to React state. The UI layer remains purely presentational, consuming callbacks without managing agent lifecycle.
```typescript
// src/observers/display-observer.ts
import {
  BaseObserverParticipant,
  FunctionCallItem,
  ModelMessageItem,
  Participant,
} from "@mozaik-ai/core";

interface DisplayListeners {
  onAssistantChunk: (text: string) => void;
  onToolInvocation: (name: string) => void;
}

export class DisplayObserver extends BaseObserverParticipant {
  constructor(private readonly listeners: DisplayListeners) {
    super();
  }

  override onFunctionCall(call: FunctionCallItem): void {
    this.listeners.onToolInvocation(call.toJSON()?.name ?? "unknown_tool");
  }

  override onExternalFunctionCall(_source: Participant, call: FunctionCallItem): void {
    this.listeners.onToolInvocation(call.toJSON()?.name ?? "unknown_tool");
  }

  override onExternalModelMessage(_source: Participant, message: ModelMessageItem): void {
    const content = message.content?.text ?? "";
    if (content.length > 0) {
      this.listeners.onAssistantChunk(content);
    }
  }
}
```
```tsx
// src/ui/terminal-app.tsx
import React, { useState, useMemo, useRef, useEffect } from "react";
import { Box, Text, useInput, useApp } from "ink";
import { randomUUID } from "node:crypto";
import { initializeSession, TerminalSession } from "../session-factory.js";

interface ChatEntry {
  id: string;
  role: "user" | "assistant" | "system";
  content: string;
  timestamp: number;
}

export function TerminalApp() {
  const [input, setInput] = useState("");
  const [history, setHistory] = useState<ChatEntry[]>([]);
  const [isProcessing, setIsProcessing] = useState(false);
  const sessionRef = useRef<TerminalSession | null>(null);
  const { exit } = useApp();

  // Create the session once; the ref guard keeps it stable even if React recomputes the memo.
  const session = useMemo(() => {
    if (!sessionRef.current) {
      sessionRef.current = initializeSession({
        onAssistantChunk: (text) => {
          setHistory((prev) => {
            const last = prev[prev.length - 1];
            // Merge consecutive chunks into the current assistant entry.
            if (last && last.role === "assistant") {
              return [...prev.slice(0, -1), { ...last, content: last.content + text }];
            }
            return [...prev, { id: randomUUID(), role: "assistant", content: text, timestamp: Date.now() }];
          });
          setIsProcessing(false);
        },
        onToolInvocation: (name) => {
          setHistory((prev) => [
            ...prev,
            { id: randomUUID(), role: "system", content: `⚙️ Executing: ${name}`, timestamp: Date.now() },
          ]);
        },
      });
    }
    return sessionRef.current;
  }, []);

  useInput((inputChar, key) => {
    if (key.escape) {
      exit();
      return;
    }
    if (key.return) {
      const userMsg = input.trim();
      if (!userMsg || isProcessing) return;
      setHistory((prev) => [
        ...prev,
        { id: randomUUID(), role: "user", content: userMsg, timestamp: Date.now() },
      ]);
      setInput("");
      setIsProcessing(true);
      session.transmit(userMsg);
      return;
    }
    // Some terminals report Backspace as `delete`, so handle both.
    if (key.backspace || key.delete) {
      setInput((prev) => prev.slice(0, -1));
      return;
    }
    // Ignore Ctrl/Meta chords so that, e.g., Ctrl+A does not insert an "a".
    if (!key.ctrl && !key.meta) {
      setInput((prev) => prev + inputChar);
    }
  });

  useEffect(() => {
    // Stops the event bus when the app unmounts (e.g. after exit()).
    return () => session.dispose();
  }, [session]);

  return (
    <Box flexDirection="column" paddingX={1}>
      {history.map((entry) => (
        <Box key={entry.id} flexDirection="column" marginBottom={1}>
          <Text bold color={entry.role === "user" ? "cyan" : entry.role === "system" ? "yellow" : "green"}>
            {entry.role === "user" ? "You" : entry.role === "system" ? "System" : "Agent"}
          </Text>
          <Text wrap="wrap">{entry.content}</Text>
        </Box>
      ))}
      {isProcessing && <Text color="gray">Thinking...</Text>}
      <Box marginTop={1}>
        <Text color="blue">{"> "}</Text>
        <Text>{input}</Text>
        <Text color="gray">{" "}</Text>
      </Box>
    </Box>
  );
}
```
Architecture Rationale: The `sessionRef` guard ensures the session is instantiated exactly once, preventing duplicate event bus subscriptions; `useMemo` on its own is only a performance hint that React may discard, so it should not be relied on for one-time side effects. The history array uses immutable updates so React detects each change and re-renders. Because the observer only forwards events into React state, UI updates never block the agent loop, and the terminal stays responsive even during long-running tool executions.
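The chunk-merging updater inside `onAssistantChunk` can be extracted into a pure function and tested without Ink. A sketch with a hypothetical `appendAssistantChunk`; injecting the ID generator and timestamp is an assumption made here to keep the function deterministic.

```typescript
// Same shape as ChatEntry in terminal-app.tsx.
interface ChatEntry {
  id: string;
  role: "user" | "assistant" | "system";
  content: string;
  timestamp: number;
}

// Pure version of the onAssistantChunk update: merge into a trailing assistant entry,
// otherwise start a new one. Never mutates `prev`.
export function appendAssistantChunk(
  prev: readonly ChatEntry[],
  chunk: string,
  makeId: () => string,
  now: number,
): ChatEntry[] {
  const last = prev[prev.length - 1];
  if (last && last.role === "assistant") {
    return [...prev.slice(0, -1), { ...last, content: last.content + chunk }];
  }
  return [...prev, { id: makeId(), role: "assistant", content: chunk, timestamp: now }];
}
```

In the component, the updater then shrinks to `setHistory((prev) => appendAssistantChunk(prev, text, randomUUID, Date.now()))`.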
Pitfall Guide
1. Blocking the Event Loop with Synchronous Shell Calls
Explanation: Using execSync or blocking I/O inside tool handlers freezes the Node.js event loop, preventing the agent from processing concurrent events or streaming responses.
Fix: Always use promisify(exec) or child_process.spawn with async/await. Implement timeouts to prevent runaway processes.
2. Tightly Coupling UI State to Agent Memory
Explanation: Storing conversation history in React state instead of the framework's ModelContext causes desynchronization. The agent loses context on re-renders, and tool results fail to propagate correctly.
Fix: Keep ModelContext as the single source of truth for conversation state. Use React state only for display formatting and input buffering.
3. Ignoring Tool Execution Failures & Timeouts
Explanation: Unhandled shell errors or infinite loops cause the agent to hang indefinitely. The model receives no output and may retry the same command repeatedly.
Fix: Wrap tool execution in try/catch blocks. Return structured error messages prefixed with Error:. Enforce execution timeouts (e.g., 10-15 seconds) and kill child processes on expiry.
4. Overloading Context Window with Raw Output
Explanation: Feeding massive command outputs (e.g., ls -la /, cat large_log.txt) directly into the context window consumes tokens rapidly and degrades model performance.
Fix: Implement output truncation in tool handlers. Return only the first/last N lines or use streaming chunking. Consider adding a read_file tool with line-range parameters instead of dumping entire files.
5. Race Conditions in Concurrent Tool Calls
Explanation: When the model requests multiple tools simultaneously, naive implementations trigger inference after the first tool completes, leaving subsequent tool results orphaned.
Fix: Maintain a Set or counter of pending tool calls. Only trigger the next inference cycle when the pending count reaches zero, as demonstrated in the ShellAgent implementation.
6. Hardcoding Secrets and Configuration in Source
Explanation: Embedding API keys or configuration directly in source files breaks security audits and prevents environment-specific deployments.
Fix: Use dotenv with explicit path resolution. Validate required variables at startup and throw descriptive errors if missing. Never log sensitive values.
7. Neglecting Terminal Resize & Stream Cleanup
Explanation: Ink components can render incorrectly when the terminal is resized, and unclosed event listeners cause memory leaks during hot-reloads or session disposal.
Fix: Listen for the `resize` event on `process.stdout` to recalculate layout constraints, and remove that listener on unmount. Always stop the environment (the session's `dispose()` calls `eventBus.stop()`) and clean up event listeners in React's `useEffect` cleanup function.
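For pitfall 4, head/tail truncation is one simple option. A sketch assuming line-oriented output; `truncateOutput` and its default of 40 lines on each side are illustrative choices, not values from the article.

```typescript
// Hypothetical helper: keep the first `head` and last `tail` lines of large output,
// replacing the middle with a marker that tells the model how much was omitted.
export function truncateOutput(output: string, head = 40, tail = 40): string {
  const lines = output.split("\n");
  if (lines.length <= head + tail) return output;
  const omitted = lines.length - head - tail;
  return [
    ...lines.slice(0, head),
    `[... ${omitted} lines omitted ...]`,
    // slice(lines.length - tail) rather than slice(-tail): slice(-0) would return every line.
    ...lines.slice(lines.length - tail),
  ].join("\n");
}
```

Output with a few extremely long lines (minified JSON, for example) would also need a character cap, since line counting alone does not bound its size.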
Production Bundle
Action Checklist
- Validate environment variables at startup with explicit error messaging
- Implement tool execution timeouts and process cleanup handlers
- Add output truncation logic to prevent context window overflow
- Use immutable state updates in React to prevent unnecessary re-renders
- Register all tools with `strict: true` to enforce schema compliance
- Implement graceful session disposal on terminal exit or signal interruption
- Add input sanitization to prevent shell injection attacks
- Configure TypeScript strict mode and ESLint rules for async/await patterns
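Because the model composes the commands and `exec` runs them through a shell, "input sanitization" in practice means constraining what the model may run. A sketch of one layer, assuming an allowlist policy; `checkCommand`, the program list, and the metacharacter set are all illustrative assumptions. An allowlist narrows the attack surface but is not a sandbox.

```typescript
// Hypothetical policy: a few read-only programs, and no shell metacharacters that
// would chain, redirect, or substitute commands.
const ALLOWED_PROGRAMS = new Set(["ls", "pwd", "cat", "head", "tail", "wc", "grep"]);
const SHELL_METACHARACTERS = /[;&|`$<>(){}\r\n\\]/;

export type CommandCheck = { allowed: true } | { allowed: false; reason: string };

export function checkCommand(command: string): CommandCheck {
  const trimmed = command.trim();
  if (trimmed.length === 0) return { allowed: false, reason: "empty command" };
  if (SHELL_METACHARACTERS.test(trimmed)) {
    return { allowed: false, reason: "shell metacharacters are not permitted" };
  }
  const program = trimmed.split(/\s+/)[0];
  if (!ALLOWED_PROGRAMS.has(program)) {
    return { allowed: false, reason: `"${program}" is not on the allowlist` };
  }
  return { allowed: true };
}
```

The tool's `invoke` would call `checkCommand(args.command)` first and return `Error: ${reason}` to the model instead of executing a rejected command.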
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Simple command wrapper | Raw `child_process` + `readline` | Lower overhead, no framework dependency | $0 (no LLM costs) |
| Multi-step agentic workflow | Mozaik + Ink | Event-driven state machine, clean separation | Moderate (LLM API + dev time) |
| Complex tool orchestration | LangGraph / AutoGen | Advanced state graphs, multi-agent coordination | High (infrastructure + API costs) |
| Production CLI with auth | Mozaik + Ink + Key management | Secure credential handling, audit logging | Moderate + infrastructure |
Configuration Template
package.json:

```json
{
  "name": "terminal-agent-cli",
  "type": "module",
  "scripts": {
    "build": "tsc",
    "start": "node dist/cli.js",
    "dev": "tsx watch src/cli.tsx"
  },
  "dependencies": {
    "@mozaik-ai/core": "^2.4.0",
    "ink": "^4.4.0",
    "react": "^18.2.0",
    "dotenv": "^16.3.1"
  },
  "devDependencies": {
    "@types/node": "^20.11.0",
    "@types/react": "^18.2.0",
    "tsx": "^4.7.0",
    "typescript": "^5.3.0"
  }
}
```
.env:

```bash
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxx
AGENT_MODEL_ID=gpt-5-4
LOG_LEVEL=info
TOOL_EXECUTION_TIMEOUT_MS=15000
```
tsconfig.json:

```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "jsx": "react-jsx",
    "strict": true,
    "esModuleInterop": true,
    "outDir": "dist",
    "rootDir": "src",
    "declaration": true,
    "sourceMap": true
  },
  "include": ["src/**/*"]
}
```
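The checklist's first item, validating environment variables at startup, can be a small pure function over the `.env` keys in the template above. A sketch with a hypothetical `loadConfig`; which variables are required and the 15000 ms default are assumptions drawn from the template.

```typescript
// Hypothetical startup validation for the .env template.
export interface AgentConfig {
  openaiApiKey: string;
  modelId: string;
  logLevel: string;
  toolTimeoutMs: number;
}

export function loadConfig(env: Record<string, string | undefined>): AgentConfig {
  const missing = ["OPENAI_API_KEY", "AGENT_MODEL_ID"].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const rawTimeout = env.TOOL_EXECUTION_TIMEOUT_MS ?? "15000";
  const toolTimeoutMs = Number(rawTimeout);
  if (!Number.isInteger(toolTimeoutMs) || toolTimeoutMs <= 0) {
    // Report the offending value, but never echo secrets such as the API key.
    throw new Error(`TOOL_EXECUTION_TIMEOUT_MS must be a positive integer, got "${rawTimeout}"`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY as string,
    modelId: env.AGENT_MODEL_ID as string,
    logLevel: env.LOG_LEVEL ?? "info",
    toolTimeoutMs,
  };
}
```

In `src/cli.tsx`, this could run right after `import "dotenv/config";` as `const config = loadConfig(process.env);`, so a missing key fails fast with a readable message instead of surfacing later as an API error.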
Quick Start Guide
- Initialize Project: Run `npm init -y`, then install dependencies with `npm i @mozaik-ai/core ink react dotenv` and dev dependencies with `npm i -D typescript tsx @types/react @types/node`.
- Configure Environment: Create a `.env` file with your `OPENAI_API_KEY`. Set `AGENT_MODEL_ID` to `gpt-5-4` or your preferred OpenAI model.
- Bootstrap Structure: Create `src/cli.tsx` (entry), `src/session-factory.ts` (orchestration), `src/agents/shell-agent.ts` (loop), `src/tools/shell-tools.ts` (capabilities), `src/observers/display-observer.ts` (display bridge), and `src/ui/terminal-app.tsx` (rendering).
- Run in Development Mode: Execute `npx tsx watch src/cli.tsx` (or `npm run dev`). The terminal will render the chat interface. Type a request like `list files in current directory` to trigger tool execution.
- Production Build: Run `npm run build` to compile TypeScript, then start the compiled output with `node dist/cli.js` (or `npm start`). Verify environment variables are loaded correctly before deployment.