I built a self-hosted AI agent that runs as a system service and controls Android over ADB: here's the architecture
Architecting a Self-Hosted Cross-Platform AI Automation Engine
Current Situation Analysis
The modern AI stack has reached a critical inflection point. Large language models excel at reasoning, planning, and natural language generation, yet they remain fundamentally trapped in conversational interfaces. Developers and infrastructure teams are increasingly demanding autonomous execution capabilities: the ability to trigger system commands, manipulate mobile devices, interact with web applications, and orchestrate cross-platform workflows without manual intervention.
This gap is frequently misunderstood. Many teams assume that adding a few API wrappers to an LLM framework constitutes an "agent." In reality, production-grade automation requires a persistent execution loop, secure credential isolation, concurrent tool scheduling, and robust state management. The industry has largely overlooked the complexity of bridging probabilistic model outputs with deterministic system control. When an AI decides to tap a button on an Android device, parse a UI tree, or execute a shell command with environment state, the margin for error collapses. A single misrouted credential or unhandled PTY stream can compromise the entire host.
Data from infrastructure deployments shows that fragmented automation tools lead to credential sprawl, state desynchronization, and unbounded execution loops. Traditional RPA solutions lack semantic reasoning, while chat-only AI frameworks lack system-level access. The solution lies in a unified, self-hosted orchestration layer that treats every external capability as a typed tool, routes execution through a secure loop, and persists state in a concurrent-safe database. This architecture shifts AI from a passive responder to an active system operator, enabling reliable cross-device automation while maintaining strict security boundaries.
WOW Moment: Key Findings
The architectural shift from conversational AI to autonomous execution fundamentally changes how systems handle state, security, and tool coverage. The following comparison highlights the operational differences between traditional approaches and a unified agent loop:
| Approach | Tool Coverage | Execution Latency | Security Boundary | State Persistence |
|---|---|---|---|---|
| Chat-Only AI | API wrappers only | Low (stateless) | Model provider handles keys | None (session-based) |
| Traditional RPA | Fixed, platform-specific | High (rigid workflows) | Hardcoded credentials | File/DB dependent |
| Unified Agent Loop | Dynamic, cross-platform | Medium (tool routing) | Server-side vault + session tokens | SQLite WAL + run history |
This finding matters because it shows that autonomous automation is not a feature add-on, but a structural requirement. By decoupling tool execution from the model provider and centralizing credential management, teams can safely delegate high-risk operations (ADB control, shell execution, browser automation) without exposing secrets to the client or the AI provider. The unified loop enables deterministic fallbacks, step limits, and audit trails, transforming probabilistic outputs into reliable system actions.
Core Solution
Building a production-ready automation engine requires careful separation of concerns: tool registration, execution orchestration, credential routing, and device isolation. Below is a step-by-step implementation using TypeScript, followed by architectural rationale.
Step 1: Tool Registry & Interface Design
Every external capability must conform to a strict interface. This prevents the model from invoking untyped or unsafe operations.
```typescript
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, { type: string; required: boolean }>;
  execute: (params: Record<string, any>) => Promise<ToolResult>;
}

interface ToolResult {
  success: boolean;
  output: string | object;
  metadata?: Record<string, any>;
}

class ToolRegistry {
  private tools: Map<string, ToolDefinition> = new Map();

  register(tool: ToolDefinition): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool ${tool.name} is already registered.`);
    }
    this.tools.set(tool.name, tool);
  }

  async invoke(name: string, params: Record<string, any>): Promise<ToolResult> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return await tool.execute(params);
  }

  getSchema(): ToolDefinition[] {
    return Array.from(this.tools.values());
  }
}
```
Rationale: Centralizing tool definitions allows the agent to dynamically generate function schemas for the AI provider. Type enforcement prevents malformed requests from reaching system boundaries.
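As a usage sketch, here is how a minimal tool might be registered and its declared schema enforced before execution. The `echo` tool and the `validateParams` helper are illustrative additions, not part of the engine itself:

```typescript
interface ToolResult { success: boolean; output: string | object; }
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, { type: string; required: boolean }>;
  execute: (params: Record<string, any>) => Promise<ToolResult>;
}

// Check incoming arguments against the tool's declared parameter schema
// before they ever reach a system boundary.
function validateParams(tool: ToolDefinition, params: Record<string, any>): string[] {
  const errors: string[] = [];
  for (const [key, spec] of Object.entries(tool.parameters)) {
    if (spec.required && !(key in params)) {
      errors.push(`missing required param: ${key}`);
    } else if (key in params && typeof params[key] !== spec.type) {
      errors.push(`param ${key} must be ${spec.type}`);
    }
  }
  return errors;
}

// A demo tool that simply returns its input.
const echoTool: ToolDefinition = {
  name: 'echo',
  description: 'Returns its input unchanged (demo only).',
  parameters: { text: { type: 'string', required: true } },
  async execute(params) {
    return { success: true, output: String(params.text) };
  },
};
```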
Step 2: The Agent Execution Loop
The loop loads context, queries the model, executes returned tools, and feeds results back until completion.
```typescript
class AgentOrchestrator {
  private maxSteps: number = 15;
  private registry: ToolRegistry;
  private provider: AIProvider;

  constructor(registry: ToolRegistry, provider: AIProvider) {
    this.registry = registry;
    this.provider = provider;
  }

  async run(userPrompt: string, context: AgentContext): Promise<RunResult> {
    let stepCount = 0;
    const history: Message[] = [{ role: 'user', content: userPrompt }];
    while (stepCount < this.maxSteps) {
      const response = await this.provider.chat(history, this.registry.getSchema());
      if (!response.toolCalls || response.toolCalls.length === 0) {
        return { finalOutput: response.content, steps: stepCount };
      }
      // Record the assistant turn that requested the tools; provider APIs
      // generally reject tool-result messages that lack a preceding
      // assistant message containing the matching tool calls.
      history.push({ role: 'assistant', content: response.content, toolCalls: response.toolCalls });
      for (const call of response.toolCalls) {
        const result = await this.registry.invoke(call.name, call.arguments);
        history.push({ role: 'tool', toolCallId: call.id, content: JSON.stringify(result) });
      }
      stepCount++;
    }
    throw new Error('Step limit reached. Execution halted.');
  }
}
```
Rationale: The step limit prevents infinite loops caused by model hallucination or tool misconfiguration. Feeding tool results back as structured messages maintains conversation continuity without leaking internal state to the client.
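For reference, the loop above assumes supporting types along these lines. The exact shapes are sketches, not a contract from any particular provider SDK:

```typescript
// A single tool invocation requested by the model.
interface ToolCall { id: string; name: string; arguments: Record<string, any>; }

// One turn in the conversation history fed back to the provider.
interface Message {
  role: 'user' | 'assistant' | 'tool';
  content: string;
  toolCallId?: string;  // set on tool-result messages
  toolCalls?: ToolCall[]; // set on assistant messages that request tools
}

interface ChatResponse { content: string; toolCalls?: ToolCall[]; }

interface AIProvider {
  chat(history: Message[], tools: object[]): Promise<ChatResponse>;
}

interface RunResult { finalOutput: string; steps: number; }
// Fields here are assumptions about what a session might carry.
interface AgentContext { sessionId: string; userId: string; }
```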
Step 3: Secure Credential & Session Routing
Secrets must never traverse the network to the frontend or AI provider. A server-side vault handles OAuth tokens and API keys.
```typescript
class CredentialVault {
  private store: Map<string, string> = new Map();

  set(key: string, value: string): void {
    this.store.set(key, value);
  }

  get(key: string): string | undefined {
    return this.store.get(key);
  }

  async refreshOAuth(provider: string): Promise<string> {
    const clientId = this.get(`${provider}_CLIENT_ID`);
    const clientSecret = this.get(`${provider}_CLIENT_SECRET`);
    const refreshToken = this.get(`${provider}_REFRESH_TOKEN`);
    if (!clientId || !clientSecret || !refreshToken) {
      throw new Error(`Missing OAuth credentials for ${provider}`);
    }
    const token = await this.requestToken(provider, clientId, clientSecret, refreshToken);
    this.set(`${provider}_ACCESS_TOKEN`, token);
    return token;
  }

  // Standard OAuth refresh_token grant. The token endpoint is stored per
  // provider (e.g. `google_workspace_TOKEN_ENDPOINT`) alongside the
  // other credentials.
  private async requestToken(
    provider: string,
    clientId: string,
    clientSecret: string,
    refreshToken: string,
  ): Promise<string> {
    const endpoint = this.get(`${provider}_TOKEN_ENDPOINT`);
    if (!endpoint) throw new Error(`No token endpoint configured for ${provider}`);
    const res = await fetch(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        grant_type: 'refresh_token',
        client_id: clientId,
        client_secret: clientSecret,
        refresh_token: refreshToken,
      }),
    });
    if (!res.ok) throw new Error(`Token refresh failed for ${provider}: ${res.status}`);
    return ((await res.json()) as { access_token: string }).access_token;
  }
}
```
Rationale: OAuth tokens are rotated server-side. The Flutter/web client authenticates via short-lived session tokens, ensuring that even if the frontend is compromised, AI provider keys and third-party secrets remain isolated.
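One way to keep raw secrets out of tool parameters is alias resolution inside the orchestrator, sketched below. The `vault:` prefix convention and the stored token value are illustrative assumptions:

```typescript
// Hypothetical vault contents; in production this would be the
// server-side CredentialVault, never frontend state.
const vault = new Map<string, string>([['telegram_BOT_TOKEN', 'tg-secret-token']]);

// Swap any "vault:ALIAS" parameter value for the real secret just
// before execution, so raw tokens never appear in the model's context
// or in network logs.
function resolveAliases(params: Record<string, string>): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const [key, value] of Object.entries(params)) {
    resolved[key] = value.startsWith('vault:')
      ? vault.get(value.slice('vault:'.length)) ?? value
      : value;
  }
  return resolved;
}
```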
Step 4: Android/ADB & Browser Isolation
Mobile and browser automation require strict host isolation. A QEMU-backed virtual machine runs the Android emulator and Chromium instance, preventing direct host access.
```typescript
import { exec } from 'child_process';
import { promisify } from 'util';

const execAsync = promisify(exec);

class AdbBridge {
  private adbPath: string;
  private deviceSerial: string;

  constructor(adbPath: string, deviceSerial: string) {
    this.adbPath = adbPath;
    this.deviceSerial = deviceSerial;
  }

  async takeScreenshot(): Promise<Buffer> {
    // screencap emits raw PNG bytes; request a Buffer rather than a
    // utf-8 string so the image is not corrupted in transit.
    const { stdout } = await execAsync(
      `${this.adbPath} -s ${this.deviceSerial} exec-out screencap -p`,
      { encoding: 'buffer', maxBuffer: 16 * 1024 * 1024 },
    );
    return stdout;
  }

  async dumpUiTree(): Promise<string> {
    return await this.execCommand('exec-out uiautomator dump /dev/tty');
  }

  async tap(x: number, y: number): Promise<void> {
    await this.execCommand(`shell input tap ${x} ${y}`);
  }

  private async execCommand(cmd: string): Promise<string> {
    const fullCmd = `${this.adbPath} -s ${this.deviceSerial} ${cmd}`;
    const { stdout } = await execAsync(fullCmd);
    return stdout;
  }
}
```
Rationale: UIAutomator XML dumps are heavy. Parsing them directly in the loop causes latency. Instead, the bridge extracts actionable nodes (clickable, typeable, scrollable) before returning data to the model. QEMU isolation ensures that browser automation or ADB commands cannot escape to the host filesystem.
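The server-side pre-processing described above can be sketched as a small extractor. The regex-based parsing below assumes UIAutomator's `bounds="[x1,y1][x2,y2]"` attribute format and handles only clickable nodes for brevity:

```typescript
interface UiNode { resourceId: string; text: string; x: number; y: number; }

// Flatten a UIAutomator XML dump into just the clickable nodes, each with
// a tap target at the centre of its bounding box.
function extractClickableNodes(xml: string): UiNode[] {
  const nodes: UiNode[] = [];
  const nodeRe = /<node [^>]*clickable="true"[^>]*\/?>/g;
  for (const match of xml.match(nodeRe) ?? []) {
    // Pull a single attribute value out of the matched <node ...> tag.
    const attr = (name: string) =>
      (match.match(new RegExp(`${name}="([^"]*)"`)) ?? [])[1] ?? '';
    const [, x1, y1, x2, y2] =
      attr('bounds').match(/\[(\d+),(\d+)\]\[(\d+),(\d+)\]/) ?? [];
    nodes.push({
      resourceId: attr('resource-id'),
      text: attr('text'),
      x: (Number(x1) + Number(x2)) / 2,
      y: (Number(y1) + Number(y2)) / 2,
    });
  }
  return nodes;
}
```

The flattened JSON is what the model reasons over; the raw XML never enters its context.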
Pitfall Guide
1. Blocking PTY Streams During Shell Execution
Explanation: Using child_process.exec or spawn without a pseudo-terminal loses environment state, breaks interactive commands, and causes deadlocks on long-running processes.
Fix: Implement a PTY-backed executor that streams stdout/stderr separately, enforces timeouts, and captures exit codes. Always pipe input through the PTY master to preserve shell context.
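A minimal sketch of such an executor, assuming the third-party `node-pty` package (an assumption, not named in the original stack). The shell choice, timeout, and terminal dimensions are illustrative:

```typescript
interface ShellResult { output: string; exitCode: number; timedOut: boolean; }

// PTY streams carry ANSI escape sequences; strip them before handing
// output back to the model.
function stripAnsi(raw: string): string {
  return raw.replace(/\x1b\[[0-9;]*[A-Za-z]/g, '');
}

function runInPty(command: string, timeoutMs = 30_000): Promise<ShellResult> {
  // Loaded lazily so the pure helpers above work without the native module.
  const { spawn } = require('node-pty');
  return new Promise((resolve) => {
    const shell = spawn('bash', ['-lc', command], {
      name: 'xterm-256color',
      cols: 120,
      rows: 40,
      cwd: process.cwd(),
      env: process.env as Record<string, string>,
    });
    let output = '';
    let timedOut = false;
    // Kill the PTY if the command overruns its budget.
    const timer = setTimeout(() => { timedOut = true; shell.kill(); }, timeoutMs);
    shell.onData((chunk: string) => { output += chunk; });
    shell.onExit(({ exitCode }: { exitCode: number }) => {
      clearTimeout(timer);
      resolve({ output: stripAnsi(output), exitCode, timedOut });
    });
  });
}
```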
2. Raw UIAutomator XML Parsing Overhead
Explanation: Feeding full UI tree dumps to the model consumes excessive context tokens and increases latency. The model struggles to parse nested XML reliably.
Fix: Pre-process the dump server-side. Extract only interactive nodes with coordinates, resource IDs, and text content. Return a flattened JSON structure that the model can reason over efficiently.
3. Credential Leakage to Client or AI Provider
Explanation: Storing API keys in frontend state or passing them as tool parameters exposes secrets to network logs and model training pipelines.
Fix: Maintain a server-side credential vault. Tools should reference secret aliases (e.g., messaging_platform: telegram) rather than raw tokens. The orchestrator resolves aliases internally before execution.
4. Unbounded Agent Loops
Explanation: Without a step limit, the model can enter recursive tool-calling cycles, exhausting API quotas and causing resource starvation.
Fix: Enforce a hard step counter in the orchestrator. Log each iteration, and trigger a graceful fallback when the limit is reached. Implement exponential backoff for API retries to prevent cascade failures.
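The backoff portion of this fix can be sketched as a small helper. The attempt count and base delay are assumptions to tune per deployment:

```typescript
// Retry a failing async call with exponentially growing delays
// (e.g. 500ms, 1s, 2s, 4s) before giving up.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const delay = baseMs * 2 ** attempt;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw lastError;
}
```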
5. Emulator Cold-Start Latency
Explanation: QEMU-backed Android emulators require full VM initialization, base image downloads, and boot animations. First-run delays can exceed 90 seconds.
Fix: Pre-warm the emulator during service startup. Use snapshot restoration instead of cold boots. Cache the base image locally and validate checksums before deployment.
6. Synchronous Tool Execution in Async Context
Explanation: Blocking the event loop during file I/O, network requests, or ADB commands stalls the entire agent loop, causing timeouts and dropped connections.
Fix: Wrap all tool executions in non-blocking promises. Use worker threads for CPU-heavy operations like image processing or XML parsing. Maintain a task queue to serialize concurrent tool calls safely.
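A serializing task queue can be as small as a single promise chain. This is a sketch, not the engine's actual scheduler:

```typescript
// Serialize concurrent tool calls so two tools never mutate shared
// device state (an emulator, a browser session) at the same time.
class TaskQueue {
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Run the task whether the previous one resolved or rejected.
    const next = this.tail.then(task, task);
    // Swallow errors on the internal chain so one failure does not
    // poison every later task; callers still see `next` reject.
    this.tail = next.catch(() => undefined);
    return next;
  }
}
```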
7. Missing OAuth Token Refresh Handling
Explanation: Third-party integrations (Google Workspace, Microsoft 365, Notion) expire access tokens. Failing to refresh them mid-execution breaks scheduled tasks and messaging workflows.
Fix: Implement a token lifecycle manager that checks expiration timestamps before each call. Auto-refresh using stored refresh tokens, and queue failed executions for retry after successful rotation.
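The expiry check at the heart of this fix can be sketched as follows; the 60-second safety margin is an assumption:

```typescript
interface TokenRecord {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

// Refresh ahead of actual expiry so a token never dies mid-call.
function needsRefresh(
  token: TokenRecord,
  marginMs = 60_000,
  now = Date.now(),
): boolean {
  return now >= token.expiresAt - marginMs;
}
```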
Production Bundle
Action Checklist
- Define strict tool interfaces with parameter validation and error boundaries
- Implement a PTY-backed shell executor with timeout and stream isolation
- Pre-process UIAutomator dumps into flattened JSON node structures
- Route all credentials through a server-side vault with session token auth
- Enforce a step limit in the agent loop with graceful fallback handling
- Pre-warm QEMU emulators and cache base images to reduce cold-start latency
- Implement OAuth token lifecycle management with auto-refresh and retry queues
- Enable SQLite WAL mode for concurrent read/write safety across run history
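The WAL item on this checklist, assuming the third-party `better-sqlite3` package (an illustrative choice, not mandated by the original), might look like:

```typescript
// Open the run-history database with WAL enabled so the web client can
// read history while the agent loop is writing.
function openHistoryDb(path: string) {
  // Loaded lazily; better-sqlite3 is a native dependency.
  const Database = require('better-sqlite3');
  const db = new Database(path);
  db.pragma('journal_mode = WAL');  // readers no longer block the single writer
  db.pragma('busy_timeout = 5000'); // wait on lock contention instead of erroring
  return db;
}
```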
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Local development & testing | Ollama + SQLite + local ADB | Zero API costs, full control, fast iteration | Low infrastructure, high maintenance |
| Production cross-platform automation | Cloud AI provider + QEMU VM + OAuth vault | Scalable, secure, supports 15+ messaging platforms | Moderate API costs, higher VM overhead |
| High-frequency scheduled tasks | Cron triggers + Telnyx voice delivery + MCP client | Reliable execution, async result routing, extensible tooling | Low per-task cost, depends on voice API pricing |
| Strict compliance / air-gapped | Fully local stack + Ollama + file-based secrets | No external network calls, full data sovereignty | High hardware requirements, limited model capability |
Configuration Template
```yaml
# agent.config.yaml
server:
  port: 3333
  session_ttl: 3600
  max_loop_steps: 15

storage:
  type: sqlite
  path: ~/.agent/data/automation.db
  wal_mode: true

credentials:
  vault_path: ~/.agent/.env
  oauth_providers:
    - google_workspace
    - microsoft_365
    - notion
    - home_assistant

ai_provider:
  type: anthropic
  model: claude-sonnet-4-20250514
  fallback: openai/gpt-4o-mini
  local_fallback: ollama/llama3.1

devices:
  android:
    adb_path: /usr/local/bin/adb
    emulator_serial: emulator-5554
    ui_dump_preprocess: true
  browser:
    isolation: qemu
    chromium_path: /opt/chromium/chrome
    extension_bridge: false

mcp:
  enabled: true
  remote_servers:
    - url: https://mcp.internal.example.com/tools
      auth: bearer_token
```
Quick Start Guide
- Initialize the service: Run `npm install -g agent-core && agent-core init` to generate the directory structure and default configuration.
- Configure credentials: Populate `~/.agent/.env` with AI provider keys and OAuth client secrets. Never commit this file to version control.
- Register tools: Import the tool registry module and attach shell, file, messaging, and ADB bridges. Validate schemas before deployment.
- Start the orchestrator: Launch the server with `agent-core start`. The agent loop will initialize, connect to the AI provider, and begin listening for client sessions.
- Verify execution: Open the web client, authenticate via session token, and send a test prompt. Monitor the run history in SQLite to confirm tool routing and step limits are functioning correctly.