# Armorer Guard: a 0.0247 ms local Rust scanner for AI-agent prompt injection

## Runtime Boundary Defense: Architecting Local AI Agent Security Scanners
## Current Situation Analysis
The prevailing narrative around AI agent security focuses heavily on cinematic prompt injection: adversarial users typing clever jailbreaks into a chat interface. While model-level guardrails address this surface layer, they ignore the actual execution topology of modern agentic systems. Real-world security failures consistently emerge at runtime boundaries where untrusted data crosses into the model's context window or flows outward to external tools.
These boundaries include:
- Retrieved documents or web pages containing embedded instructions that override system prompts
- Tool execution results that attempt to exfiltrate internal state or credentials
- Coding agents that directly translate model output into shell commands without validation
- Browser automation agents following hidden DOM instructions or JavaScript payloads
- Support workflows that inadvertently write sensitive user data into memory, logs, or downstream APIs
This problem is systematically overlooked because traditional security testing operates on static code analysis or pre-deployment penetration testing. Agent runtimes, however, are dynamic. Data arrives asynchronously, tool outputs mutate state, and context windows accumulate over multiple turns. A security layer that only inspects the initial user prompt leaves the remaining execution path completely exposed.
The industry has historically relied on cloud-based guardrail APIs to fill this gap. While effective for compliance, these services introduce network latency, create data egress risks, and struggle to maintain throughput under high-concurrency agent workloads. The technical requirement is clear: a deterministic, zero-network, low-latency scanner that operates directly on the execution host, capable of inspecting prompts, tool arguments, memory writes, and outbound messages before they trigger downstream actions.
Benchmarks from production-grade local scanners demonstrate that this is no longer a theoretical constraint. Modern implementations leverage lightweight linear classifiers and optimized binary interfaces to achieve sub-millisecond inspection times while maintaining high detection accuracy. The feasibility of embedding security gates directly into agent hot paths has fundamentally shifted the architecture of runtime defense.
## Key Findings
The critical insight for engineering teams is that local-first scanning does not require sacrificing accuracy for speed. When comparing traditional cloud guardrail services against optimized local Rust-based scanners, the performance delta reveals why boundary inspection must move closer to the execution environment.
| Approach | Avg Latency | Network Dependency | Throughput Cost | Macro F1 Score |
|---|---|---|---|---|
| Cloud Guardrail API | 120–350 ms | High (TLS + DNS) | $0.002–$0.005 per request | 0.94–0.96 |
| Local Rust Scanner | 0.0247 ms | None | $0.00 (compute-bound) | 0.9833 |
The local scanner achieves a macro F1 score of 0.9833 and micro recall of 1.0000 across 1,411 validation rows, with an average classifier latency of 0.0247 ms. This performance profile enables continuous inspection at every data boundary without degrading agent response times or incurring per-request API costs.
Why this matters: Agent architectures are shifting toward multi-step reasoning loops where each step may involve retrieval, tool execution, or memory updates. If security inspection adds even 50 ms per step, a 10-step workflow introduces half a second of pure overhead. Local scanning eliminates this bottleneck, allowing security policies to be enforced deterministically at the process level. It also removes data egress concerns, ensuring that sensitive prompts, tool outputs, and internal state never leave the host environment.
## Core Solution
Implementing a runtime boundary scanner requires a deliberate separation between the detection engine and the agent orchestration layer. The architecture prioritizes deterministic execution, language-agnostic integration, and structured output for policy enforcement.
### Step 1: Deploy the Detection Binary
The core scanner is compiled to a native binary. This eliminates interpreter overhead, guarantees consistent memory usage, and removes dependency on heavy runtime environments. The binary exposes a simple standard input/output interface, making it callable from any host language.
### Step 2: Build a Language Wrapper
Rather than duplicating detection logic across Python, Node, Go, or Rust agent frameworks, a thin wrapper handles process spawning, input serialization, and response parsing. The wrapper should never contain security rules; it only manages the boundary between the agent runtime and the scanner binary.
### Step 3: Define Inspection Boundaries
Security gates must be placed at every data transition point:
- Inbound user prompts
- Retrieved context chunks
- Tool call arguments before execution
- Tool outputs before they enter the context window
- Memory writes and log entries
- Final model responses before delivery
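Wired together, these gates form a single choke point that every payload crosses before it moves to the next stage. The sketch below illustrates the placement pattern only; the `inspect` function here is a hypothetical stub standing in for the real scanner invocation, and its detection rules are deliberately trivial.

```typescript
type Action = "allow" | "redact" | "block";

interface GateResult {
  action: Action;
  text: string;
}

// Stub detector: blocks an obvious override phrase, redacts emails.
// A real deployment would call the scanner binary instead.
function inspect(payload: string): GateResult {
  if (/ignore (all|previous) instructions/i.test(payload)) {
    return { action: "block", text: "" };
  }
  const redacted = payload.replace(
    /\b[\w.+-]+@[\w-]+\.[A-Za-z]{2,}\b/g,
    "[REDACTED_EMAIL]"
  );
  return { action: redacted === payload ? "allow" : "redact", text: redacted };
}

// One agent step with gates at two of the transition points above.
// Tool arguments, tool outputs, and memory writes would be gated the
// same way before the final response is delivered.
function gatedStep(userPrompt: string, retrievedChunk: string): string {
  for (const payload of [userPrompt, retrievedChunk]) {
    if (inspect(payload).action === "block") {
      throw new Error("boundary violation: payload blocked");
    }
  }
  return inspect(userPrompt).text;
}
```

The essential property is that every boundary funnels through the same call, so policy updates land in one place rather than being re-implemented per integration.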
### Step 4: Handle Structured Responses
The scanner returns JSON containing redacted text, detection reasons, confidence scores, and policy flags. The wrapper must parse this response and apply the appropriate action: allow, redact, block, or escalate.
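Parsing the response and dispatching the action can be sketched as below. The `ScanResult` shape mirrors the one used in this article; the threshold values and the `decide` helper are illustrative, not a fixed API.

```typescript
interface ScanResult {
  sanitized_text: string;
  suspicious: boolean;
  reasons: string[];
  confidence: number;
}

type PolicyAction = "allow" | "redact" | "block" | "escalate";

// Illustrative dispatch: block above a high bar, redact in the middle
// band, and escalate low-confidence flags to human review.
function decide(result: ScanResult, blockAt = 0.9, redactAt = 0.75): PolicyAction {
  if (!result.suspicious) return "allow";
  if (result.confidence >= blockAt) return "block";
  if (result.confidence >= redactAt) return "redact";
  return "escalate";
}

const raw =
  '{"sanitized_text":"hi","suspicious":true,"reasons":["prompt_injection"],"confidence":0.93}';
const parsed: ScanResult = JSON.parse(raw);
// decide(parsed) === "block"
```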
### Implementation Example (TypeScript)
```typescript
import { execFileSync } from 'child_process';
import { join } from 'path';

interface ScanResult {
  sanitized_text: string;
  suspicious: boolean;
  reasons: string[];
  confidence: number;
}

interface ScanConfig {
  binaryPath: string;
  timeoutMs: number;
  minConfidence: number;
  redactSecrets: boolean;
}

class BoundaryGuard {
  private config: ScanConfig;

  constructor(config: Partial<ScanConfig> = {}) {
    this.config = {
      binaryPath: join(__dirname, 'bin', 'guard_scanner'),
      timeoutMs: 50,
      minConfidence: 0.85,
      redactSecrets: true,
      ...config,
    };
  }

  async inspect(payload: string): Promise<ScanResult> {
    const inputBuffer = Buffer.from(payload, 'utf-8');
    try {
      // Pipe the payload to the scanner binary over stdin and parse its
      // JSON response from stdout.
      const output = execFileSync(this.config.binaryPath, {
        input: inputBuffer,
        timeout: this.config.timeoutMs,
        encoding: 'utf-8',
        maxBuffer: 1024 * 1024,
      });
      const result: ScanResult = JSON.parse(output);
      if (result.suspicious && result.confidence >= this.config.minConfidence) {
        return this.applyPolicy(result);
      }
      return result;
    } catch (error) {
      throw new Error(`BoundaryGuard inspection failed: ${(error as Error).message}`);
    }
  }

  private applyPolicy(result: ScanResult): ScanResult {
    if (this.config.redactSecrets) {
      result.sanitized_text = result.sanitized_text.replace(
        /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
        '[REDACTED_EMAIL]'
      );
    }
    return result;
  }
}

export { BoundaryGuard, ScanResult };
```
### Architecture Decisions & Rationale
**Rust Core Selection:** The detection engine is implemented in Rust to guarantee memory safety, predictable latency, and zero garbage collection pauses. Agent runtimes often operate under strict SLAs; a managed runtime with unpredictable pause times would introduce tail latency spikes during high-throughput inspection.
**TF-IDF Linear Classifier:** Instead of deploying a full neural network, the semantic detection lane uses a Rust-native TF-IDF linear classifier exported from pre-trained artifacts. This choice is deliberate:
- TF-IDF vectorization is deterministic and requires no GPU or heavy CPU inference
- Linear classification executes in microseconds with consistent memory footprint
- The model artifacts are lightweight (~15 MB), enabling fast cold starts
- Accuracy remains high (Macro F1: 0.9833) for the targeted detection classes
**JSON I/O Contract:** Standard input/output with JSON serialization ensures language agnosticism. The wrapper in TypeScript, Python, or Go only needs to handle process lifecycle and parsing. This prevents security logic duplication and centralizes rule updates to the binary.
**Confidence Thresholding:** The scanner outputs a continuous confidence score rather than a binary flag. This enables policy-driven decision making. High-risk boundaries (e.g., tool execution) can enforce stricter thresholds, while low-risk boundaries (e.g., logging) can tolerate higher false positive rates with redaction fallbacks.
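A per-boundary policy table makes this concrete: the larger the blast radius of a boundary, the lower (stricter) its block threshold. The boundary names below follow the configuration template in this article; the threshold values and the `actionFor` helper are illustrative assumptions.

```typescript
// Hypothetical per-boundary thresholds: tool execution is blocked
// aggressively, while memory/log writes tolerate more false positives
// because redaction is a cheap fallback there.
const boundaryThresholds: Record<string, { block: number; redact: number }> = {
  tool_arguments: { block: 0.80, redact: 0.60 }, // high risk: triggers execution
  inbound_prompt: { block: 0.90, redact: 0.75 },
  memory_write:   { block: 0.97, redact: 0.85 }, // low risk: logging only
};

function actionFor(
  boundary: string,
  suspicious: boolean,
  confidence: number
): "allow" | "redact" | "block" | "escalate" {
  const t = boundaryThresholds[boundary] ?? { block: 0.9, redact: 0.75 };
  if (!suspicious) return "allow";
  if (confidence >= t.block) return "block";
  if (confidence >= t.redact) return "redact";
  return "escalate";
}
```

Note that the same confidence score of 0.85 blocks at the tool-argument gate but only redacts at the memory-write gate, which is exactly the risk-proportional behavior the continuous score enables.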
## Pitfall Guide
### 1. Scanning Only Inbound Prompts
**Explanation:** Teams frequently place the scanner only at the chat interface, assuming that once a prompt passes inspection, all downstream data is safe. This ignores tool outputs, retrieved documents, and memory mutations that can contain injection payloads.
**Fix:** Implement boundary gates at every data transition: retrieval, tool arguments, tool outputs, memory writes, and final responses.
### 2. Hard-Blocking on Low Confidence Scores
**Explanation:** Treating any `suspicious: true` flag as a hard block causes legitimate agent workflows to fail. Semantic classifiers operate on probability distributions, and edge cases in technical documentation or internal APIs often trigger low-confidence flags.
**Fix:** Use confidence thresholds aligned with risk tolerance. Apply redaction or human-in-the-loop escalation for scores between 0.60 and 0.85, and reserve hard blocks for scores above 0.90.
### 3. Ignoring Redaction Capabilities
**Explanation:** Many teams focus solely on blocking malicious input but fail to sanitize outputs. Sensitive data can leak through logs, error messages, or tool responses even when the primary injection is blocked.
**Fix:** Enable automatic redaction for credential patterns, PII, and internal identifiers. Ensure the scanner's `sanitized_text` field is used for all downstream logging and context window insertion.
### 4. Treating the Scanner as a Model Replacement
**Explanation:** Boundary scanners detect patterns and semantic signals; they do not reason. Expecting them to understand complex multi-turn jailbreaks or novel adversarial strategies leads to false security assumptions.
**Fix:** Position the scanner as a runtime filter, not a reasoning engine. Combine it with model-level guardrails, structured tool schemas, and least-privilege execution policies.
### 5. Misaligning Timeout Budgets
**Explanation:** Agent workflows often have strict latency budgets. If the scanner timeout is set too high, it masks performance degradation. If set too low, it causes unnecessary process kills during high-load periods.
**Fix:** Set the wrapper timeout to a small multiple of the observed end-to-end inspection time, including process spawn and scheduling overhead, rather than the bare classifier latency (e.g., 50 ms for a 0.0247 ms classifier). Monitor p99 latency in production and adjust based on actual process scheduling overhead.
### 6. Overlooking Licensing Constraints
**Explanation:** Source-available local scanners often use non-commercial licenses for community distribution. Deploying them in production SaaS or enterprise environments without proper licensing creates compliance risk.
**Fix:** Audit license terms before production deployment. Secure commercial licenses where required, and maintain clear separation between development/testing and production environments.
### 7. Failing to Normalize Input Encoding
**Explanation:** Adversarial payloads frequently use Unicode normalization, zero-width characters, or mixed encoding to bypass pattern matching. Scanners that process raw bytes without normalization are vulnerable to evasion.
**Fix:** Implement input normalization before scanning: strip zero-width characters, normalize Unicode, decode HTML entities, and collapse whitespace. Apply normalization consistently across all boundary gates.
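A minimal normalization pass along these lines is sketched below. NFKC normalization, zero-width stripping, and whitespace collapse follow the fix above directly; the entity decoding handles only a few common HTML entities for brevity, and a production pipeline would use a full entity decoder.

```typescript
function normalizeInput(raw: string): string {
  return raw
    .normalize("NFKC")                           // fold fullwidth/compatibility chars
    .replace(/[\u200B-\u200D\u2060\uFEFF]/g, "") // strip zero-width characters
    .replace(/&lt;/g, "<")                       // minimal entity decoding (illustrative)
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ")                        // collapse whitespace
    .trim();
}

// Fullwidth "ｉｇｎｏｒｅ" folds to "ignore", so a payload spelled with
// fullwidth letters no longer evades ASCII pattern matching.
```

Applying this function at every gate, not just the inbound prompt, keeps the normalization behavior consistent across boundaries.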
## Production Bundle
### Action Checklist
- [ ] Map all data boundaries in your agent architecture (input, retrieval, tools, memory, output)
- [ ] Deploy the scanner binary to the same host as the agent runtime
- [ ] Configure confidence thresholds based on boundary risk level (strict for tools, lenient for logs)
- [ ] Enable automatic redaction for credentials, PII, and internal identifiers
- [ ] Implement fallback policies for scanner timeouts or crashes (fail-open vs fail-closed)
- [ ] Monitor p99 latency and false positive rates in production dashboards
- [ ] Audit license compliance for commercial deployment environments
- [ ] Add normalization pre-processing to handle Unicode and encoding evasion techniques
### Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|----------|---------------------|-----|-------------|
| High-throughput agent pipeline (>100 req/s) | Local Rust scanner | Sub-millisecond latency, zero network overhead, deterministic throughput | $0 API cost, minimal compute |
| Compliance-heavy enterprise (HIPAA/SOC2) | Local scanner + audit logging | Keeps sensitive data on-prem, enables full audit trail without egress | Licensing cost, infrastructure overhead |
| Rapid prototyping / internal tools | Cloud guardrail API | Zero deployment, managed updates, simple integration | $0.002–$0.005 per request, scales linearly |
| Multi-tenant SaaS with strict data residency | Local scanner per tenant | Guarantees data never leaves tenant boundary, meets residency requirements | Higher infra cost, complex deployment |
### Configuration Template
```json
{
"scanner": {
"binary_path": "/opt/agents/bin/guard_scanner",
"timeout_ms": 50,
"max_buffer_bytes": 1048576
},
"policies": {
"min_confidence_block": 0.90,
"min_confidence_redact": 0.75,
"redact_secrets": true,
"redact_patterns": [
"email",
"api_key",
"internal_ip",
"jwt_token"
]
},
"boundaries": {
"inbound_prompt": { "enabled": true, "action": "block" },
"retrieved_context": { "enabled": true, "action": "redact" },
"tool_arguments": { "enabled": true, "action": "block" },
"tool_output": { "enabled": true, "action": "redact" },
"memory_write": { "enabled": true, "action": "log" },
"final_response": { "enabled": true, "action": "redact" }
},
"detection_classes": [
"prompt_injection",
"system_prompt_extraction",
"data_exfiltration",
"sensitive_data_request",
"safety_bypass",
"destructive_command",
"credential_leakage",
"risky_tool_call"
]
}
```

### Quick Start Guide
- Download the scanner binary for your target architecture and place it in a dedicated directory (e.g., /opt/agents/bin/).
- Verify execution by running a test payload through the binary on standard input, and confirm the JSON output matches the expected schema.
- Initialize the wrapper in your agent codebase using the configuration template, setting boundary-specific actions and confidence thresholds.
- Integrate at one boundary first (e.g., tool arguments). Run a controlled workload and monitor latency, false positive rate, and redaction accuracy.
- Expand to remaining boundaries once the initial gate stabilizes. Update dashboards to track inspection throughput and policy enforcement actions.
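Before wiring the configuration template into the wrapper, it is worth sanity-checking the parsed object so that a misordered pair of thresholds fails fast at startup rather than silently weakening a gate. The field names below follow the template in this article; the specific checks are an illustrative minimum, not an exhaustive validator.

```typescript
interface GuardConfig {
  scanner: { binary_path: string; timeout_ms: number; max_buffer_bytes: number };
  policies: { min_confidence_block: number; min_confidence_redact: number; redact_secrets: boolean };
}

// Validate a parsed configuration object; in production the object
// would typically come from JSON.parse over the template file.
function validateConfig(cfg: GuardConfig): GuardConfig {
  if (cfg.scanner.timeout_ms <= 0) {
    throw new Error("scanner.timeout_ms must be positive");
  }
  if (cfg.policies.min_confidence_redact >= cfg.policies.min_confidence_block) {
    throw new Error("redact threshold must sit below the block threshold");
  }
  return cfg;
}
```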
