and the language model. The gateway performs three sequential operations: decoding normalization, entropy analysis, and semantic threat evaluation.
Architecture Decisions
- Decode Before Scan: Language models are deterministic decoders. If the model can parse Base64, Morse, Hex, or ROT13, an attacker can use it. The gateway must attempt decoding first, then scan the resulting plaintext. Scanning raw encoded strings is mathematically insufficient.
- Entropy Fallback: Attackers will invent custom encoding schemes. Shannon entropy calculation provides a statistical catch-all. Normal English prose averages 3.5β4.5 bits per character. Encoded or encrypted payloads consistently exceed 5.0 bits/char. A threshold at 5.0 provides safe headroom while flagging obfuscated content.
- Middleware Placement: The gateway must sit outside the model runtime. It intercepts inbound payloads, evaluates them, and returns a pass/block/quarantine decision. This ensures zero trust between ingestion and execution, regardless of the underlying LLM provider.
Implementation (TypeScript)
The following implementation demonstrates a production-ready sanitization pipeline. It replaces naive keyword matching with a multi-stage evaluation engine.
interface SanitizationResult {
status: 'pass' | 'quarantine' | 'block';
threatScore: number;
decodedPayloads: string[];
entropyBits: number;
matchedRules: string[];
}
interface EncodingProfile {
name: string;
pattern: RegExp;
decoder: (input: string) => string;
}
class AgenticInputSanitizer {
private readonly ENTROPY_THRESHOLD = 5.0;
private readonly THREAT_BLOCK_SCORE = 0.85;
private readonly THREAT_QUARANTINE_SCORE = 0.6;
private readonly ENCODING_PROFILES: EncodingProfile[] = [
{
name: 'morse',
pattern: /^[\.\-\/\s]+$/,
decoder: (input: string) => this.decodeMorse(input)
},
{
name: 'base64',
pattern: /^[A-Za-z0-9+/=]{20,}$/,
decoder: (input: string) => Buffer.from(input, 'base64').toString('utf8')
},
{
name: 'hex',
pattern: /^(?:[0-9a-fA-F]{2}\s?){10,}$/,
decoder: (input: string) =>
input.replace(/\s/g, '').match(/.{2}/g)!.map(b => String.fromCharCode(parseInt(b, 16))).join('')
}
];
public async evaluate(input: string): Promise<SanitizationResult> {
const decodedPayloads: string[] = [];
const matchedRules: string[] = [];
let threatScore = 0.0;
// Stage 1: Decode normalization
for (const profile of this.ENCODING_PROFILES) {
if (profile.pattern.test(input.trim())) {
try {
const decoded = profile.decoder(input);
if (decoded && decoded.length > 5) {
decodedPayloads.push(decoded);
}
} catch {
// Decoder failed, skip
}
}
}
// Stage 2: Entropy analysis
const entropy = this.calculateShannonEntropy(input);
if (entropy > this.ENTROPY_THRESHOLD) {
threatScore += 0.35;
matchedRules.push('high_entropy_obfuscation');
}
// Stage 3: Semantic threat scanning on decoded variants
const scanTargets = decodedPayloads.length > 0 ? decodedPayloads : [input];
for (const target of scanTargets) {
const semanticScore = this.evaluateSemanticIntent(target);
threatScore = Math.max(threatScore, semanticScore);
if (semanticScore >= this.THREAT_BLOCK_SCORE) {
matchedRules.push('command_injection_directive');
} else if (semanticScore >= this.THREAT_QUARANTINE_SCORE) {
matchedRules.push('suspicious_agentic_instruction');
}
}
// Stage 4: Decision routing
let status: 'pass' | 'quarantine' | 'block' = 'pass';
if (threatScore >= this.THREAT_BLOCK_SCORE) status = 'block';
else if (threatScore >= this.THREAT_QUARANTINE_SCORE) status = 'quarantine';
return {
status,
threatScore: Math.min(threatScore, 1.0),
decodedPayloads,
entropyBits: entropy,
matchedRules
};
}
private calculateShannonEntropy(text: string): number {
if (text.length < 20) return 0.0;
const freq: Record<string, number> = {};
for (const char of text) {
freq[char] = (freq[char] || 0) + 1;
}
const len = text.length;
return -Object.values(freq).reduce((sum, count) => {
const p = count / len;
return sum + p * Math.log2(p);
}, 0);
}
private evaluateSemanticIntent(text: string): number {
// Production systems should route this to a lightweight classifier or rule engine
// This placeholder demonstrates intent scoring logic
const commandPatterns = [
/(?:send|transfer|execute|approve|authorize)\s+\d+\s+(?:token|coin|native)/i,
/(?:ignore|override|bypass)\s+(?:previous|safety|policy)/i,
/(?:call|invoke|trigger)\s+(?:tool|function|contract)/i
];
let score = 0.0;
for (const pattern of commandPatterns) {
if (pattern.test(text)) score += 0.4;
}
// Penalize if decoded payload contains tool-use syntax
if (text.includes('@') || text.includes('function_call') || text.includes('tool_use')) {
score += 0.25;
}
return Math.min(score, 1.0);
}
}
Why This Architecture Works
The pipeline separates decoding from evaluation. By normalizing inputs first, the semantic scanner operates on actual intent rather than surface syntax. The entropy calculation acts as a statistical net for novel encodings, preventing attackers from bypassing detection by inventing custom formats. The decision routing (pass/quarantine/block) prevents hard failures that break user experience, while ensuring high-confidence threats never reach the model.
Middleware placement guarantees that the language model never processes untrusted content directly. Even if the LLM is capable of decoding Morse or Base64, the gateway intercepts the payload, evaluates it, and strips or quarantines it before context window injection. This breaks the agentic attack chain at its origin.
Pitfall Guide
1. Trusting Decoded Output Blindly
Explanation: Developers often decode payloads and immediately forward them to the LLM, assuming the decoding process neutralizes threats. Decoding merely reveals the payload; it does not sanitize it.
Fix: Always run decoded variants through a secondary semantic or rule-based scanner. Treat decoded text as untrusted input.
2. Hardcoding Encoding Lists Without Entropy Fallback
Explanation: Maintaining a whitelist of known encodings (Base64, Hex, ROT13) creates a false sense of security. Attackers routinely combine formats or invent lightweight obfuscation schemes.
Fix: Implement Shannon entropy calculation as a catch-all. Flag any input exceeding 5.0 bits/char for manual review or quarantine, regardless of format recognition.
Explanation: Agentic loops read their own tool results. If a tool returns attacker-controlled data (e.g., a scraped webpage, API response, or database query), the model will process it in the next turn.
Fix: Route all tool outputs through the same sanitization gateway before they re-enter the context window. Treat tool results as external input.
4. Over-Reliance on Regex for Intent Detection
Explanation: Regular expressions fail against semantic variation. An attacker can rephrase send 3B tokens as dispatch three billion units or use synonyms that bypass pattern matching.
Fix: Combine lightweight regex with a semantic scoring engine or small classifier model. Use intent categories rather than exact string matches.
5. Blocking Instead of Quarantining
Explanation: Hard-blocking all suspicious input breaks legitimate workflows. Users may paste encoded logs, configuration strings, or multilingual content that triggers false positives.
Fix: Implement tiered responses. pass allows normal flow. quarantine isolates the payload for human review or sandboxed processing. block drops high-confidence threats. Log all decisions for audit.
6. Skipping Rate Limiting on the Gateway
Explanation: The sanitization layer becomes a new attack vector. Attackers can flood it with high-entropy payloads to exhaust CPU or trigger false positives across the pipeline.
Fix: Apply strict rate limiting and payload size caps at the gateway. Cache evaluation results for identical hashes. Drop requests exceeding threshold limits before they consume resources.
7. Assuming Model Self-Correction Mitigates Injection
Explanation: Some teams rely on system prompts like Ignore all previous instructions or Never execute external commands. Modern LLMs can be contextually overwhelmed or jailbroken into ignoring these rules.
Fix: Never trust model-level instructions for security. Enforce constraints at the pipeline layer. The gateway must physically prevent malicious payloads from reaching the context window.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High-volume social feed ingestion | Pre-LLM Sanitization Gateway | Catches obfuscated payloads before context pollution; low latency overhead | Moderate (infrastructure + compute) |
| Internal enterprise RAG pipeline | Model-native guardrails + Output filtering | Lower threat surface; internal data is partially trusted | Low (native SDK features) |
| Crypto/financial tool execution | Gateway + Explicit allowlist for tool calls | Zero tolerance for injection; prevents unauthorized transactions | High (strict validation + audit logging) |
| Multi-turn conversational agent | Gateway + Tool-output sanitization | Prevents context poisoning across turns; maintains conversation integrity | Moderate (stateful pipeline) |
Configuration Template
sanitization_gateway:
version: "2.1"
pipeline:
decode_normalization:
enabled: true
formats:
- morse
- base64
- hex
- rot13
- url_encoded
entropy_analysis:
enabled: true
threshold_bits_per_char: 5.0
min_length: 40
semantic_scoring:
enabled: true
engine: "lightweight_classifier"
command_directives:
- transfer
- execute
- override
- bypass
- authorize
decision_routing:
pass_threshold: 0.6
quarantine_threshold: 0.85
block_threshold: 0.95
rate_limiting:
requests_per_second: 150
max_payload_bytes: 65536
cache_ttl_seconds: 300
logging:
level: "info"
store_decoded_variants: true
retention_days: 90
Quick Start Guide
- Initialize the Gateway: Deploy the
AgenticInputSanitizer class as a standalone service or inline middleware. Configure the YAML template to match your threat tolerance.
- Hook Inbound Channels: Wrap all external data fetchers (Twitter API, email parsers, web scrapers) with a call to
evaluate(input). Route the response through the decision router.
- Sanitize Tool Outputs: Intercept all tool execution results before they re-enter the agent loop. Pass them through the same
evaluate() method to prevent context poisoning.
- Tune Thresholds: Run a 48-hour shadow mode. Monitor false positive rates and adjust
pass_threshold and entropy_threshold based on your data profile.
- Enable Audit Logging: Store threat scores, decoded payloads, and matched rules. Review quarantined items weekly to refine semantic scoring rules and reduce manual overhead.
Agentic security is no longer about model alignment. It is about pipeline architecture. By treating external input as untrusted code and enforcing deterministic sanitization before context injection, teams can neutralize encoding obfuscation, prevent tool chaining exploits, and maintain operational continuity without sacrificing capability.