es with AI complexity.
Core Solution
Securing LLM applications requires a dedicated middleware layer that sits between the user interface, the model provider, and downstream execution systems. This layer enforces three principles: context isolation, capability restriction, and output verification. Below is a step-by-step implementation using TypeScript, followed by architectural rationale.
Start by declaring explicit boundaries for what the model can access and how outputs are handled.
interface ToolCapability {
name: string;
requiredPermissions: string[];
maxExecutionSteps: number;
irreversible: boolean;
}
interface SecurityPolicy {
allowedTools: ToolCapability[];
outputEncoding: 'html' | 'json' | 'plain';
maxContextTokens: number;
requireHumanApproval: boolean;
}
User input and external context must be normalized before reaching the model. This firewall strips instruction-like patterns, enforces token limits, and tags external content.
class SemanticFirewall {
private instructionPatterns = /(?:ignore|override|pretend|system|admin|root)\s*(?:instructions|rules|prompt|access)/gi;
sanitize(rawInput: string, policy: SecurityPolicy): string {
if (rawInput.length > policy.maxContextTokens * 4) {
throw new Error('Input exceeds context window');
}
const cleaned = rawInput.replace(this.instructionPatterns, '[REDACTED]');
return cleaned.trim();
}
tagExternalContext(content: string): string {
return `<external_data>${content}</external_data>`;
}
}
Agents and copilots should never execute tools directly. A router validates capabilities, enforces step limits, and logs execution intent.
class ToolPermissionRouter {
private executionLog: string[] = [];
async route(toolName: string, params: Record<string, unknown>, policy: SecurityPolicy): Promise<boolean> {
const capability = policy.allowedTools.find(t => t.name === toolName);
if (!capability) {
throw new Error(`Unauthorized tool: ${toolName}`);
}
if (this.executionLog.length >= capability.maxExecutionSteps) {
throw new Error('Step limit exceeded');
}
if (capability.irreversible && policy.requireHumanApproval) {
console.warn(`[SECURITY] Awaiting human approval for: ${toolName}`);
return false;
}
this.executionLog.push(toolName);
return true;
}
}
Step 4: Enforce Output Validation and Encoding
Model responses must be treated as untrusted data. Validate against schemas and encode before rendering or passing to downstream systems.
class OutputValidator {
validateSchema(response: unknown, schema: object): boolean {
// In production, use zod or ajv for runtime validation
return typeof response === 'object' && response !== null;
}
encodeForDelivery(rawOutput: string, encoding: 'html' | 'json' | 'plain'): string {
if (encoding === 'html') {
return rawOutput.replace(/[<>&"']/g, char => ({
'<': '<', '>': '>', '&': '&', '"': '"', "'": '''
}[char] || char));
}
return rawOutput;
}
}
Architecture Rationale
- Separation of Concerns: The firewall, router, and validator operate independently. This prevents a single bypass from compromising the entire pipeline.
- Least-Privilege Tool Execution: Tools are gated by explicit capabilities and step limits. Irreversible actions require human confirmation, mitigating excessive agency.
- Context Tagging: Wrapping external data in semantic tags prevents the model from conflating instructions with retrieved content, reducing indirect injection risk.
- Output Contract Enforcement: Schema validation and encoding ensure probabilistic outputs never become executable code or rendered HTML without sanitization.
Pitfall Guide
1. The System Prompt Fallacy
Explanation: Teams embed security rules, access controls, and operational boundaries directly into the system prompt, assuming the model will enforce them. LLMs do not execute policies; they follow statistical patterns. Adversarial prompts can easily override or extract these instructions.
Fix: Move all security enforcement to the application layer. Use the middleware firewall and router to validate inputs, restrict tools, and encode outputs. Treat system prompts as behavioral guidance, not security controls.
Explanation: Autonomous agents are allowed to execute multiple tool calls in sequence without oversight. A single ambiguous instruction can trigger a cascade of destructive actions, such as deleting records or modifying configurations.
Fix: Implement step counters and execution budgets. Require explicit human approval for irreversible operations. Log every tool invocation with its intent and parameters for auditability.
3. Embedding Space Blindness
Explanation: Vector stores are treated as passive data repositories. Attackers inject crafted documents that dominate similarity searches, hijacking retrieval context and poisoning model responses.
Fix: Namespace vector collections by tenant or access level. Gate ingestion pipelines with content validation and canary tracking. Monitor retrieval frequency anomalies and rotate embeddings periodically.
4. Output Trust Assumption
Explanation: Developers treat LLM responses as safe strings, rendering them directly in UIs or concatenating them into SQL queries and shell commands. Probabilistic generation can produce script tags, special characters, or code fragments.
Fix: Always encode outputs before delivery. Use parameterized queries for database interactions. Validate responses against strict schemas before passing them to execution layers.
5. Dependency Drift
Explanation: AI applications rely on base models, embedding providers, MCP servers, and vector databases. Updates to these components are deployed without verification, introducing hidden behaviors or supply chain compromises.
Fix: Maintain a Software Bill of Materials (SBOM) for all AI dependencies. Verify package checksums and cryptographic signatures. Stage updates in isolated environments and run regression tests against known security baselines.
6. Corpus Contamination
Explanation: Retrieval pipelines ingest external content automatically without review. Malicious or low-quality documents enter the knowledge base, altering model behavior and leaking sensitive information.
Fix: Implement human-in-the-loop review for high-impact data. Version retrieval corpora and maintain rollback snapshots. Use canary documents to detect unauthorized modifications or behavioral shifts.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Internal HR chatbot with restricted data | Namespace vector store + strict RBAC + output redaction | Prevents unauthorized retrieval and leakage of sensitive records | Low (infrastructure + policy config) |
| Customer-facing agent with tool execution | Step-limited router + human approval for irreversible actions + output encoding | Mitigates excessive agency and downstream injection | Medium (approval workflow + logging) |
| Public knowledge base with open ingestion | Content validation gate + canary tracking + corpus versioning | Blocks poisoning and enables rollback on contamination | Medium (ingestion pipeline + monitoring) |
| Multi-tenant SaaS with shared models | Tenant-scoped embeddings + isolated tool routing + strict schema validation | Ensures data isolation and prevents cross-tenant leakage | High (multi-tenancy architecture + audit) |
Configuration Template
// security.config.ts
import { SecurityPolicy, ToolCapability } from './types';
export const defaultPolicy: SecurityPolicy = {
allowedTools: [
{ name: 'fetch_document', requiredPermissions: ['read:docs'], maxExecutionSteps: 3, irreversible: false },
{ name: 'update_record', requiredPermissions: ['write:records'], maxExecutionSteps: 1, irreversible: true },
{ name: 'send_notification', requiredPermissions: ['notify:users'], maxExecutionSteps: 2, irreversible: false }
],
outputEncoding: 'html',
maxContextTokens: 4096,
requireHumanApproval: true
};
export const firewallConfig = {
stripInstructionPatterns: true,
tagExternalContext: true,
maxInputLength: 16384
};
export const outputValidation = {
strictSchema: true,
encodeHtmlEntities: true,
blockSqlOperators: true
};
Quick Start Guide
- Initialize the security layer: Install the middleware package and import
SemanticFirewall, ToolPermissionRouter, and OutputValidator into your application entry point.
- Define your policy: Copy the configuration template and adjust
allowedTools, maxContextTokens, and approval requirements to match your use case.
- Wrap your LLM calls: Route all user inputs through the firewall, pass sanitized context to the model, and validate responses through the output validator before delivery.
- Enable audit logging: Attach correlation IDs to every request, log tool executions with intent parameters, and forward events to your SIEM or observability platform.
- Validate with red-team prompts: Run a baseline test suite containing injection attempts, ambiguous instructions, and extraction queries. Verify that the firewall blocks manipulation, the router enforces limits, and outputs remain encoded.