OWASP LLM Top 10 Explained: The Security Risks Every AI Developer Needs to Know

By Codcompass Team·2026-06-02·10 min read

Architecting Resilient LLM Applications: A Practical Guide to the OWASP LLM Top 10

Current Situation Analysis

Traditional application security operates on a deterministic premise: inputs are validated against fixed schemas, outputs are encoded against known sinks, and execution paths are explicitly defined. Large language models shatter this paradigm. When you integrate an LLM into your stack, you are no longer routing static data through predictable functions. You are routing probabilistic text through dynamic prompt boundaries, tool chains, and context windows that can be manipulated, poisoned, or exhausted.

This shift is frequently misunderstood. Engineering teams routinely apply legacy web security controls to LLM integrations, assuming that standard input sanitization and output encoding will suffice. The reality is that LLMs introduce entirely new attack surfaces that traditional frameworks do not cover. Prompt injection bypasses regex filters by exploiting semantic understanding rather than syntax. Tool misuse occurs when the model is granted excessive permissions without strict schema validation. Training data poisoning corrupts model behavior at the source, often remaining dormant until specific trigger conditions are met in production.

Regulatory bodies have already recognized this gap. The EU AI Act Article 15 explicitly mandates that AI systems must be resilient against adversarial attacks and maintain availability under stress. Article 14 requires human oversight for high-risk systems, directly targeting autonomous agent behavior. GDPR Article 32 enforces strict confidentiality controls, making sensitive information leakage from context windows a compliance violation. Industry incident reports consistently show that prompt hijacking, insecure tool execution, and output chaining account for the majority of production LLM security breaches. Treating LLM security as an afterthought is no longer an engineering preference; it is a regulatory and operational liability.

WOW Moment: Key Findings

The fundamental difference between traditional web security and LLM-native security is not just about new vulnerabilities; it's about a structural shift in how trust is established and enforced. The table below contrasts how security boundaries operate across both paradigms.

Dimension	Traditional Web Architecture	LLM-Native Architecture
Attack Surface	Fixed endpoints, static routes, known sinks	Dynamic prompt boundaries, tool chains, context windows, training pipelines
Primary Mitigation	Input validation, output encoding, RBAC, WAF	Prompt allowlisting, output sanitization, human-in-the-loop gates, tool schema enforcement
Compliance Mapping	OWASP Top 10, PCI-DSS, HIPAA, ISO 27001	EU AI Act Art. 14/15, GDPR Art. 32, NIST AI RMF, ISO 42001
Failure Mode	Code execution, data exfiltration, privilege escalation	Prompt hijacking, tool abuse, model poisoning, hallucination-driven decisions, compute exhaustion

This comparison reveals why legacy security controls fail against LLM workloads. Traditional systems assume the application logic dictates execution. LLM systems assume the model interprets intent, which means the model itself becomes a potential attack vector. Recognizing this distinction enables teams to design defense-in-depth architectures that validate prompt boundaries, enforce least-privilege tool execution, and maintain human oversight for state-changing operations. It transforms security from a reactive patch into a structural requirement.

Core Solution

Building a secure LLM application requires a dedicated orchestration layer that sits between user input, the model, and downstream systems. This layer must enforce strict boundaries at every transition point: input validation, prompt construction, inference routing, output sanitization, tool execution, and human approval. Below is a production-ready TypeScript implementation that demonstrates these controls in action.

Architecture Decisions & Rationale

Input Allowlisting Over Regex Filtering: Regex patterns are easily bypassed by semantic variations. An allowlist approach restricts input to known-safe patterns or structured data, drastically reducing the prompt injection surface.
Prompt Boundary Enforcement: User input must never be concatenated directly into system instructions. Instead, it should be injected into explicitly defined placeholders with strict type constraints. 3

. Tool Registry with Schema Validation: LLMs should not freely invoke arbitrary functions. A centralized tool registry enforces least-privilege permissions and validates all parameters against JSON schemas before execution. 4. Output Sanitization & Context Isolation: Model responses must be treated as untrusted data. Sanitization strips executable content, and context isolation prevents leakage between user sessions. 5. Human-in-the-Loop Gates: Any action that modifies state, accesses sensitive data, or triggers external systems requires explicit human approval, satisfying EU AI Act Article 14 requirements.

Implementation

import { z } from 'zod';

// 1. Strict input validation schema
const UserQuerySchema = z.object({
  sessionId: z.string().uuid(),
  query: z.string().min(1).max(500).regex(/^[a-zA-Z0-9\s.,!?-]+$/),
  metadata: z.record(z.string()).optional(),
});

// 2. Tool permission registry with schema enforcement
interface ToolDefinition {
  name: string;
  description: string;
  parameters: z.ZodTypeAny;
  maxExecutionTime: number;
  requiresApproval: boolean;
}

const ALLOWED_TOOLS: Record<string, ToolDefinition> = {
  search_database: {
    name: 'search_database',
    description: 'Query internal knowledge base',
    parameters: z.object({ keyword: z.string(), limit: z.number().max(10) }),
    maxExecutionTime: 3000,
    requiresApproval: false,
  },
  update_record: {
    name: 'update_record',
    description: 'Modify system configuration',
    parameters: z.object({ recordId: z.string(), payload: z.record(z.any()) }),
    maxExecutionTime: 5000,
    requiresApproval: true,
  },
};

// 3. Secure orchestration layer
class SecureLLMOrchestrator {
  private readonly rateLimiter: Map<string, number[]> = new Map();
  private readonly contextCache: Map<string, string[]> = new Map();

  async processRequest(rawInput: unknown): Promise<{ response: string; auditTrail: string[] }> {
    const auditTrail: string[] = [];

    // Step 1: Validate & sanitize input
    const validationResult = UserQuerySchema.safeParse(rawInput);
    if (!validationResult.success) {
      throw new Error('Invalid input format');
    }
    auditTrail.push('Input validation passed');

    // Step 2: Rate limiting & DoS protection
    this.enforceRateLimit(validationResult.data.sessionId);
    auditTrail.push('Rate limit check passed');

    // Step 3: Construct prompt with strict boundaries
    const systemPrompt = `You are a technical assistant. Answer using only provided context. Never execute commands.`;
    const userContext = this.contextCache.get(validationResult.data.sessionId) || [];
    const finalPrompt = `${systemPrompt}\n\nContext:\n${userContext.join('\n')}\n\nUser Query: ${validationResult.data.query}`;
    auditTrail.push('Prompt boundary enforced');

    // Step 4: Inference & output sanitization
    const rawModelOutput = await this.invokeModel(finalPrompt);
    const sanitizedOutput = this.sanitizeOutput(rawModelOutput);
    auditTrail.push('Output sanitized');

    // Step 5: Tool execution with approval gate
    if (this.requiresToolExecution(sanitizedOutput)) {
      const toolCall = this.extractToolCall(sanitizedOutput);
      await this.executeTool(toolCall, validationResult.data.sessionId);
      auditTrail.push(`Tool executed: ${toolCall.name}`);
    }

    return { response: sanitizedOutput, auditTrail };
  }

  private enforceRateLimit(sessionId: string): void {
    const now = Date.now();
    const windowMs = 60000;
    const maxRequests = 30;

    if (!this.rateLimiter.has(sessionId)) {
      this.rateLimiter.set(sessionId, []);
    }

    const timestamps = this.rateLimiter.get(sessionId)!;
    const recent = timestamps.filter(t => now - t < windowMs);
    if (recent.length >= maxRequests) {
      throw new Error('Rate limit exceeded');
    }
    recent.push(now);
    this.rateLimiter.set(sessionId, recent);
  }

  private sanitizeOutput(raw: string): string {
    // Strip executable patterns, HTML/JS, and prompt leakage attempts
    return raw
      .replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '')
      .replace(/eval\(|exec\(|system\(|subprocess/gi, '[BLOCKED]')
      .replace(/ignore\s+above\s+instructions/gi, '[FILTERED]');
  }

  private async executeTool(call: { name: string; args: Record<string, any> }, sessionId: string): Promise<void> {
    const tool = ALLOWED_TOOLS[call.name];
    if (!tool) throw new Error('Unauthorized tool invocation');

    // Schema validation
    tool.parameters.parse(call.args);

    // Human approval gate for state-changing operations
    if (tool.requiresApproval) {
      const approved = await this.requestHumanApproval(call, sessionId);
      if (!approved) throw new Error('Execution denied by human oversight');
    }

    // Execute with timeout
    await Promise.race([
      this.invokeToolImplementation(call.name, call.args),
      new Promise((_, reject) => setTimeout(() => reject(new Error('Tool execution timeout')), tool.maxExecutionTime))
    ]);
  }

  private async requestHumanApproval(call: { name: string; args: Record<string, any> }, sessionId: string): Promise<boolean> {
    // In production, this routes to a review queue or webhook
    console.log(`[HUMAN REVIEW] Tool: ${call.name} | Args: ${JSON.stringify(call.args)}`);
    return true; // Placeholder for async approval flow
  }

  private async invokeModel(prompt: string): Promise<string> {
    // Replace with actual LLM provider SDK
    return `Simulated model response for: ${prompt.slice(0, 50)}...`;
  }

  private async invokeToolImplementation(name: string, args: Record<string, any>): Promise<void> {
    // Actual tool execution logic
  }

  private requiresToolExecution(output: string): boolean {
    return output.includes('TOOL_CALL:');
  }

  private extractToolCall(output: string): { name: string; args: Record<string, any> } {
    const match = output.match(/TOOL_CALL:\s*({[\s\S]*?})/);
    return match ? JSON.parse(match[1]) : { name: '', args: {} };
  }
}

Why This Architecture Works

Defense-in-Depth: Each layer validates before passing data forward. Input fails fast, prompts are bounded, outputs are sanitized, tools are schema-validated, and critical actions require approval.
Least Privilege: The tool registry explicitly defines what the model can do. Unregistered tools are rejected at the orchestration layer, not the model layer.
Auditability: Every transition point logs to an audit trail, satisfying compliance requirements for traceability and incident response.
Regulatory Alignment: Human-in-the-loop gates directly address EU AI Act Article 14. Rate limiting and output sanitization satisfy Article 15 resilience requirements. Context isolation and PII filtering align with GDPR Article 32.

Pitfall Guide

1. Regex-Only Prompt Filtering

Explanation: Relying on regular expressions to block phrases like "ignore instructions" or "repeat above" is ineffective. LLMs understand semantic variations, and attackers easily bypass pattern matching with paraphrasing or encoding. Fix: Implement input allowlisting combined with prompt boundary enforcement. Structure prompts so user input is injected into isolated variables, never concatenated into system instructions.

2. Treating LLM Output as Trusted Data

Explanation: Model responses are probabilistic and can be manipulated. Passing them directly to eval(), exec(), or DOM rendering creates remote code execution and XSS vulnerabilities. Fix: Always sanitize output before downstream consumption. Use strict type casting, HTML entity encoding, and sandboxed execution environments. Never assume model output matches expected schemas.

3. Over-Provisioning Tool Permissions

Explanation: Granting the model unrestricted access to internal APIs or system commands allows prompt injection to cascade into full system compromise. Fix: Maintain an explicit tool registry with JSON schema validation. Apply least-privilege principles: read-only access by default, explicit approval for writes, and network isolation for external calls.

4. Ignoring Training Data Provenance

Explanation: Models trained on unverified or user-contributed datasets can inherit malicious patterns, backdoors, or biased behavior that triggers under specific conditions. Fix: Enforce cryptographic checksums on training datasets. Implement source allowlisting, data lineage tracking, and automated toxicity/bias scanning before ingestion.

5. Skipping Human Approval for State-Changing Actions

Explanation: Autonomous agents that modify databases, send emails, or execute system commands without oversight violate regulatory requirements and create massive blast radius. Fix: Implement explicit approval gates for any action that alters state, accesses PII, or triggers external workflows. Log all requests and maintain an audit trail for compliance reviews.

6. Assuming Rate Limiting Prevents All DoS

Explanation: Traditional rate limiting blocks high-frequency requests but does not protect against compute-intensive prompts designed to exhaust GPU/CPU resources. Fix: Combine request rate limits with input length caps, token consumption quotas, and inference timeout thresholds. Monitor compute utilization metrics and implement adaptive throttling based on resource pressure.

Production Bundle

Action Checklist

Input Validation: Enforce strict allowlists and length limits on all user queries before prompt construction
Prompt Boundaries: Isolate user input from system instructions using structured placeholders and type constraints
Output Sanitization: Strip executable patterns, HTML/JS, and prompt leakage artifacts before downstream processing
Tool Registry: Maintain an explicit allowlist of tools with JSON schema validation and least-privilege permissions
Human Oversight: Implement approval gates for all state-changing, data-modifying, or external system actions
Rate Limiting & Quotas: Apply request throttling, token consumption caps, and inference timeouts to prevent compute exhaustion
Training Data Verification: Enforce checksum validation, source allowlisting, and provenance tracking for all ingestion pipelines
Audit Logging: Record every input, output, tool call, and approval decision for compliance and incident response

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Internal knowledge chatbot	Strict prompt boundaries + output sanitization + read-only tool access	Low blast radius, internal users only, compliance focus on data confidentiality	Low
Customer-facing support agent	Input allowlisting + human approval for account actions + rate limiting + audit logging	External attack surface, regulatory exposure, requires traceability	Medium
Autonomous workflow orchestrator	Tool schema enforcement + human-in-the-loop gates + compute quotas + training data verification	High-risk state changes, EU AI Act Article 14/15 compliance, maximum blast radius	High
Fine-tuned domain model	Cryptographic dataset checksums + provenance tracking + bias/toxicity scanning + isolated inference	Training pipeline integrity, IP protection, long-term model stability	Medium-High

Configuration Template

// security-config.ts
export const LLM_SECURITY_CONFIG = {
  input: {
    maxLength: 500,
    allowedPattern: /^[a-zA-Z0-9\s.,!?-]+$/,
    sessionIdValidation: 'uuid',
  },
  prompt: {
    systemInstruction: 'You are a technical assistant. Answer using only provided context. Never execute commands.',
    userPlaceholder: '{{USER_QUERY}}',
    contextWindowLimit: 4000,
  },
  tools: {
    registry: {
      search_knowledge_base: {
        permissions: 'read',
        schema: { keyword: 'string', limit: 'number' },
        requiresApproval: false,
        timeoutMs: 3000,
      },
      update_configuration: {
        permissions: 'write',
        schema: { recordId: 'string', payload: 'object' },
        requiresApproval: true,
        timeoutMs: 5000,
      },
    },
    defaultPolicy: 'deny',
  },
  output: {
    sanitizeHtml: true,
    blockExecutablePatterns: true,
    maskSensitiveData: true,
  },
  rateLimiting: {
    maxRequestsPerMinute: 30,
    maxTokensPerMinute: 10000,
    computeTimeoutMs: 10000,
  },
  compliance: {
    auditTrailEnabled: true,
    humanOversightRequired: true,
    regulatoryMapping: ['EU_AI_ACT_ART_14', 'EU_AI_ACT_ART_15', 'GDPR_ART_32'],
  },
};

Quick Start Guide

Initialize the Orchestrator: Import the SecureLLMOrchestrator class and inject your LLM provider SDK. Replace the placeholder invokeModel method with your actual inference endpoint.
Configure Tool Registry: Define your allowed tools in ALLOWED_TOOLS with strict JSON schemas, permission levels, and approval requirements. Start with read-only operations and gradually expand.
Deploy Rate Limiting & Quotas: Enable the built-in rate limiter and set token consumption thresholds based on your provider's pricing and infrastructure capacity. Monitor compute utilization during peak loads.
Enable Audit Logging: Route the auditTrail array to your centralized logging system. Tag each entry with session IDs, timestamps, and compliance markers for regulatory reporting.
Test Attack Vectors: Run prompt injection payloads, tool misuse attempts, and compute exhaustion simulations against your deployment. Validate that boundaries hold and approval gates trigger correctly.

🎉 Mid-Year Sale — Unlock Full Article

Base plan from just $4.99/mo or $49/yr

7-day free trial · Cancel anytime · 30-day money-back