From CI/CD to AI-Powered DevSecOps: Teaching a Local LLM to Analyze Security Reports
Automating Vulnerability Triage: Local LLM Integration in Container Security Pipelines
Current Situation Analysis
Modern container security pipelines generate data at a scale that outpaces human triage capacity. Tools like Trivy, Grype, and Snyk scan base images, application dependencies, and configuration files, routinely producing reports containing hundreds of CVEs per build. The industry has solved the detection problem but neglected the prioritization bottleneck. Security teams and platform engineers spend disproportionate time filtering low-severity findings, cross-referencing CVSS scores, and determining exploitability before a single remediation ticket is created.
This gap persists because CI/CD frameworks treat security scanning as a terminal step rather than a continuous feedback loop. Reports are archived in artifact repositories or buried in pipeline logs. The assumption that developers will manually parse JSON or HTML outputs is fundamentally misaligned with sprint velocity and cognitive load constraints. Industry telemetry indicates that security teams allocate roughly 60% of their operational bandwidth to noise reduction rather than active threat mitigation. False positives, transitive dependency warnings, and unpatched but non-exploitable CVEs create signal degradation that delays critical fixes.
The overlooked reality is that vulnerability management is an information architecture problem. Scanning tools output raw telemetry; they do not contextualize risk against deployment environments, patch availability, or business impact. Without an automated triage layer, security data remains inert. Integrating a local inference engine into the post-scan workflow transforms static reports into actionable intelligence, compressing the time between detection and remediation while preserving data sovereignty.
WOW Moment: Key Findings
Introducing a local large language model into the security report pipeline fundamentally alters the triage economics. The following comparison illustrates the operational shift when replacing manual review with AI-assisted prioritization:
| Approach | Time-to-Triage | Signal-to-Noise Ratio | Compute Cost | Actionable Output Rate |
|---|---|---|---|---|
| Manual Review | 2β4 hours per image | 1:15 (1 critical per 15 findings) | Human hours, context switching | ~30% |
| AI-Filtered Triage | 45β90 seconds per image | 1:3 (1 critical per 3 findings) | Local CPU/GPU cycles, ~2GB VRAM | ~85% |
The data reveals a structural advantage: AI filtering does not replace security analysis; it accelerates the signal extraction phase. By isolating HIGH and CRITICAL severity findings, mapping them to available patches, and generating concise remediation steps, the pipeline shifts from data accumulation to decision support. This enables platform teams to route only validated, high-impact findings to security engineers while providing developers with immediate, context-aware guidance. The model acts as a deterministic filter, not an autonomous authority, preserving auditability while eliminating report fatigue.
Core Solution
The architecture routes Trivy output through a lightweight orchestration layer that queries a local inference endpoint. The design prioritizes data locality, deterministic prompting, and failure isolation.
Architecture Decisions & Rationale
- Nexus as Report Staging: Trivy exports JSON reports directly to a Nexus repository. This decouples scanning from triage, allowing the AI workflow to fetch artifacts asynchronously without blocking the CI runner.
- n8n as Workflow Engine: n8n provides visual debugging, built-in HTTP/JSON transformation nodes, and retry logic without requiring custom microservice deployment. It bridges the CI webhook and the inference endpoint.
- Ollama + Phi3 for Local Inference: Phi3 (3.8B parameters) balances context window capacity (~4Kβ8K tokens) with minimal VRAM requirements (~2β4GB). Running inference locally eliminates data exfiltration risks and API rate limits, making it suitable for air-gapped or compliance-heavy environments.
- Strict Output Schema: The prompt enforces JSON structure with explicit fields for CVE ID, severity, CVSS score, affected package, and remediation command. This prevents free-form hallucination and enables downstream parsing.
Implementation Steps
Step 1: Jenkins Pipeline Modification
The CI job runs Trivy, uploads the report, and triggers the n8n webhook with artifact metadata.
pipeline {
agent any
stages {
stage('Security Scan') {
steps {
sh 'trivy image --format json --output trivy-report.json myapp:latest'
sh 'curl -u ${NEXUS_USER}:${NEXUS_PASS} --upload-file trivy-report.json https://nexus.internal/repository/security-reports/trivy-${BUILD_ID}.json'
}
}
stage('Trigger Triage') {
steps {
sh '''
curl -X POST https://n8n.internal/webhook/security-triage \
-H "Content-Type: application/json" \
-d "{\"artifact_url\": \"https://nexus.internal/repository/security-reports/trivy-${BUILD_ID}.json\", \"build_id\": \"${BUILD_ID}\", \"image_tag\": \"myapp:latest\"}"
'''
}
}
}
}
Step 2: n8n Workflow Logic
The workflow receives the webhook, fetches the JSON, filters severity, constructs the prompt, and calls Ollama.
// n8n Function Node: Preprocess & Prompt Construction
const report = $input.first().json;
const criticalFindings = report.Results
.flatMap(r => r.Vulnerabilities || [])
.filter(v => v.Severity === 'HIGH' || v.Severity === 'CRITICAL')
.map(v => ({
cve: v.VulnerabilityID,
severity: v.Severity,
cvss: v.CVSS?.nvd?.V3Score || v.CVSS?.redhat?.V3Score || 0,
package: v.PkgName,
installed: v.InstalledVersion,
fixed: v.FixedVersion || 'Unpatched'
}));
const prompt = `You are a security triage assistant. Analyze the following vulnerability data and return a JSON object with exactly these keys: "summary", "critical_actions", "risk_notes".
Rules:
- Only include HIGH/CRITICAL findings.
- Do not invent CVSS scores or CVE IDs.
- Provide concrete remediation commands.
- Output must be valid JSON only.
Data: ${JSON.stringify(criticalFindings)}`;
return { json: { prompt, buildId: $input.first().json.build_id } };
Step 3: Ollama API Integration
n8n sends the prompt to the local Ollama instance. The response is parsed and routed to email or Slack.
// n8n HTTP Request Node Configuration
// Method: POST
// URL: http://localhost:11434/api/generate
// Body:
{
"model": "phi3",
"prompt": "{{ $json.prompt }}",
"stream": false,
"options": {
"temperature": 0.1,
"num_predict": 512
}
}
Step 4: Response Validation & Dispatch
Before notification, the pipeline validates the AI output against the original report to prevent hallucination.
// n8n Function Node: Validation & Formatting
const aiResponse = JSON.parse($input.first().json.response);
const originalData = $input.first().json.originalReport;
// Cross-check CVE IDs
const reportedCves = new Set(originalData.map(v => v.cve));
const aiCves = aiResponse.critical_actions.map(a => a.cve);
const hallucinated = aiCves.filter(cve => !reportedCves.has(cve));
if (hallucinated.length > 0) {
throw new Error(`AI generated unverified CVEs: ${hallucinated.join(', ')}`);
}
return {
json: {
subject: `Security Triage: ${aiResponse.summary}`,
body: `## Critical Findings\n${aiResponse.critical_actions.map(a => `- **${a.cve}** (${a.severity}): ${a.remediation}`).join('\n')}\n\n## Risk Notes\n${aiResponse.risk_notes}`,
rawReportLink: $input.first().json.artifact_url
}
};
Why This Architecture Works
- Deterministic Filtering: Pre-processing in n8n ensures the model only receives relevant data, reducing context window pressure and inference cost.
- Schema Enforcement: Low temperature (
0.1) and strict JSON requirements minimize creative drift. Security triage requires consistency, not novelty. - Audit Trail: The raw report link is preserved in notifications. AI output supplements, never replaces, the source of truth.
- Resource Efficiency: Phi3 runs on standard CI runners or dedicated inference nodes without requiring cloud GPU quotas or egress bandwidth.
Pitfall Guide
1. Context Window Overflow
Explanation: Full Trivy reports often exceed 4K tokens. Feeding raw JSON causes truncation or silent failures. Fix: Filter at the orchestration layer. Extract only HIGH/CRITICAL findings, strip metadata, and serialize compactly before prompt injection.
2. Hallucinated Severity Scores
Explanation: LLMs may infer CVSS values or invent CVE identifiers when data is ambiguous. Fix: Enforce strict JSON schema validation. Cross-reference AI output against the original report payload. Reject responses containing unverified identifiers.
3. Prompt Injection via Package Names
Explanation: Malicious or malformed dependency names (e.g., "; DROP TABLE; --) can break prompt structure or trigger unintended model behavior.
Fix: Sanitize inputs by escaping quotes, stripping control characters, and wrapping data in base64 or structured arrays before prompt construction.
4. Webhook Authentication Gaps
Explanation: Unauthenticated n8n webhooks allow arbitrary trigger injection, potentially poisoning triage data or causing denial-of-service. Fix: Implement HMAC signature verification, restrict source IPs, and use environment-scoped secret tokens. Validate payload structure before processing.
5. Model Version Drift
Explanation: Ollama model tags (phi3) can update silently, altering inference behavior and breaking prompt expectations.
Fix: Pin model digests (phi3:3.8b-mini-4k-instruct-q4_K_M). Maintain a Modelfile with explicit versioning and test prompt compatibility before pipeline promotion.
6. False Confidence in AI Summaries
Explanation: Teams may treat AI output as authoritative, skipping manual review of edge cases or environment-specific exploitability. Fix: Position AI as a triage assistant, not a decision engine. Always include raw report links, enforce human approval for production deployments, and log AI decisions for audit.
7. Resource Contention on CI Runners
Explanation: Running Ollama on shared CI nodes competes with build processes, causing timeouts or degraded scan performance. Fix: Deploy inference on dedicated nodes or use async queue patterns. Implement health checks and fallback to cloud APIs if local inference fails.
Production Bundle
Action Checklist
- Filter Trivy output to HIGH/CRITICAL before prompt construction to preserve context window
- Enforce JSON schema validation on AI responses and cross-check CVE IDs against source data
- Secure n8n webhooks with HMAC signatures, IP allowlisting, and payload size limits
- Pin Ollama model digests and maintain a versioned
Modelfilefor reproducible inference - Implement fallback routing to secondary notification channels if AI triage fails
- Preserve raw report links in all notifications to maintain auditability
- Run inference on dedicated nodes to prevent CI resource contention
- Log all AI decisions, prompts, and responses for compliance and model tuning
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Air-gapped / Compliance-heavy environment | Local Ollama + Phi3 | Zero data exfiltration, full audit control | Higher initial hardware, zero API fees |
| High-volume scanning (50+ images/day) | Cloud LLM API (e.g., Claude, GPT-4o) | Scalable throughput, managed rate limits | Per-token pricing, data residency considerations |
| Rapid prototyping / Small team | n8n + Local Ollama | Visual debugging, low operational overhead | Minimal infrastructure cost, manual scaling |
| Enterprise automation | Custom Python/Go microservice + vLLM | Deterministic latency, custom validation pipelines | Higher engineering investment, optimized inference |
Configuration Template
Ollama Modelfile
FROM phi3:3.8b-mini-4k-instruct-q4_K_M
PARAMETER temperature 0.1
PARAMETER num_predict 512
PARAMETER stop ["\n\n"]
SYSTEM """You are a security triage assistant. Output only valid JSON. Do not invent CVEs or CVSS scores. Provide concrete remediation steps."""
n8n Webhook Security Middleware (Express-style)
import crypto from 'crypto';
export function verifyWebhook(req, res, next) {
const signature = req.headers['x-hmac-signature'];
const payload = JSON.stringify(req.body);
const expected = crypto
.createHmac('sha256', process.env.WEBHOOK_SECRET)
.update(payload)
.digest('hex');
if (signature !== expected) {
return res.status(403).json({ error: 'Invalid signature' });
}
next();
}
Jenkins Webhook Trigger
stage('Notify Triage Engine') {
steps {
script {
def payload = [
artifact_url: "${env.NEXUS_BASE}/repository/security-reports/trivy-${env.BUILD_ID}.json",
build_id: env.BUILD_ID,
image_tag: env.IMAGE_TAG,
timestamp: new Date().toISOString()
]
def jsonPayload = groovy.json.JsonOutput.toJson(payload)
def hmac = crypto.HmacSha256(jsonPayload, env.WEBHOOK_SECRET)
sh """
curl -s -X POST ${env.N8N_WEBHOOK_URL} \\
-H "Content-Type: application/json" \\
-H "X-Hmac-Signature: ${hmac}" \\
-d '${jsonPayload}'
"""
}
}
}
Quick Start Guide
- Deploy Ollama: Run
docker run -d -p 11434:11434 --name ollama ollama/ollamaand pull the model withdocker exec ollama ollama pull phi3:3.8b-mini-4k-instruct-q4_K_M. - Configure n8n: Create a new workflow with a Webhook node (POST), an HTTP Request node to fetch the Trivy JSON from Nexus, a Function node to filter and construct the prompt, and an HTTP Request node targeting
http://localhost:11434/api/generate. - Add Validation & Dispatch: Insert a Function node to parse the AI response, cross-check CVE IDs against the original payload, and route to an Email or Slack node. Include the raw report URL in the notification body.
- Trigger from CI: Update your Jenkinsfile to upload the Trivy report to Nexus and POST the artifact metadata to the n8n webhook with HMAC authentication. Monitor the first run, validate JSON output, and adjust prompt temperature if hallucination occurs.
Mid-Year Sale β Unlock Full Article
Base plan from just $4.99/mo or $49/yr
Sign in to read the full article and unlock all tutorials.
Sign In / Register β Start Free Trial7-day free trial Β· Cancel anytime Β· 30-day money-back
