I let the AI write the report, not decide the alerts
Deterministic Detection, Generative Reporting: A Hybrid Architecture for Auditable SOC Triage
Current Situation Analysis
Security operations teams are rapidly integrating large language models into triage workflows, but a fundamental architectural mismatch is causing reliability failures. Many teams treat LLMs as autonomous decision engines, feeding them raw telemetry and expecting consistent, auditable verdicts. In practice, this approach introduces probabilistic behavior into a domain that demands deterministic certainty.
The core problem is context drift and prompt sensitivity. Feed identical log batches to the same model with minor temperature variations or different system prompts, and you will receive divergent severity ratings, conflicting MITRE ATT&CK mappings, and inconsistent next-step recommendations. For compliance, incident response, and executive reporting, this variability is unacceptable. Auditors require reproducible findings. Engineers require version-controlled logic. LLMs provide neither when tasked with detection.
This issue is frequently misunderstood because early demos mask the underlying fragility. Vendors showcase polished chat interfaces that appear to "analyze" logs, but the heavy lifting is often done by hidden deterministic parsers or heavily constrained output schemas. When teams attempt to replicate this by prompting raw logs directly, they encounter predictable failure modes: hallucinated event IDs, misclassification of benign system processes (e.g., flagging svchost.exe as a living-off-the-land binary), malformed JSON responses that break downstream parsers, and context window exhaustion during high-volume ingestion.
The reality is straightforward. LLMs excel at natural language generation and summarization. They struggle with stateful counting, threshold evaluation, and strict logical branching, and security triage requires all three. The solution is not to abandon AI but to reposition it. By decoupling detection logic from report generation, teams get fully reproducible findings (the same events evaluated by the same rule versions always produce the same output) while using generative models only for analyst-facing documentation.
WOW Moment: Key Findings
The architectural split between deterministic detection and generative reporting fundamentally changes how security pipelines behave in production. The following comparison illustrates the operational impact of treating AI as a writer rather than a detector.
| Approach | Run-to-Run Consistency | Auditability | False Positive Rate | Implementation Complexity |
|---|---|---|---|---|
| AI-Driven Detection | ~65-80% (varies by model/temp) | Low (prompt-dependent) | High (hallucination/context drift) | Low initially, high maintenance |
| Deterministic + AI Reporting | 100% | High (code-reviewed rules) | Low (tuned thresholds) | Moderate upfront, low maintenance |
This finding matters because it shifts the security pipeline from a black-box experiment to a version-controlled engineering discipline. Deterministic rules can be unit-tested, peer-reviewed, and tracked in Git. Risk scores become mathematical functions rather than model guesses. The AI layer becomes a pluggable output formatter that can be swapped between providers (e.g., Claude, GPT-4, local Ollama instances) without altering the underlying findings or compliance mappings. Teams gain predictable triage, reduced alert fatigue, and a clear separation between signal generation and narrative delivery.
Core Solution
Building a hybrid triage pipeline requires strict layering. Each stage must have a single responsibility, explicit input/output contracts, and zero cross-contamination between logic and language generation.
Step 1: Unified Event Schema
Security telemetry arrives in fragmented formats. Windows Security logs, Sysmon events, Linux authentication records, and cloud audit trails use different field names, timestamp formats, and severity taxonomies. The first architectural decision is to normalize all inputs into a strict intermediate representation before any evaluation occurs.
```typescript
interface NormalizedEvent {
  eventId: string; // canonical event type, e.g. 'auth-failure' or 'auth-success'
  timestamp: Date;
  sourceIp: string | null;
  destinationIp: string | null;
  user: string;
  processName: string;
  commandLine: string | null;
  rawProvider: 'windows-security' | 'sysmon' | 'linux-auth' | 'generic-json';
}
```
Normalization strips provider-specific noise and aligns fields to a common vocabulary. This enables rules to evaluate events without conditional branching for log source types.
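The adapter idea can be sketched for a single source. The function below is a hypothetical `normalizeLinuxAuthLine` that maps one sshd line into `NormalizedEvent`. It assumes the syslog daemon writes an ISO 8601 timestamp prefix (classic auth.log lines omit the year), and it reuses the canonical event types `auth-failure` and `auth-success` that the Step 2 rule checks for. Lines it does not recognize return `null` instead of being guessed at.

```typescript
interface NormalizedEvent {
  eventId: string;
  timestamp: Date;
  sourceIp: string | null;
  destinationIp: string | null;
  user: string;
  processName: string;
  commandLine: string | null;
  rawProvider: 'windows-security' | 'sysmon' | 'linux-auth' | 'generic-json';
}

// Assumed line shape: "<ISO-8601 timestamp> <host> sshd[<pid>]: <Failed|Accepted> <method> for [invalid user ]<user> from <ip> port <n> ..."
const SSHD_LINE =
  /^(\S+)\s+\S+\s+sshd\[\d+\]:\s+(Failed|Accepted) \S+ for (?:invalid user )?(\S+) from (\S+) port \d+/;

// Hypothetical adapter: one provider format in, one canonical event (or null) out.
function normalizeLinuxAuthLine(line: string): NormalizedEvent | null {
  const match = SSHD_LINE.exec(line);
  if (!match) return null;

  const [, ts, outcome, user, ip] = match;
  const timestamp = new Date(ts);
  if (Number.isNaN(timestamp.getTime())) return null; // reject unparseable timestamps

  return {
    eventId: outcome === 'Failed' ? 'auth-failure' : 'auth-success',
    timestamp,
    sourceIp: ip,
    destinationIp: null,
    user,
    processName: 'sshd',
    commandLine: null,
    rawProvider: 'linux-auth'
  };
}

const evt = normalizeLinuxAuthLine(
  '2024-05-01T10:00:03Z web01 sshd[812]: Failed password for invalid user admin from 203.0.113.7 port 52211 ssh2'
);
console.log(evt?.eventId, evt?.user, evt?.sourceIp); // auth-failure admin 203.0.113.7
```

Each additional log source gets its own adapter with the same return type, so rules never see provider-specific fields.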
Step 2: Deterministic Rule Engine
Detection logic must be pure functions. Each rule accepts an array of normalized events, evaluates stateful conditions, and returns an array of evidence strings. Empty arrays indicate no match. This design enables unit testing, deterministic execution, and clear audit trails.
```typescript
type Evidence = string;

interface DetectionRule {
  id: string;
  title: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  mitreTechniques: string[];
  evaluate(events: NormalizedEvent[]): Evidence[];
}

const FAILURE_THRESHOLD = 5;

const consecutiveFailedLoginsFollowedBySuccess: DetectionRule = {
  id: 'auth-brute-force-success-chain',
  title: 'Successful authentication following repeated failures',
  severity: 'critical',
  mitreTechniques: ['T1110', 'T1078'],
  evaluate(events) {
    // Evaluate in chronological order so "followed by" means what it says.
    const ordered = [...events].sort(
      (a, b) => a.timestamp.getTime() - b.timestamp.getTime()
    );
    // Running count of consecutive failures per source IP.
    const failureStreaks = new Map<string, number>();
    const evidence: Evidence[] = [];

    for (const evt of ordered) {
      if (!evt.sourceIp) continue;
      if (evt.eventId === 'auth-failure') {
        failureStreaks.set(evt.sourceIp, (failureStreaks.get(evt.sourceIp) ?? 0) + 1);
      } else if (evt.eventId === 'auth-success') {
        const failures = failureStreaks.get(evt.sourceIp) ?? 0;
        if (failures >= FAILURE_THRESHOLD) {
          evidence.push(
            `Account "${evt.user}" authenticated successfully from ${evt.sourceIp} after ${failures} consecutive failures.`
          );
        }
        // A success ends the failure streak for this IP.
        failureStreaks.delete(evt.sourceIp);
      }
    }
    return evidence;
  }
};
```
The rule walks the batch in timestamp order, keeping a running count of consecutive failures per source IP. When a success arrives from an IP whose streak has reached the threshold (>= 5), the rule records evidence; every success then resets that IP's streak. The threshold is explicit, testable, and version-controlled. Background scanning noise is filtered because isolated failures without a follow-up success do not trigger the rule.
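Rules only become useful once something runs them. The following is a minimal sketch of a hypothetical `runRules` registry runner, assuming the `DetectionRule` contract above: it evaluates every rule against the same batch in a fixed order and keeps only rules that produced evidence, which yields the matched set that the scoring step consumes. The `toyLogCleared` rule and the canonical `log-cleared` event type are illustrative inventions, not part of the article's rule set.

```typescript
interface NormalizedEvent {
  eventId: string;
  timestamp: Date;
  sourceIp: string | null;
  destinationIp: string | null;
  user: string;
  processName: string;
  commandLine: string | null;
  rawProvider: 'windows-security' | 'sysmon' | 'linux-auth' | 'generic-json';
}

type Evidence = string;

interface DetectionRule {
  id: string;
  title: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  mitreTechniques: string[];
  evaluate(events: NormalizedEvent[]): Evidence[];
}

// What the engine hands to scoring and reporting.
interface Finding {
  ruleId: string;
  title: string;
  severity: DetectionRule['severity'];
  mitreTechniques: string[];
  evidence: Evidence[];
}

// Hypothetical runner: fixed evaluation order, empty evidence means "no match".
function runRules(rules: DetectionRule[], events: NormalizedEvent[]): Finding[] {
  const findings: Finding[] = [];
  for (const rule of rules) {
    const evidence = rule.evaluate(events);
    if (evidence.length === 0) continue;
    findings.push({
      ruleId: rule.id,
      title: rule.title,
      severity: rule.severity,
      mitreTechniques: [...rule.mitreTechniques], // copy so downstream code cannot mutate rule metadata
      evidence
    });
  }
  return findings;
}

// Toy rule for illustration only: flags any canonical 'log-cleared' event.
const toyLogCleared: DetectionRule = {
  id: 'toy-log-cleared',
  title: 'Security event log cleared',
  severity: 'high',
  mitreTechniques: ['T1070.001'],
  evaluate: events =>
    events
      .filter(e => e.eventId === 'log-cleared')
      .map(e => `Log cleared by "${e.user}" via ${e.processName} at ${e.timestamp.toISOString()}.`)
};

const sample: NormalizedEvent[] = [{
  eventId: 'log-cleared',
  timestamp: new Date('2024-05-01T10:05:00Z'),
  sourceIp: null,
  destinationIp: null,
  user: 'svc-backup',
  processName: 'wevtutil.exe',
  commandLine: 'wevtutil cl Security',
  rawProvider: 'windows-security'
}];

console.log(runRules([toyLogCleared], sample).map(f => f.ruleId)); // [ 'toy-log-cleared' ]
```

Because the runner copies MITRE IDs straight from rule definitions, the reporting layer receives technique IDs that no model ever touched.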
Step 3: Risk Scoring & Aggregation
Severity labels are insufficient for triage prioritization. A deterministic scoring algorithm aggregates rule matches, applies weight multipliers, and produces a normalized 0-100 risk index.
```typescript
function calculateRiskScore(
  matchedRules: DetectionRule[],
  eventVolume: number
): number {
  const severityWeights: Record<DetectionRule['severity'], number> = {
    low: 10,
    medium: 25,
    high: 50,
    critical: 80
  };

  // Sum the weight of every matched rule.
  let rawScore = 0;
  matchedRules.forEach(rule => {
    rawScore += severityWeights[rule.severity];
  });

  // Logarithmic volume modifier: grows with event volume, but with diminishing returns.
  // Math.max(1, ...) keeps the modifier >= 1 and avoids log10(0) = -Infinity on empty batches.
  const volumeModifier = 1 + Math.log10(Math.max(1, eventVolume)) * 0.15;

  // Clamp the result to the 0-100 index.
  return Math.min(100, Math.round(rawScore * volumeModifier));
}
```
Weights are additive, so a single critical finding (80) outranks several low-severity matches (three lows sum to 30). The volume modifier grows logarithmically rather than linearly: 1,000 events multiply the raw score by roughly 1.45, not by 1,000, so an alert storm cannot swamp prioritization, and the final clamp keeps the index at or below 100.
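A self-contained sketch of the scoring step makes the arithmetic concrete. It is written so that the logarithmic volume modifier is always at least 1 and an empty batch cannot produce `log10(0)`. `ScoredRule` is a hypothetical stand-in for `DetectionRule` (scoring only reads `severity`), and the batch sizes are illustrative.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';

// Minimal stand-in for DetectionRule: scoring only needs the severity.
interface ScoredRule {
  severity: Severity;
}

const SEVERITY_WEIGHTS: Record<Severity, number> = { low: 10, medium: 25, high: 50, critical: 80 };
const VOLUME_FACTOR = 0.15;

function calculateRiskScore(matchedRules: ScoredRule[], eventVolume: number): number {
  const rawScore = matchedRules.reduce((sum, r) => sum + SEVERITY_WEIGHTS[r.severity], 0);
  // Multiplier >= 1 that grows with log10(volume); Math.max guards against log10(0).
  const volumeModifier = 1 + Math.log10(Math.max(1, eventVolume)) * VOLUME_FACTOR;
  return Math.min(100, Math.round(rawScore * volumeModifier));
}

const critical: ScoredRule[] = [{ severity: 'critical' }];
const threeLows: ScoredRule[] = [{ severity: 'low' }, { severity: 'low' }, { severity: 'low' }];

console.log(calculateRiskScore(critical, 1));    // 80  (modifier is exactly 1)
console.log(calculateRiskScore(threeLows, 1));   // 30
console.log(calculateRiskScore(critical, 10));   // 92  (80 x 1.15)
console.log(calculateRiskScore(critical, 1000)); // 100 (clamped)
console.log(calculateRiskScore([], 0));          // 0
```

Rounding happens once, at the end, so the same inputs always produce the same integer score, which is what an automated escalation threshold needs.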
Step 4: Generative Reporting Layer
The AI component sits at the terminal stage. It receives the structured findings, risk score, and MITRE mappings, then generates analyst-ready prose. The prompt template is strict, and the model is treated as a text formatter, not a logic engine.
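```typescript
interface TriageReportInput {
  riskScore: number;
  findings: {
    ruleId: string;
    title: string;
    severity: DetectionRule['severity'];
    mitreTechniques: string[]; // hardcoded IDs copied from the rule definition
    evidence: string[];
  }[];
  timeWindow: string;
}

function buildReportPrompt(input: TriageReportInput): string {
  // Render each finding with its deterministic severity, MITRE IDs, and evidence,
  // so the model has nothing to infer.
  const findingLines = input.findings
    .map(f =>
      `- ${f.title} (${f.ruleId}) [severity: ${f.severity}; MITRE: ${f.mitreTechniques.join(', ')}]\n` +
      `  Evidence: ${f.evidence.join('; ')}`
    )
    .join('\n');

  return `
Generate a concise security triage summary based on the following structured findings.
Do not invent events, modify severity, or alter MITRE mappings.

Risk Score: ${input.riskScore}/100
Time Window: ${input.timeWindow}

Findings:
${findingLines}

Output format:
1. Executive Summary (2-3 sentences)
2. Prioritized Findings (bullet list with next steps)
3. MITRE ATT&CK Context (brief mapping explanation, using only the technique IDs listed above)
`;
}
```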
Swapping the underlying provider requires only changing the HTTP client or SDK call. The findings, scores, and technique IDs remain identical. This abstraction enables cost optimization, latency tuning, and compliance alignment without rewriting detection logic.
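One way to express "only the client changes" is a small interface that every provider adapter implements. The sketch below is hypothetical: `ReportGenerator`, `EchoGenerator`, and `produceReport` are illustrative names, not any vendor's SDK, and a real adapter would wrap the provider's own client call inside `generate`. The temperature and token values mirror the configuration template later in the article.

```typescript
// Hypothetical provider-neutral contract; real adapters wrap an SDK or HTTP call.
interface ReportGenerator {
  readonly name: string;
  generate(prompt: string, opts: { temperature: number; maxTokens: number }): Promise<string>;
}

// Deterministic stub for tests and offline runs; ignores generation options.
class EchoGenerator implements ReportGenerator {
  readonly name = 'echo-stub';
  async generate(prompt: string): Promise<string> {
    return `REPORT\n${prompt.trim()}`;
  }
}

interface TriageReport {
  provider: string;
  riskScore: number; // copied from the deterministic pipeline, never parsed from model output
  text: string;
}

async function produceReport(
  generator: ReportGenerator,
  prompt: string,
  riskScore: number
): Promise<TriageReport> {
  const text = await generator.generate(prompt, { temperature: 0.2, maxTokens: 1024 });
  return { provider: generator.name, riskScore, text };
}

produceReport(new EchoGenerator(), 'Risk Score: 92/100', 92).then(r =>
  console.log(r.provider, r.riskScore) // echo-stub 92
);
```

Note that the report object carries the risk score from its caller, not from the generated text, so switching providers can change wording but never the number.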
Pitfall Guide
1. Feeding Raw Logs Directly to LLMs
Explanation: Raw telemetry contains provider-specific formatting, timestamps, and noise that consume context windows and confuse pattern recognition. The model ends up inferring structure it has no reliable way to validate.
Fix: Always normalize logs into a strict intermediate schema before evaluation. Use deterministic parsers for EVTX, Sysmon, and auth.log formats.
2. Using Prompts for Stateful Logic
Explanation: LLMs lack persistent memory across inference calls. Attempting to count failures, track time windows, or evaluate thresholds via prompts results in inconsistent outputs and hidden state bugs.
Fix: Move all counting, aggregation, and threshold evaluation to TypeScript. Treat the model as a stateless text generator.
3. Ignoring Schema Normalization
Explanation: Rules that check for event_id in Windows logs but eventType in Linux logs will fail silently or produce false negatives. Inconsistent field naming breaks deterministic evaluation.
Fix: Define a canonical event interface. Write adapter functions for each log source that map provider fields to the canonical schema before rules execute.
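For example, Windows Security logon events can be mapped onto the same canonical types used for Linux authentication. This sketch assumes the EVTX record has already been parsed into a flat object (the `WindowsSecurityRecord` shape and `normalizeWindowsSecurity` name are hypothetical). Event IDs 4624 (successful logon) and 4625 (failed logon) and the `TargetUserName`, `IpAddress`, and `ProcessName` field names follow the Windows Security event schema, where `IpAddress` is recorded as "-" when no network address applies.

```typescript
interface NormalizedEvent {
  eventId: string;
  timestamp: Date;
  sourceIp: string | null;
  destinationIp: string | null;
  user: string;
  processName: string;
  commandLine: string | null;
  rawProvider: 'windows-security' | 'sysmon' | 'linux-auth' | 'generic-json';
}

// Hypothetical flat shape produced by an upstream EVTX parser.
interface WindowsSecurityRecord {
  EventID: number;
  TimeCreated: string;   // ISO 8601
  TargetUserName: string;
  IpAddress: string;     // "-" when no network address was recorded
  ProcessName: string;
}

// Provider event ID -> canonical event type. Unmapped IDs are dropped, not guessed.
const WINDOWS_EVENT_TYPES = new Map<number, string>([
  [4624, 'auth-success'], // An account was successfully logged on
  [4625, 'auth-failure']  // An account failed to log on
]);

function normalizeWindowsSecurity(rec: WindowsSecurityRecord): NormalizedEvent | null {
  const eventId = WINDOWS_EVENT_TYPES.get(rec.EventID);
  if (eventId === undefined) return null;
  const hasIp = rec.IpAddress !== '' && rec.IpAddress !== '-';
  return {
    eventId,
    timestamp: new Date(rec.TimeCreated),
    sourceIp: hasIp ? rec.IpAddress : null,
    destinationIp: null,
    user: rec.TargetUserName,
    processName: rec.ProcessName,
    commandLine: null,
    rawProvider: 'windows-security'
  };
}

const normalized = normalizeWindowsSecurity({
  EventID: 4625,
  TimeCreated: '2024-05-01T10:00:03Z',
  TargetUserName: 'alice',
  IpAddress: '203.0.113.7',
  ProcessName: 'C:\\Windows\\System32\\svchost.exe'
});
console.log(normalized?.eventId, normalized?.sourceIp); // auth-failure 203.0.113.7
```

After both adapters run, a rule that checks `eventId === 'auth-failure'` works unchanged for Windows and Linux sources.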
4. Skipping Rule Unit Tests
Explanation: Security rules drift when thresholds change or new log formats are introduced. Without tests, regressions go unnoticed until production incidents occur.
Fix: Maintain a synthetic event dataset covering edge cases. Run rules through a test harness that verifies evidence output, severity assignment, and MITRE mapping for each scenario.
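A minimal harness does not need a test framework. In this sketch, each synthetic scenario pairs a batch with the number of evidence strings the rule must produce, and a mismatch throws. The rule is a compact, order-aware version of the Step 2 brute-force chain, `AuthEvent` is a reduced stand-in for `NormalizedEvent`, and `batch` is a hypothetical helper; all scenario data is invented for illustration.

```typescript
// Reduced stand-in for NormalizedEvent: only the fields this rule reads.
interface AuthEvent {
  eventId: 'auth-failure' | 'auth-success';
  timestamp: Date;
  sourceIp: string;
  user: string;
}

const THRESHOLD = 5;

// Order-aware brute-force chain: success after >= THRESHOLD consecutive failures from one IP.
function bruteForceChain(events: AuthEvent[]): string[] {
  const streaks = new Map<string, number>();
  const evidence: string[] = [];
  const ordered = [...events].sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
  for (const e of ordered) {
    if (e.eventId === 'auth-failure') {
      streaks.set(e.sourceIp, (streaks.get(e.sourceIp) ?? 0) + 1);
      continue;
    }
    if ((streaks.get(e.sourceIp) ?? 0) >= THRESHOLD) evidence.push(`${e.user}@${e.sourceIp}`);
    streaks.delete(e.sourceIp); // a success ends the streak
  }
  return evidence;
}

// Synthetic batch: `failures` failed logins one second apart, optionally followed by a success.
function batch(ip: string, failures: number, thenSuccess: boolean, startMs = Date.UTC(2024, 4, 1, 10)): AuthEvent[] {
  const events: AuthEvent[] = [];
  for (let i = 0; i < failures; i++) {
    events.push({ eventId: 'auth-failure', timestamp: new Date(startMs + i * 1000), sourceIp: ip, user: 'alice' });
  }
  if (thenSuccess) {
    events.push({ eventId: 'auth-success', timestamp: new Date(startMs + failures * 1000), sourceIp: ip, user: 'alice' });
  }
  return events;
}

const scenarios: { name: string; events: AuthEvent[]; expected: number }[] = [
  { name: 'threshold reached, then success', events: batch('203.0.113.7', 5, true), expected: 1 },
  { name: 'one failure short of threshold', events: batch('203.0.113.7', 4, true), expected: 0 },
  { name: 'failures only (scanning noise)', events: batch('198.51.100.9', 20, false), expected: 0 },
  { name: 'input arrives out of order', events: batch('203.0.113.7', 5, true).reverse(), expected: 1 },
  {
    name: 'success precedes the failures',
    events: [...batch('192.0.2.1', 0, true, Date.UTC(2024, 4, 1, 9)), ...batch('192.0.2.1', 6, false)],
    expected: 0
  }
];

for (const s of scenarios) {
  const got = bruteForceChain(s.events).length;
  if (got !== s.expected) throw new Error(`${s.name}: expected ${s.expected}, got ${got}`);
}
console.log(`${scenarios.length} scenarios passed`); // 5 scenarios passed
```

The "one short" and "success precedes" cases are the ones most likely to catch an accidental threshold or ordering change in review.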
5. Mixing Scoring with Generation
Explanation: Asking an LLM to calculate a risk score introduces variance. Two identical runs may yield different numerical priorities, breaking automated escalation workflows.
Fix: Compute scores deterministically. Pass the final numerical value to the AI solely for contextual explanation in the report.
6. Assuming Native MITRE Understanding
Explanation: Models map techniques inconsistently. T1110 might be labeled with its tactic ("Credential Access") in one run and its technique name ("Brute Force") in another. Compliance audits require exact technique IDs.
Fix: Hardcode MITRE ATT&CK IDs in rule definitions. Let the AI reference the provided IDs rather than inferring them.
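Hardcoded IDs pair well with a cheap post-generation check. The sketch below, a hypothetical `findUnexpectedTechniques` helper, extracts anything shaped like an ATT&CK technique ID (`T` plus four digits, optionally a three-digit sub-technique such as `T1070.001`) from the model's text and returns the IDs the rules never supplied, so the report can be rejected or regenerated. Treating a sub-technique of an allowed parent as unexpected is a deliberately strict assumption.

```typescript
// ATT&CK technique IDs: "T" + four digits, optionally ".NNN" for a sub-technique.
const TECHNIQUE_ID = /\bT\d{4}(?:\.\d{3})?\b/g;

// Returns technique IDs mentioned in the report text that the rules did not provide.
function findUnexpectedTechniques(reportText: string, allowed: string[]): string[] {
  const allowedSet = new Set(allowed);
  const mentioned = new Set(reportText.match(TECHNIQUE_ID) ?? []);
  return Array.from(mentioned).filter(id => !allowedSet.has(id)).sort();
}

const unexpected = findUnexpectedTechniques(
  'Brute force (T1110) led to valid account use (T1078); possible lateral movement via T1021.001.',
  ['T1110', 'T1078']
);
console.log(unexpected); // [ 'T1021.001' ]
```

An empty result means every technique the model mentioned came from a rule definition.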
7. Over-Engineering the Prompt Template
Explanation: Complex prompts with conditional branching, JSON schema enforcement, and multi-step reasoning increase latency and failure rates. LLMs perform best with clear, single-purpose instructions.
Fix: Keep prompts declarative. Provide structured data, specify output sections, and forbid logical modifications. Validate AI output against a schema before rendering.
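Validation does not require a schema library. Assuming the model is asked to return JSON with three sections (the field names mirror `outputSchema` in the configuration template below and are otherwise an assumption), a hand-written type guard can reject malformed output before it reaches an analyst. `isTriageReportOutput` and `parseReport` are hypothetical names.

```typescript
// Expected output shape; field names mirror `outputSchema` in the configuration template.
interface TriageReportOutput {
  executiveSummary: string;
  prioritizedFindings: string[];
  mitreContext: string;
}

// Type guard: checks each field's runtime type, not just its presence.
function isTriageReportOutput(value: unknown): value is TriageReportOutput {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const summary = v['executiveSummary'];
  const findings = v['prioritizedFindings'];
  const mitre = v['mitreContext'];
  return (
    typeof summary === 'string' &&
    summary.trim().length > 0 &&
    Array.isArray(findings) &&
    findings.every(item => typeof item === 'string') &&
    typeof mitre === 'string'
  );
}

// Returns null for non-JSON or wrongly shaped output so the caller can retry or fall back.
function parseReport(raw: string): TriageReportOutput | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  return isTriageReportOutput(parsed) ? parsed : null;
}

const good = parseReport('{"executiveSummary":"Likely credential compromise.","prioritizedFindings":["Reset alice"],"mitreContext":"T1110, T1078"}');
const bad = parseReport('{"executiveSummary":"ok","prioritizedFindings":"not a list","mitreContext":""}');
console.log(good !== null, bad); // true null
```

Keeping validation in code rather than in the prompt matches the advice above: the prompt stays declarative, and malformed output is caught deterministically.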
Production Bundle
Action Checklist
- Define a canonical event interface that abstracts provider-specific log formats
- Implement pure-function detection rules that return evidence arrays
- Build a deterministic scoring algorithm with configurable severity weights
- Create a strict prompt template that forbids logical modifications or hallucinations
- Unit-test all rules against synthetic event batches covering edge cases
- Abstract the AI client behind an interface to enable provider swapping
- Log all rule evaluations and AI responses for audit trail reconstruction
- Implement schema validation on AI output before rendering to analysts
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-volume triage (>10k events/day) | Deterministic rules + lightweight AI summarization | LLMs become cost-prohibitive at scale; deterministic engines handle throughput efficiently | Low inference cost, high compute for normalization |
| Compliance-heavy environments | Code-reviewed rules with hardcoded MITRE IDs | Auditors require reproducible findings and exact technique mapping | Moderate engineering overhead, minimal compliance risk |
| Rapid prototyping / PoC | AI-assisted detection with strict output schemas | Faster iteration, but requires heavy prompt engineering and validation | High token cost, inconsistent reliability |
| Cost-constrained deployments | Local deterministic engine + open-source LLM | Eliminates API fees while maintaining auditability | Higher infrastructure cost, lower operational expense |
Configuration Template
```typescript
// triage.config.ts
import { DetectionRule } from './types';
import { consecutiveFailedLoginsFollowedBySuccess } from './rules/auth-chain';
import { encodedPowerShellExecution } from './rules/process-injection';
import { logClearingActivity } from './rules/defense-evasion';

export const triageConfig = {
  scoring: {
    severityWeights: { low: 10, medium: 25, high: 50, critical: 80 },
    maxScore: 100,
    volumeDecayFactor: 0.15
  },
  rules: [
    consecutiveFailedLoginsFollowedBySuccess,
    encodedPowerShellExecution,
    logClearingActivity
  ] as DetectionRule[],
  ai: {
    provider: 'claude', // 'openai' | 'claude' | 'ollama'
    model: 'claude-3-5-sonnet-20241022',
    temperature: 0.2,
    maxTokens: 1024,
    outputSchema: {
      executiveSummary: 'string',
      prioritizedFindings: 'array',
      mitreContext: 'string'
    }
  },
  normalization: {
    supportedProviders: ['windows-security', 'sysmon', 'linux-auth', 'generic-json'],
    timestampFormat: 'ISO_8601',
    ipValidation: true
  }
};
```
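Because rules, weights, and provider settings live in one file, a startup check can catch misconfiguration before any events are scored. This is a minimal sketch with a hypothetical `validateTriageConfig` and a trimmed `TriageConfigShape` that covers only the fields it inspects; the specific checks (strictly increasing weights, unique rule IDs, a known provider) are suggested invariants rather than requirements.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';

// Trimmed view of the config: only the fields validated here.
interface TriageConfigShape {
  scoring: { severityWeights: Record<Severity, number>; maxScore: number; volumeDecayFactor: number };
  rules: { id: string }[];
  ai: { provider: string; temperature: number; maxTokens: number };
}

const SEVERITY_ORDER: Severity[] = ['low', 'medium', 'high', 'critical'];
const KNOWN_PROVIDERS = ['openai', 'claude', 'ollama'];

// Returns a list of human-readable problems; an empty list means the config passed.
function validateTriageConfig(cfg: TriageConfigShape): string[] {
  const errors: string[] = [];
  const w = cfg.scoring.severityWeights;
  for (let i = 1; i < SEVERITY_ORDER.length; i++) {
    if (w[SEVERITY_ORDER[i]] <= w[SEVERITY_ORDER[i - 1]]) {
      errors.push(`weight for ${SEVERITY_ORDER[i]} must exceed ${SEVERITY_ORDER[i - 1]}`);
    }
  }
  if (cfg.scoring.maxScore <= 0) errors.push('maxScore must be positive');
  if (cfg.scoring.volumeDecayFactor < 0) errors.push('volumeDecayFactor must be non-negative');

  const seen = new Set<string>();
  for (const rule of cfg.rules) {
    if (seen.has(rule.id)) errors.push(`duplicate rule id: ${rule.id}`);
    seen.add(rule.id);
  }

  if (!KNOWN_PROVIDERS.includes(cfg.ai.provider)) errors.push(`unknown provider: ${cfg.ai.provider}`);
  if (cfg.ai.temperature < 0) errors.push('temperature must be non-negative');
  if (cfg.ai.maxTokens <= 0) errors.push('maxTokens must be positive');
  return errors;
}

const sample: TriageConfigShape = {
  scoring: { severityWeights: { low: 10, medium: 25, high: 50, critical: 80 }, maxScore: 100, volumeDecayFactor: 0.15 },
  rules: [{ id: 'auth-brute-force-success-chain' }, { id: 'auth-brute-force-success-chain' }],
  ai: { provider: 'claude', temperature: 0.2, maxTokens: 1024 }
};
console.log(validateTriageConfig(sample)); // [ 'duplicate rule id: auth-brute-force-success-chain' ]
```

Failing fast here keeps a typo in a weight table from silently reordering every alert in the queue.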
Quick Start Guide
- Initialize the project structure: Create a TypeScript workspace with strict mode enabled. Define the canonical event interface and rule type signatures.
- Load and normalize sample telemetry: Import a test log batch (Windows Security event 4688, Sysmon Event ID 1, or Linux auth.log). Run it through the normalization adapter to produce NormalizedEvent[].
- Execute the rule engine: Pass the normalized events to the configured rule registry. Collect evidence arrays, map MITRE techniques, and calculate the deterministic risk score.
- Generate the triage report: Feed the structured findings into the AI prompt builder. Submit the request to your configured provider. Validate the response against the output schema and render the analyst summary.
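Putting the steps together, a compact end-to-end sketch might look like the following. Everything here is a simplified, hypothetical stand-in: a trimmed event type, a shortened order-aware form of the Step 2 rule, the Step 3 scoring formula, a one-paragraph prompt, and a stub generator in place of a real provider call. The `triage` function name is illustrative.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';

// Trimmed NormalizedEvent: only the fields this pipeline reads.
interface NormalizedEvent {
  eventId: string;
  timestamp: Date;
  sourceIp: string | null;
  user: string;
}

interface DetectionRule {
  id: string;
  title: string;
  severity: Severity;
  mitreTechniques: string[];
  evaluate(events: NormalizedEvent[]): string[];
}

const WEIGHTS: Record<Severity, number> = { low: 10, medium: 25, high: 50, critical: 80 };

// Step 2 (shortened): success after >= 5 consecutive failures from the same IP.
const bruteForceChain: DetectionRule = {
  id: 'auth-brute-force-success-chain',
  title: 'Successful authentication following repeated failures',
  severity: 'critical',
  mitreTechniques: ['T1110', 'T1078'],
  evaluate(events) {
    const streaks = new Map<string, number>();
    const evidence: string[] = [];
    const ordered = [...events].sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
    for (const e of ordered) {
      if (!e.sourceIp) continue;
      if (e.eventId === 'auth-failure') {
        streaks.set(e.sourceIp, (streaks.get(e.sourceIp) ?? 0) + 1);
      } else if (e.eventId === 'auth-success') {
        const n = streaks.get(e.sourceIp) ?? 0;
        if (n >= 5) evidence.push(`"${e.user}" succeeded from ${e.sourceIp} after ${n} consecutive failures`);
        streaks.delete(e.sourceIp);
      }
    }
    return evidence;
  }
};

async function triage(
  events: NormalizedEvent[],
  rules: DetectionRule[],
  generate: (prompt: string) => Promise<string>
) {
  // Deterministic detection: keep only rules that produced evidence.
  const findings = rules
    .map(rule => ({ rule, evidence: rule.evaluate(events) }))
    .filter(f => f.evidence.length > 0);

  // Deterministic scoring (Step 3 formula).
  const raw = findings.reduce((sum, f) => sum + WEIGHTS[f.rule.severity], 0);
  const modifier = 1 + Math.log10(Math.max(1, events.length)) * 0.15;
  const riskScore = Math.min(100, Math.round(raw * modifier));

  // Generative reporting: the model only sees finished findings.
  const prompt = [
    'Do not invent events, modify severity, or alter MITRE mappings.',
    `Risk Score: ${riskScore}/100`,
    ...findings.map(f => `- ${f.rule.title} [${f.rule.severity}; ${f.rule.mitreTechniques.join(', ')}]: ${f.evidence.join('; ')}`)
  ].join('\n');

  return { riskScore, findings, report: await generate(prompt) };
}

// Synthetic batch: six failures, then a success, from one IP.
const t0 = Date.UTC(2024, 4, 1, 10, 0, 0);
const events: NormalizedEvent[] = [
  ...Array.from({ length: 6 }, (_, i) => ({ eventId: 'auth-failure', timestamp: new Date(t0 + i * 1000), sourceIp: '203.0.113.7', user: 'alice' })),
  { eventId: 'auth-success', timestamp: new Date(t0 + 6000), sourceIp: '203.0.113.7', user: 'alice' }
];

triage(events, [bruteForceChain], async prompt => `STUB REPORT\n${prompt}`).then(result =>
  console.log(result.riskScore, result.findings.length) // 90 1
);
```

Swapping the stub for a real provider changes only the `generate` argument; the score and findings are computed before the model is ever called.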