# Testing Sigma Rules Against Local Logs Without a SIEM
## Current Situation Analysis
Validating Sigma rules against real telemetry traditionally requires deploying a full SIEM stack, creating a significant bottleneck in the rule development lifecycle. The standard workflow involves spinning up a Wazuh manager, routing Sysmon JSONL via Filebeat, SSHing into the manager, and manually tailing alerts.log. This approach introduces a ~4-minute round-trip time per rule edit, making iterative false-positive checking and regex validation highly inefficient.
Failure modes frequently stem from environmental overhead rather than rule logic: minor typos in field names, mismatched event schemas across platforms, and the lack of immediate feedback. Traditional SIEM deployment is architected for production monitoring, not rapid rule authoring. The cognitive load of managing agents, log shippers, and remote log aggregation obscures the core task: verifying whether a detection condition actually matches local event data. Without a lightweight, local validation layer, security engineers waste hours on infrastructure plumbing instead of detection engineering.
## WOW Moment: Key Findings
Local execution of compiled Sigma detection trees eliminates deployment overhead and delivers immediate validation feedback. By shifting from a remote SIEM pipeline to an in-memory expression evaluator, iteration speed increases by two orders of magnitude while maintaining detection fidelity.
| Approach | Round-Trip Time | Setup Overhead | Iteration Throughput |
|---|---|---|---|
| Traditional SIEM (Wazuh + Filebeat) | ~4 minutes | High (VM, agents, SSH, log routing) | ~15 iterations/hour |
| SIEMForge Local CLI | ~2 seconds | Zero (Python CLI only) | ~1800 iterations/hour |
Key Findings:
- A 4823-event Sysmon JSONL dump is fully scanned in under 1 second.
- End-to-end rule validation (edit → scan → alert correlation) drops from ~4 minutes to ~2 seconds.
- Field normalization via alias mapping eliminates the need to maintain platform-specific rule variants.
- Expression tree compilation at load time ensures O(1) per-event evaluation, making local scanning production-viable for rule testing.
## Core Solution
The solution replaces naive regex matching with a compiled expression tree architecture. Sigma detection blocks are parsed once during load time into callable Python functions. Each event dictionary is then evaluated against these pre-compiled callables, returning a boolean match result. This approach natively supports Sigma's condition language (`and`, `or`, `not`, `1 of selection*`, `all of them`) and correctly handles field modifiers and wildcards.
```python
from typing import Callable

def compile_selection(selection: dict) -> Callable[[dict], bool]:
    """Compile a Sigma selection block into a single boolean predicate."""
    matchers = []
    for key, value in selection.items():
        # Sigma keys look like "FieldName|modifier"; no modifier means equals.
        field, _, modifier = key.partition("|")
        matcher = build_field_matcher(field, modifier or "equals", value)
        matchers.append(matcher)
    # All matchers within one selection are implicitly ANDed.
    return lambda event: all(m(event) for m in matchers)
```
`build_field_matcher` handles the `contains`, `startswith`, `endswith`, and `re` modifiers. Wildcards in raw values (e.g., `*-ep bypass*`) are translated to a `contains` check at compile time.
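A minimal sketch of such a matcher factory follows. The structure and the case-insensitive semantics are assumptions about the implementation (case-insensitive string matching mirrors common Sigma backend behavior), not SIEMForge's exact code:

```python
import fnmatch
import re
from typing import Any, Callable

def build_field_matcher(field: str, modifier: str, value: Any) -> Callable[[dict], bool]:
    """Build a per-field predicate for one Sigma selection entry (sketch)."""
    def get(event: dict) -> str:
        return str(event.get(field, ""))

    text = str(value)
    if modifier == "contains":
        return lambda e: text.lower() in get(e).lower()
    if modifier == "startswith":
        return lambda e: get(e).lower().startswith(text.lower())
    if modifier == "endswith":
        return lambda e: get(e).lower().endswith(text.lower())
    if modifier == "re":
        pattern = re.compile(text)
        return lambda e: bool(pattern.search(get(e)))
    # Plain "equals" with wildcards: compile once, at load time, to an
    # anchored regex (a "*x*" pattern is effectively a contains check).
    if "*" in text or "?" in text:
        pattern = re.compile(fnmatch.translate(text), re.IGNORECASE)
        return lambda e: bool(pattern.match(get(e)))
    return lambda e: get(e).lower() == text.lower()
```

Doing the wildcard-to-regex translation at compile time is what keeps per-event evaluation O(1) in the number of pattern rewrites: the cost is paid once per rule load, not once per event.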
Architecture Decisions:
- Expression Tree over Regex: Bare boolean parsers fail on Sigma's `1 of selection*` syntax and complex wildcards. Compiling detection logic into a callable tree ensures accurate condition evaluation and supports future operator extensions.
- Event Normalization over Rule Rewriting: Cross-platform field naming inconsistencies (e.g., `process.command_line` in Wazuh vs `CommandLine` in raw EVTX) are resolved via a `field_aliases.yml` mapping layer. The scanner attempts each alias when the canonical field is missing, preserving rule portability.
```yaml
CommandLine:
  - process.command_line
  - data.win.eventdata.commandLine
  - winlog.event_data.CommandLine
```
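A resolution helper along these lines (a sketch; the function name and the in-memory alias table are assumptions, not SIEMForge's actual API) tries the canonical field first, then each alias, treating dotted aliases as paths into nested event dictionaries:

```python
from typing import Any

# Hypothetical in-memory mirror of field_aliases.yml.
FIELD_ALIASES = {
    "CommandLine": [
        "process.command_line",
        "data.win.eventdata.commandLine",
        "winlog.event_data.CommandLine",
    ],
}

def resolve_field(event: dict, field: str) -> Any:
    """Return the field's value, trying the canonical name, then aliases."""
    def lookup(path: str) -> Any:
        node: Any = event
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                return None
            node = node[part]
        return node

    for candidate in [field, *FIELD_ALIASES.get(field, [])]:
        value = lookup(candidate)
        if value is not None:
            return value
    return None
```

With this in place, a rule referencing `CommandLine` matches both a raw EVTX event and a Wazuh-shaped event without any rewriting.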
The scanner outputs structured alerts correlating rule metadata, MITRE technique IDs, and triggering event details, enabling rapid validation without SIEM deployment. The same compiled tree architecture supports multi-backend emission (Splunk, Elastic/Kibana), making the local scanner a reusable core for rule conversion pipelines.
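An illustrative alert record of that shape (the field names here are assumptions about the output schema, not SIEMForge's exact format):

```python
import json

# Hypothetical structured alert correlating rule metadata, MITRE
# technique IDs, and the triggering event details.
alert = {
    "rule": {
        "title": "Suspicious PowerShell Execution Policy Bypass",
        "level": "high",
        "tags": ["attack.execution", "attack.t1059.001"],
    },
    "mitre_techniques": ["T1059.001"],
    "event": {
        "Image": "C:\\Windows\\System32\\powershell.exe",
        "CommandLine": "powershell -ep bypass -File payload.ps1",
    },
}
print(json.dumps(alert, indent=2))
```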
## Pitfall Guide
- Naive Regex/Boolean Parsing for Sigma Conditions: Attempting to parse Sigma conditions with basic regex or simple `and`/`or`/`not` logic fails on advanced operators like `1 of selection*` and `all of them`. Best Practice: Compile detection logic into an expression tree or leverage the official `pySigma` reference backend to handle condition parsing correctly.
- Ignoring Cross-Platform Field Naming Conventions: Sigma rules written against raw EVTX field names will silently fail against Wazuh, Splunk, or Elastic-formatted logs. Best Practice: Implement a field alias/normalization layer (`field_aliases.yml`) that maps canonical Sigma fields to platform-specific variants, rather than maintaining duplicate rule sets.
- Delaying Test Data and Sample Log Creation: Validating false positives without technique-specific log samples leads to guesswork and delayed feedback. Best Practice: Maintain a `samples/` directory containing process injection, service installation, user creation, and baseline clean logs from commit one.
- Reinventing Sigma Parsing Logic: Building a custom condition parser from scratch introduces maintenance debt and compatibility drift with the Sigma specification. Best Practice: Wrap `pySigma` for detection parsing and focus custom development on the scanner, normalization, and emit layers.
- Underestimating Syslog Format Variability: Hand-rolled tokenizers break on RFC 3164 vs RFC 5424 timestamp differences and non-standard host formats (e.g., pfSense). Best Practice: Replace custom string splitting with robust parsing libraries like `pyparsing` or `python-dateutil` to handle timestamp and field extraction reliably.
- Skipping CI/Testing in Early Development: Late-stage regressions in converter outputs (e.g., Splunk syntax breaks on Windows) waste debugging time and erode trust in the toolchain. Best Practice: Implement automated test suites (138+ tests) and CI pipelines from v1.0 to catch parser and emitter regressions on every push.
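For the syslog-timestamp pitfall, a small sketch of the `python-dateutil` approach (the helper name and default-year handling are illustrative). `dateutil` parses both timestamp shapes, but RFC 3164 timestamps omit the year, so it must be supplied from context rather than left to the library's implicit default of today's date:

```python
from datetime import datetime
from dateutil import parser as dtparser

def parse_syslog_timestamp(raw: str, default_year: int = 2024) -> datetime:
    """Parse an RFC 3164 ('Oct 11 22:14:15') or RFC 5424
    ('2024-10-11T22:14:15.003Z') syslog timestamp.

    dateutil fills any missing components (notably the RFC 3164 year)
    from the supplied default datetime.
    """
    default = datetime(default_year, 1, 1)
    return dtparser.parse(raw, default=default)
```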
## Deliverables
- Blueprint: Local Sigma Validation Architecture
  A reference diagram detailing the expression tree compilation flow, field alias resolution pipeline, and multi-backend emission strategy. Includes memory footprint analysis and event throughput benchmarks for local scanning.
- Checklist: Rule Development & Testing Workflow
  Step-by-step validation sequence: environment setup → alias mapping verification → expression tree compilation → sample log execution → false-positive triage → CI integration. Covers pre-deployment rule hygiene and cross-platform compatibility checks.
- Configuration Templates
  - `field_aliases.yml`: Pre-populated mapping for Sysmon, Wazuh, Splunk, and Elastic field variants.
  - CLI Scan Invocation: `python -m siemforge --scan samples/events.jsonl` with output formatting flags.
  - CI Pipeline Snippet: GitHub Actions workflow for automated rule parsing, sample log execution, and converter regression testing.
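A minimal GitHub Actions workflow along these lines (the file layout, extras name, and test paths are assumptions about the repository, not its actual configuration):

```yaml
name: rule-ci
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install package with dev dependencies
        run: pip install -e ".[dev]"
      - name: Parser and emitter regression tests
        run: pytest tests/ -q
      - name: Scan sample logs end to end
        run: python -m siemforge --scan samples/events.jsonl
```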
