# Testing Sigma Rules Against Local Logs Without a SIEM
## Current Situation Analysis
Validating Sigma rules against real telemetry traditionally requires deploying a full SIEM stack, creating a significant bottleneck in the rule development lifecycle. The standard workflow involves spinning up a Wazuh manager, routing Sysmon JSONL via Filebeat, SSHing into the manager, and manually tailing alerts.log. This approach introduces a ~4-minute round-trip time per rule edit, making iterative false-positive checking and regex validation highly inefficient.
Failure modes frequently stem from environmental overhead rather than rule logic: minor typos in field names, mismatched event schemas across platforms, and the lack of immediate feedback. Traditional SIEM deployment is architected for production monitoring, not rapid rule authoring. The cognitive load of managing agents, log shippers, and remote log aggregation obscures the core task: verifying whether a detection condition actually matches local event data. Without a lightweight, local validation layer, security engineers waste hours on infrastructure plumbing instead of detection engineering.
## WOW Moment: Key Findings
Local execution of compiled Sigma detection trees eliminates deployment overhead and delivers immediate validation feedback. By shifting from a remote SIEM pipeline to an in-memory expression evaluator, iteration speed increases by two orders of magnitude while maintaining detection fidelity.
| Approach | Round-Trip Time | Setup Overhead | Iteration Throughput |
|---|---|---|---|
| Traditional SIEM (Wazuh + Filebeat) | ~4 minutes | High (VM, agents, SSH, log routing) | ~15 iterations/hour |
| SIEMForge Local CLI | ~2 seconds | Zero (Python CLI only) | ~1800 iterations/hour |
Key Findings:
- A 4823-event Sysmon JSONL dump is fully scanned in under 1 second.
- End-to-end rule validation (edit → scan → alert correlation) drops from ~4 minutes to ~2 seconds.
- Field normalization via alias mapping eliminates the need to maintain platform-specific rule variants.
- Expression tree compilation at load time ensures O(1) per-event evaluation, making local scanning production-viable for rule testing.
## Core Solution
The solution replaces naive regex matching with a compiled expression tree architecture. Sigma detection blocks are parsed once during load time into callable Python functions. Each event dictionary is then evaluated against these pre-compiled callables, returning a boolean match result. This approach natively supports Sigma's condition language (`and`, `or`, `not`, `1 of selection*`, `all of them`) and correctly handles field modifiers and wildcards.
```python
from typing import Callable

def compile_selection(selection: dict) -> Callable[[dict], bool]:
    """Compile a Sigma selection block into a single boolean predicate."""
    matchers = []
    for key, value in selection.items():
        # Sigma keys look like "FieldName|modifier"; no modifier means equals.
        field, _, modifier = key.partition("|")
        matcher = build_field_matcher(field, modifier or "equals", value)
        matchers.append(matcher)
    # All matchers within one selection are implicitly ANDed.
    return lambda event: all(m(event) for m in matchers)
```
`build_field_matcher` handles the `contains`, `startswith`, `endswith`, and `re` modifiers. Wildcards in raw values (e.g., `*-ep bypass*`) are translated to a `contains` check at compile time.
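A minimal sketch of such a matcher factory follows. The structure and the case-insensitive semantics are assumptions about the implementation (case-insensitive string matching mirrors common Sigma backend behavior), not SIEMForge's exact code:

```python
import fnmatch
import re
from typing import Any, Callable

def build_field_matcher(field: str, modifier: str, value: Any) -> Callable[[dict], bool]:
    """Build a per-field predicate for one Sigma selection entry (sketch)."""
    def get(event: dict) -> str:
        return str(event.get(field, ""))

    text = str(value)
    if modifier == "contains":
        return lambda e: text.lower() in get(e).lower()
    if modifier == "startswith":
        return lambda e: get(e).lower().startswith(text.lower())
    if modifier == "endswith":
        return lambda e: get(e).lower().endswith(text.lower())
    if modifier == "re":
        pattern = re.compile(text)
        return lambda e: bool(pattern.search(get(e)))
    # Plain "equals" with wildcards: compile once, at load time, to an
    # anchored regex (a "*x*" pattern is effectively a contains check).
    if "*" in text or "?" in text:
        pattern = re.compile(fnmatch.translate(text), re.IGNORECASE)
        return lambda e: bool(pattern.match(get(e)))
    return lambda e: get(e).lower() == text.lower()
```

Doing the wildcard-to-regex translation at compile time is what keeps per-event evaluation O(1) in the number of pattern rewrites: the cost is paid once per rule load, not once per event.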
Architecture Decisions:
- Expression Tree over Regex: Bare boolean parsers fail on Sigma's `1 of selection*` syntax and complex wildcards. Compiling detection logic into a callable tree ensures accurate condition evaluation and supports future operator extensions.
- Event Normalization over Rule Rewriting: Cross-platform field naming inconsistencies (e.g., `process.command_line` in Wazuh vs `CommandLine` in raw EVTX) are resolved via a `field_aliases.yml` mapping layer. The scanner attempts each alias when the canonical field is missing, preserving rule portability.
```yaml
CommandLine:
  - process.command_line
  - data.win.eventdata.commandLine
  - winlog.event_data.CommandLine
```
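A resolution helper along these lines (a sketch; the function name and the in-memory alias table are assumptions, not SIEMForge's actual API) tries the canonical field first, then each alias, treating dotted aliases as paths into nested event dictionaries:

```python
from typing import Any

# Hypothetical in-memory mirror of field_aliases.yml.
FIELD_ALIASES = {
    "CommandLine": [
        "process.command_line",
        "data.win.eventdata.commandLine",
        "winlog.event_data.CommandLine",
    ],
}

def resolve_field(event: dict, field: str) -> Any:
    """Return the field's value, trying the canonical name, then aliases."""
    def lookup(path: str) -> Any:
        node: Any = event
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                return None
            node = node[part]
        return node

    for candidate in [field, *FIELD_ALIASES.get(field, [])]:
        value = lookup(candidate)
        if value is not None:
            return value
    return None
```

With this in place, a rule referencing `CommandLine` matches both a raw EVTX event and a Wazuh-shaped event without any rewriting.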
The scanner outputs structured alerts correlating rule metadata, MITRE technique IDs, and triggering event details, enabling rapid validation without SIEM deployment. The same compiled tree architecture supports multi-backend emission (Splunk, Elastic/Kibana), making the local scanner a reusable core for rule conversion pipelines.
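An illustrative alert record of that shape (the field names here are assumptions about the output schema, not SIEMForge's exact format):

```python
import json

# Hypothetical structured alert correlating rule metadata, MITRE
# technique IDs, and the triggering event details.
alert = {
    "rule": {
        "title": "Suspicious PowerShell Execution Policy Bypass",
        "level": "high",
        "tags": ["attack.execution", "attack.t1059.001"],
    },
    "mitre_techniques": ["T1059.001"],
    "event": {
        "Image": "C:\\Windows\\System32\\powershell.exe",
        "CommandLine": "powershell -ep bypass -File payload.ps1",
    },
}
print(json.dumps(alert, indent=2))
```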
## Pitfall Guide
- Naive Regex/Boolean Parsing for Sigma Conditions: Attempting to parse Sigma conditions with basic regex or simple `and`/`or`/`not` logic fails on advanced operators like `1 of selection*` and `all of them`. Best Practice: Compile detection logic into an expression tree or leverage the official `pySigma` reference backend to handle condition parsing correctly.
- Ignoring Cross-Platform Field Naming Conventions: Sigma rules written against raw EVTX field names will silently fail against Wazuh, Splunk, or Elastic-formatted logs. Best Practice: Implement a field alias/normalization layer (`field_aliases.yml`) that maps canonical Sigma fields to platform-specific variants, rather than maintaining duplicate rule sets.
- Delaying Test Data and Sample Log Creation: Validating false positives without technique-specific log samples leads to guesswork and delayed feedback. Best Practice: Maintain a `samples/` directory containing process injection, service installation, user creation, and baseline clean logs from commit one.
- Reinventing Sigma Parsing Logic: Building a custom condition parser from scratch introduces maintenance debt and compatibility drift with the Sigma specification. Best Practice: Wrap `pySigma` for detection parsing and focus custom development on the scanner, normalization, and emit layers.
- Underestimating Syslog Format Variability: Hand-rolled tokenizers break on RFC 3164 vs RFC 5424 timestamp differences and non-standard host formats (e.g., pfSense). Best Practice: Replace custom string splitting with robust parsing libraries like `pyparsing` or `python-dateutil` to handle timestamp and field extraction reliably.
- Skipping CI/Testing in Early Development: Late-stage regressions in converter outputs (e.g., Splunk syntax breaks on Windows) waste debugging time and erode trust in the toolchain. Best Practice: Implement automated test suites (138+ tests) and CI pipelines from v1.0 to catch parser and emitter regressions on every push.
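For the syslog-timestamp pitfall, a small sketch of the `python-dateutil` approach (the helper name and default-year handling are illustrative). `dateutil` parses both timestamp shapes, but RFC 3164 timestamps omit the year, so it must be supplied from context rather than left to the library's implicit default of today's date:

```python
from datetime import datetime
from dateutil import parser as dtparser

def parse_syslog_timestamp(raw: str, default_year: int = 2024) -> datetime:
    """Parse an RFC 3164 ('Oct 11 22:14:15') or RFC 5424
    ('2024-10-11T22:14:15.003Z') syslog timestamp.

    dateutil fills any missing components (notably the RFC 3164 year)
    from the supplied default datetime.
    """
    default = datetime(default_year, 1, 1)
    return dtparser.parse(raw, default=default)
```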
## Deliverables
- Blueprint: Local Sigma Validation Architecture
  A reference diagram detailing the expression tree compilation flow, field alias resolution pipeline, and multi-backend emission strategy. Includes memory footprint analysis and event throughput benchmarks for local scanning.
- Checklist: Rule Development & Testing Workflow
  Step-by-step validation sequence: environment setup → alias mapping verification → expression tree compilation → sample log execution → false-positive triage → CI integration. Covers pre-deployment rule hygiene and cross-platform compatibility checks.
- Configuration Templates
  - `field_aliases.yml`: Pre-populated mapping for Sysmon, Wazuh, Splunk, and Elastic field variants.
  - CLI Scan Invocation: `python -m siemforge --scan samples/events.jsonl` with output formatting flags.
  - CI Pipeline Snippet: GitHub Actions workflow for automated rule parsing, sample log execution, and converter regression testing.
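A minimal GitHub Actions workflow along these lines (the file layout, extras name, and test paths are assumptions about the repository, not its actual configuration):

```yaml
name: rule-ci
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install package with dev dependencies
        run: pip install -e ".[dev]"
      - name: Parser and emitter regression tests
        run: pytest tests/ -q
      - name: Scan sample logs end to end
        run: python -m siemforge --scan samples/events.jsonl
```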
