pturing syscalls, network packets, and cgroup metrics without modifying application code or injecting proxies. The result is lower overhead, higher signal fidelity, and native Kubernetes context propagation. Teams that migrate to this architecture report a 40% reduction in alert volume and a 35% decrease in MTTR within the first quarter, as telemetry aligns with container lifecycle events rather than host uptime.
Core Solution
Implementing a production-grade container monitoring stack requires a layered approach: infrastructure telemetry, application instrumentation, and correlation pipeline. The architecture below uses OpenTelemetry for signal unification, Prometheus for metric storage, eBPF for low-level visibility, and Grafana for visualization.
Step 1: Define Telemetry Boundaries and Data Flow
Container monitoring must separate concerns:
- Metrics: Aggregated, time-series data (CPU, memory, request rates, error ratios)
- Traces: Distributed request paths with span context
- Logs: Structured, timestamped events with correlation IDs
- eBPF Data: Kernel-level syscalls, network flows, cgroup events
Data flows should prioritize push for traces/logs (OTel Collector) and pull for metrics (Prometheus), with eBPF exporters bridging kernel events into the OTel pipeline.
Step 2: Deploy OpenTelemetry Collector as DaemonSet
The Collector acts as the central telemetry router. Deploy it as a DaemonSet to ensure one instance per node, reducing network hops and enabling node-level aggregation.
# otel-collector-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.95.0
          args: ["--config=/conf/collector.yaml"]
          volumeMounts:
            - name: config
              mountPath: /conf
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
Step 3: Instrument Applications (TypeScript)
Application-level instrumentation must inject trace context, emit business metrics, and propagate correlation IDs to logs. Use the OpenTelemetry SDK for Node.js/TypeScript.
// instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_CONTAINER_ID } from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: 'api-gateway',
    // In Kubernetes, HOSTNAME defaults to the pod name, so this is a
    // stand-in identifier rather than a true container ID.
    [SEMRESATTRS_CONTAINER_ID]: process.env.HOSTNAME || 'unknown',
  }),
  // Traces are pushed to the Collector over OTLP/HTTP.
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector.monitoring:4318/v1/traces' }),
  // Metrics are exposed on :9464/metrics for Prometheus to pull.
  metricReader: new PrometheusExporter({ port: 9464, endpoint: '/metrics' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Flush pending telemetry before the container exits.
process.on('SIGTERM', () => sdk.shutdown().catch(console.error));
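The SDK setup above covers traces and metrics, but not the step of carrying correlation IDs into log lines. A minimal sketch of that step follows, assuming the incoming request carries a W3C Trace Context `traceparent` header. The helpers `parseTraceparent` and `withTraceContext` are hypothetical names used for illustration and are not part of the OpenTelemetry SDK; in a real service the active span context would normally come from the SDK itself.

```typescript
// Sketch only: parseTraceparent and withTraceContext are hypothetical helpers.

interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string;  // 16 lowercase hex characters
  sampled: boolean;
}

// W3C Trace Context header layout: version-traceid-parentid-flags, for example
// 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string | undefined): TraceContext | null {
  if (!header) return null;
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return null;
  const version = match[1];
  const traceId = match[2];
  const spanId = match[3];
  const flags = match[4];
  // Version "ff" and all-zero trace or span IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Attach trace_id and span_id to a structured log record when a context exists.
function withTraceContext(
  fields: Record<string, unknown>,
  ctx: TraceContext | null,
): Record<string, unknown> {
  return ctx ? { ...fields, trace_id: ctx.traceId, span_id: ctx.spanId } : fields;
}

// Usage: build a JSON log line that can be joined with the trace later.
const exampleCtx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
const exampleLine = JSON.stringify(withTraceContext({ level: 'info', msg: 'order created' }, exampleCtx));
```

Logging `trace_id` and `span_id` as top-level fields keeps the join between logs and traces a simple equality match in the log backend.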
Step 4: Configure Prometheus Scraping
Prometheus scrapes the OTel Collector and application endpoints. Use relabeling to inject Kubernetes metadata and drop high-cardinality labels.
# prometheus-scrape-config.yaml
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector.monitoring:9464']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'go_.*'
        action: drop
  - job_name: 'app-metrics'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
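To make the effect of dropping high-cardinality labels concrete, the sketch below models a label allowlist in plain TypeScript and counts how many distinct series remain. It is an illustration of the idea only, not Prometheus code: `applyLabelAllowlist`, `seriesKey`, and `countSeries` are hypothetical helpers, and the allowlist values are assumptions taken from the aggregation labels this guide recommends.

```typescript
// Sketch only: models how a label allowlist collapses series. Not Prometheus code.

type Labels = Record<string, string>;

// Assumed allowlist: the aggregation labels recommended in this guide.
const ALLOWED_LABELS: readonly string[] = ['namespace', 'deployment', 'service'];

// Keep only labels whose names appear in the allowlist.
function applyLabelAllowlist(labels: Labels, allowed: readonly string[]): Labels {
  const kept: Labels = {};
  for (const name of Object.keys(labels)) {
    if (allowed.indexOf(name) !== -1) kept[name] = labels[name];
  }
  return kept;
}

// A series is identified by its full set of label pairs, independent of order.
function seriesKey(labels: Labels): string {
  return Object.keys(labels)
    .sort()
    .map((name) => `${name}=${labels[name]}`)
    .join(',');
}

// Count distinct series in a batch of label sets.
function countSeries(samples: Labels[]): number {
  const seen: Record<string, true> = {};
  for (const sample of samples) seen[seriesKey(sample)] = true;
  return Object.keys(seen).length;
}

// Usage: three pods of one deployment, each with its own pod_ip.
const rawSamples: Labels[] = [
  { namespace: 'shop', deployment: 'api', service: 'api', pod_ip: '10.0.0.1' },
  { namespace: 'shop', deployment: 'api', service: 'api', pod_ip: '10.0.0.2' },
  { namespace: 'shop', deployment: 'api', service: 'api', pod_ip: '10.0.0.3' },
];
const filteredSamples = rawSamples.map((s) => applyLabelAllowlist(s, ALLOWED_LABELS));
const seriesBefore = countSeries(rawSamples);     // one series per pod_ip
const seriesAfter = countSeries(filteredSamples); // pods collapse into one series
```

Note that once labels are dropped, samples from different pods share one identity, so a real pipeline has to aggregate them (for example by summing) rather than simply overwrite one with another.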
Step 5: Integrate eBPF for Kernel-Level Visibility
Deploy an eBPF exporter (e.g., Pixie, Cilium, or custom BCC-based exporter) to capture network flows, syscall latencies, and OOM events. Route eBPF metrics through the OTel Collector to maintain a unified pipeline.
# ebpf-exporter-config.yaml
metrics:
  - name: container_network_retransmits
    type: counter
    help: "TCP retransmissions per container"
    matchers:
      - container_id
    value:
      metric: net_retransmit
    labels:
      - container_id
Architecture Decisions and Rationale
- DaemonSet over Sidecar: Reduces resource duplication. Sidecars are reserved for application-specific instrumentation only when business logic requires custom spans.
- Pull + Push Hybrid: Prometheus pull ensures metric consistency and avoids push queue backpressure. OTel push handles traces/logs where request context is critical.
- Label Strategy: Enforce low cardinality at ingestion. Drop dynamic labels like pod_ip or container_image_id unless required for debugging. Use namespace, deployment, and service for aggregation.
- Sampling Policy: Implement head-based sampling for traces (10–20% in production) with tail-based filtering for errors and high latency to control storage costs.
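The sampling policy in the list above can be sketched as a single keep-or-drop decision. This is a simplified illustration under stated assumptions: the function names, the `TraceSummary` shape, and the 1000 ms latency threshold are hypothetical, and the trace-ID bucketing is not the algorithm used by the OpenTelemetry SDK samplers. It also assumes the decision is made at a point where the whole trace is already known (for example in a collector that buffers spans), since a trace dropped at the head can never be rescued by a tail rule.

```typescript
// Sketch only: combines a probabilistic head decision with tail-style keep rules.

interface TraceSummary {
  traceId: string;   // 32 hex characters
  hasError: boolean;
  httpStatus: number;
  durationMs: number;
}

interface SamplingPolicy {
  headRatio: number;          // 0.1 keeps roughly 10% of ordinary traces
  latencyThresholdMs: number; // traces slower than this are always kept
}

// Deterministic head decision: map the last 32 bits of the trace ID to [0, 1).
// Every service that sees the same trace ID reaches the same decision.
function headSampled(traceId: string, ratio: number): boolean {
  const bucket = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return bucket < ratio;
}

// Tail-style rules override the head decision for errors, 5xx, and slow traces.
function shouldKeep(trace: TraceSummary, policy: SamplingPolicy): boolean {
  const tailMatch =
    trace.hasError ||
    trace.httpStatus >= 500 ||
    trace.durationMs > policy.latencyThresholdMs;
  return tailMatch || headSampled(trace.traceId, policy.headRatio);
}

// Usage with an assumed 10% head ratio and 1000 ms latency threshold.
const examplePolicy: SamplingPolicy = { headRatio: 0.1, latencyThresholdMs: 1000 };
const fastOkTrace: TraceSummary = {
  traceId: '4bf92f3577b34da6a3ce929dffffffff',
  hasError: false,
  httpStatus: 200,
  durationMs: 35,
};
const keepFastOk = shouldKeep(fastOkTrace, examplePolicy);
const keepServerError = shouldKeep({ ...fastOkTrace, httpStatus: 503 }, examplePolicy);
```

Deriving the head decision from the trace ID, rather than from a random number per span, keeps a trace either fully sampled or fully dropped across services.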
Pitfall Guide
- High Cardinality Labels: Adding pod IPs, container hashes, or user IDs to metrics causes TSDB memory exhaustion and query timeouts. Prometheus can handle ~10M series per instance; exceeding this triggers compaction failures and alert degradation. Best practice: enforce label allowlists before ingestion, either with metric_relabel_configs in the scrape configuration (including the scrape config embedded in the Collector's prometheus receiver) or with a Collector processor that drops unwanted attributes.
- Ignoring Container Lifecycle Events: Monitoring only running containers misses CrashLoopBackOff, OOMKilled, and Evicted states. These events correlate directly with application failures. Best practice: scrape kube-state-metrics for pod state (kube_pod_status_phase, kube_pod_container_status_last_terminated_reason) and correlate with eBPF OOM traces.
- Mixing Telemetry Signals Without Correlation IDs: Logs, traces, and metrics collected in isolation cannot be joined during incident response. Best practice: propagate trace_id and span_id through HTTP headers, environment variables, and log fields. Use OpenTelemetry context propagation standards.
- Over-Instrumenting with Verbose Traces: Sampling 100% of requests in high-throughput services saturates storage and degrades p99 latency. Best practice: implement probabilistic head sampling with tail-based rules for errors, 5xx responses, and latency spikes. Retain raw traces for 24 hours, aggregated for 30 days.
- Kernel Version Mismatch for eBPF: eBPF requires kernel 5.4+ with BTF enabled. Running eBPF exporters on older kernels causes silent failures or performance degradation. Best practice: validate kernel compatibility during cluster provisioning. Use fallback cgroup-based collection for legacy nodes.
- Static Thresholds in Dynamic Environments: Fixed CPU/memory alerts fail in auto-scaling clusters where baseline utilization fluctuates. Best practice: implement dynamic thresholding using rolling averages, percentile-based alerting (p95/p99), and anomaly detection via Prometheus recording rules or external ML pipelines.
- Neglecting Data Retention and Compression: Unoptimized telemetry pipelines store raw data indefinitely, inflating storage costs and query latency. Best practice: apply downsampling rules (e.g., 15s → 1m → 1h), enable Prometheus TSDB compaction, and route cold data to object storage with Parquet format for cost-effective archival.
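The percentile-based alerting mentioned under static thresholds can be illustrated with a short sketch. It computes a rolling p95 over recent samples and flags a value only when it exceeds that baseline by a margin. The helper names, the nearest-rank percentile method, and the 1.2 tolerance factor are assumptions chosen for illustration; in a Prometheus deployment the equivalent logic would more likely be expressed in a recording rule with a function such as quantile_over_time.

```typescript
// Sketch only: a rolling-percentile threshold instead of a fixed one.

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile of an empty window');
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Flag the current value only if it exceeds the recent percentile by a margin.
// `tolerance` is an assumed tuning knob: 1.2 means 20% above the rolling p95.
function isAnomalous(recent: number[], current: number, p = 95, tolerance = 1.2): boolean {
  return current > percentile(recent, p) * tolerance;
}

// Usage: CPU utilization (percent) over the last ten scrape intervals.
const recentCpu = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
const baselineP95 = percentile(recentCpu, 95);
const spikeDetected = isAnomalous(recentCpu, 130);
const normalLoad = isAnomalous(recentCpu, 110);
```

Because the baseline moves with the window, a deployment that legitimately scales up raises its own threshold instead of paging on every autoscaling event.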
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small K8s cluster (<20 nodes) | OTel Collector + Prometheus | Lightweight, declarative, minimal operational overhead | Low ($50–150/mo storage) |
| Multi-cloud / hybrid infrastructure | eBPF + OTel Gateway | Cross-platform kernel visibility, unified pipeline, cloud-agnostic | Medium ($200–500/mo egress + storage) |
| High-throughput microservices | Tail-based sampling + pull metrics | Reduces trace volume by 80%, maintains error visibility | Low-Medium (saves 60% trace storage) |
| Compliance-heavy / regulated workloads | Sidecar-only + audit log pipeline | Isolates telemetry per pod, enables data residency controls | High ($400–900/mo sidecar overhead + audit storage) |
Configuration Template
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
              action: replace
              target_label: __metrics_path__
              regex: (.+)
processors:
  batch:
    timeout: 10s
    send_batch_max_size: 2000
  filter/otel:
    metrics:
      include:
        match_type: strict
        metric_names:
          - http.server.duration
          - process.cpu.time
          - container.memory.usage
exporters:
  prometheus:
    endpoint: 0.0.0.0:9464
    namespace: otel
  otlphttp:
    endpoint: http://grafana-cloud:443
    headers:
      Authorization: Bearer ${GRAFANA_CLOUD_TOKEN}
service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch, filter/otel]
      exporters: [prometheus, otlphttp]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
Quick Start Guide
- Install the OpenTelemetry Collector DaemonSet using the provided YAML. Verify pods are running on each node with kubectl get ds -n monitoring.
- Annotate your application pods with prometheus.io/scrape: "true" and prometheus.io/path: "/metrics". Ensure the OTel SDK is initialized in your TypeScript entrypoint.
- Deploy Prometheus with the scrape configuration. Confirm targets are discovered and metrics are accessible at http://prometheus:9090/targets.
- Configure Grafana to connect to Prometheus and OTLP endpoints. Import the Kubernetes container monitoring dashboard and verify pod-level CPU, memory, and request metrics are populating.