s per node or 500 concurrent services experience iptables chain thrashing, manifesting as sporadic timeouts and kube-proxy restarts. eBPF dataplanes eliminate this ceiling by moving enforcement to the TC (Traffic Control) and XDP (eXpress Data Path) hooks, where packets are processed before entering the networking stack. This architectural shift transforms networking from a scaling bottleneck into a deterministic, observable subsystem.
Core Solution
Implementing a production-grade Kubernetes network requires aligning CNI selection, dataplane architecture, policy modeling, and observability into a cohesive stack. The following implementation uses Cilium as the reference architecture due to its eBPF-native dataplane, integrated service mesh capabilities, and mature IPAM operator.
Step 1: Dataplane Architecture Selection
eBPF must replace iptables at the kernel level. This requires:
- Kernel version ≥ 5.10 (or backported eBPF features)
- BTF (BPF Type Format) enabled for kernel compatibility
- CONFIG_BPF_SYSCALL=y and CONFIG_CGROUP_BPF=y compiled into the host kernel
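The kernel requirements above can be checked on a node before installing anything. The following is a minimal sketch, not an official pre-flight tool: `kernel_at_least` and `btf_available` are hypothetical helper names, and the check only covers the version and BTF items, not the kernel config options.

```shell
#!/usr/bin/env bash
# Sketch of a node pre-flight check (helper names are illustrative).

# Return 0 if a `uname -r` style release string is >= major.minor.
kernel_at_least() {
  local release="$1" want_major="$2" want_minor="$3"
  local major minor
  major="${release%%.*}"        # "5.15.0-91-generic" -> "5"
  minor="${release#*.}"         # -> "15.0-91-generic"
  minor="${minor%%[!0-9]*}"     # -> "15"
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Return 0 if the kernel exposes BTF type information at the given path.
btf_available() {
  [ -r "${1:-/sys/kernel/btf/vmlinux}" ]
}

if kernel_at_least "$(uname -r)" 5 10; then
  echo "kernel version: ok ($(uname -r))"
else
  echo "kernel version: too old ($(uname -r))"
fi

if btf_available; then
  echo "BTF: ok"
else
  echo "BTF: missing"
fi
```

A distribution kernel older than 5.10 may still carry backported eBPF features, so a "too old" result is a prompt to check the vendor's documentation rather than a hard failure.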
Cilium installs three components:
- cilium-agent: Runs on each node, compiles eBPF programs, manages IPAM, enforces policies
- cilium-operator: Handles IP address allocation, identity management, and cloud provider integration
- hubble-relay/hubble-ui: Optional observability layer for connection telemetry
Step 2: Installation with Helm
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=strict \
--set bpf.masquerade=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set operator.replicas=2
Key flags:
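- kubeProxyReplacement=strict: Moves all service load balancing into eBPF so kube-proxy is no longer needed. The flag does not remove an existing kube-proxy DaemonSet; delete it (or skip installing it), and set k8sServiceHost and k8sServicePort so the agent can reach the API server without it
- bpf.masquerade=true: Handles SNAT for external traffic using eBPF maps instead of iptables MASQUERADE
- ipam.mode=kubernetes: Allocates pod IPs from the PodCIDR that Kubernetes assigns to each node, so per-node address capacity is bounded by that CIDR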
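The accepted values for kubeProxyReplacement have changed across Cilium releases. To the best of my knowledge, 1.14 deprecated `strict` in favor of `true`, and later releases accept only `true` or `false`; confirm against the upgrade notes for your version. The sketch below picks the value from a version string and prints the resulting Helm command instead of running it. `kpr_value` is a hypothetical helper and the default version is only a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: choose the kubeProxyReplacement value for a given Cilium version.
# Assumption: 1.14+ expects "true", older releases expect "strict".

kpr_value() {
  local version="${1#v}"        # tolerate a leading "v"
  local major minor
  major="${version%%.*}"        # "1.15.6" -> "1"
  minor="${version#*.}"         # -> "15.6"
  minor="${minor%%[!0-9]*}"     # -> "15"
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 14 ]; }; then
    echo "true"
  else
    echo "strict"
  fi
}

# Placeholder version; set CILIUM_VERSION to the chart version you deploy.
CILIUM_VERSION="${CILIUM_VERSION:-1.15.6}"

# Print the command for review rather than executing it.
echo "helm install cilium cilium/cilium --version ${CILIUM_VERSION}" \
     "--namespace kube-system" \
     "--set kubeProxyReplacement=$(kpr_value "$CILIUM_VERSION")"
```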
Step 3: Network Policy Implementation
Cilium enforces policies at L3/L4 and L7. Unlike standard Kubernetes NetworkPolicy, Cilium extends to DNS, HTTP paths, and TLS.
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-backend
spec:
  endpointSelector:
    matchLabels:
      app: backend
      role: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: GET
                path: "/api/v1/data"
  egress:
    - toEndpoints:
        - matchLabels:
            app: database
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
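This policy replaces 14 iptables rules with a single declarative object. Cilium compiles the L3/L4 portion into eBPF programs and map entries, and the kernel verifier checks that each program is bounded before it is loaded; the resulting program executes in <30 μs per packet. The HTTP rule is enforced by the node-local Envoy proxy to which Cilium redirects matching traffic.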
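One way to confirm the policy behaves as intended is to probe the backend from a frontend pod and read the HTTP status. The sketch below maps curl's `%{http_code}` output to the most likely outcome: Cilium's proxy answers a request that fails an HTTP rule with 403, and curl reports 000 when it receives no HTTP response at all. `interpret_probe` is a hypothetical helper, and the deployment and service names in the usage comment are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: interpret a curl status code from a policy probe.
# A 403 can also come from the application itself, so treat the result as a hint.

interpret_probe() {
  case "$1" in
    2??) echo "allowed" ;;
    403) echo "denied at L7" ;;
    000) echo "no response (dropped at L3/L4 or unreachable)" ;;
    *)   echo "reached backend (HTTP $1)" ;;
  esac
}

# Usage against a live cluster (names are assumptions):
#   code=$(kubectl exec deploy/frontend -- \
#     curl -s -o /dev/null -m 5 -w '%{http_code}' http://backend:8080/api/v1/data)
#   interpret_probe "$code"

# Offline demonstration with sample codes.
for code in 200 403 000; do
  echo "$code: $(interpret_probe "$code")"
done
```

With the policy above, a GET to /api/v1/data from a frontend pod should be allowed, while a POST to the same path or a GET to another path should come back as 403.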
Step 4: Observability Integration
Hubble captures flow logs, policy decisions, and DNS queries without sidecar proxies.
cilium hubble enable
cilium hubble port-forward &
hubble observe --namespace production --protocol tcp
Output provides real-time connection state, policy evaluation results, and drop reasons. This eliminates packet capture guessing and reduces debugging cycles from hours to minutes.
Architecture Decisions & Rationale
- Why eBPF over iptables? Deterministic performance, O(1) policy lookup, native observability, and kernel-space execution that avoids context switches.
- Why strict kube-proxy replacement? Removes duplicate load balancing logic, reduces node resource consumption, and unifies service routing in the dataplane.
- Why Cilium over Calico? Calico's eBPF mode is mature but relies on BPF+IPVS hybrid for service routing. Cilium's native eBPF service routing eliminates IPVS dependency and provides deeper L7 inspection.
- Trade-offs: Requires kernel compatibility validation, steeper learning curve for policy authoring, and initial cluster migration planning. The operational payoff justifies the investment for clusters >500 pods.
Pitfall Guide
- Ignoring MTU Fragmentation: VXLAN/Geneve overlays add 50–100 bytes of encapsulation overhead. With the default host MTU of 1500, full-size pod packets exceed the link MTU once encapsulated and are fragmented, degrading throughput by 15–30%. Fix: Set host MTU to 9000 (jumbo frames) or configure the CNI to use mtu: 1450 with automatic path MTU discovery.
- Assuming Default CNI Handles Production Scale: Cloud provider default CNIs prioritize ease of setup over performance. They lack eBPF dataplanes, advanced policy evaluation, and integrated observability. Production clusters require explicit CNI replacement or upgrade paths.
- Overlapping NetworkPolicies Causing Implicit Denies: NetworkPolicies are additive allow-lists: once any policy selects a pod, traffic that no policy explicitly allows is denied. There is no evaluation order, so overlapping policies make it hard to see which rule admits or blocks a given flow. Fix: Use namespace isolation as the primary boundary, apply least-privilege policies per workload, and validate with cilium policy validate.
- DNS Resolution Failures from CoreDNS Misconfiguration: Pods failing to resolve services often trace to CoreDNS running in a restricted network namespace or missing forward plugins. Ensure the kubelet's clusterDNS address matches the kube-dns Service clusterIP, and verify that resolv.conf inside pods points to that nameserver.
- Treating CNI and Service Mesh as Interchangeable: CNI handles L3/L4 routing and network policy. Service mesh handles L7 traffic management, mTLS, and observability. Deploying both without clear boundary definitions causes double encryption, policy conflicts, and debugging fragmentation. Use CNI for baseline isolation, mesh for application-layer routing.
- IPAM Exhaustion from Pod Churn: Rapid scaling or failed pod evictions leak IP addresses. Default kubelet behavior doesn't always trigger IP release. Fix: In cluster-pool IPAM mode, size the pod CIDR pool with headroom for churn (ipam.operator.clusterPoolIPv4PodCIDRList), monitor cilium_ipam_available metrics, and configure --ipam-release-delay appropriately.
- Skipping eBPF Verifier Debugging: Complex policies or custom eBPF programs may fail verification. The verifier enforces bounded loops, safe memory access, and instruction limits. Fix: Use cilium-dbg bpf policy get to inspect the policy map entries loaded for an endpoint, simplify policy logic, and avoid deep conditional nesting.
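The MTU arithmetic behind the first pitfall can be sketched in a few lines. This is an illustrative calculation only: `overlay_mtu` and `ping_payload` are hypothetical helpers, the 50-byte figure is the usual VXLAN-over-IPv4 overhead, and 28 bytes is the IPv4 plus ICMP header size subtracted to get a ping payload.

```shell
#!/usr/bin/env bash
# Sketch: derive the pod-facing MTU and a matching ping payload size.

# Pod MTU = underlay MTU minus encapsulation overhead.
overlay_mtu() {
  local underlay="$1" overhead="$2"
  echo $(( underlay - overhead ))
}

# Largest ICMP payload that fits in one unfragmented IPv4 packet of the given MTU.
ping_payload() {
  local mtu="$1"
  echo $(( mtu - 28 ))          # 20-byte IPv4 header + 8-byte ICMP header
}

pod_mtu="$(overlay_mtu 1500 50)"
echo "pod MTU over a 1500-byte underlay: ${pod_mtu}"
echo "ping payload to test it: $(ping_payload "$pod_mtu")"

# On Linux, a probe with fragmentation prohibited would look like this
# (the target address is a placeholder):
#   ping -M do -s "$(ping_payload "$pod_mtu")" -c 3 <pod-ip>
```

If the ping with the computed payload succeeds and one byte more fails, the path MTU matches the configured value.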
Best Practices from Production:
- Enforce egress policies by default; allow-list only required external endpoints
- Use cilium status and cilium service list for daily health validation
- Pin CNI versions to stable releases; avoid rolling updates during peak traffic
- Validate MTU end-to-end before enabling overlay networking
- Audit policy changes with cilium policy trace before applying to production
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small cluster (<100 pods), budget-constrained | Default CNI + NetworkPolicy | Lower operational overhead, sufficient for low scale | Minimal compute cost, higher MTTR risk |
| Medium cluster (100–1000 pods), compliance-driven | Calico + eBPF mode | Mature policy engine, audit logging, regulatory alignment | Moderate licensing cost, predictable performance |
| Large cluster (>1000 pods), latency-sensitive | Cilium + strict kube-proxy replacement | O(1) policy evaluation, integrated L7 inspection, lower CPU overhead | Higher initial engineering investment, reduced infra cost long-term |
| Multi-cloud/hybrid, strict egress control | Cilium + ExternalIPs + DNS policy | Unified policy across clouds, granular egress filtering, no cloud-native dependencies | Moderate operational complexity, eliminates data exfiltration risk |
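The size-based rows of the matrix can be expressed as a small lookup, which is handy in cluster bootstrap scripts. This sketch considers only pod count; it ignores the budget, compliance, latency, and multi-cloud factors from the table, and `recommend_cni` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: map expected pod count to the matrix's recommended approach.
# Thresholds follow the table: <100, 100-1000, >1000.

recommend_cni() {
  local pods="$1"
  if [ "$pods" -lt 100 ]; then
    echo "Default CNI + NetworkPolicy"
  elif [ "$pods" -le 1000 ]; then
    echo "Calico + eBPF mode"
  else
    echo "Cilium + strict kube-proxy replacement"
  fi
}

for pods in 50 400 5000; do
  echo "${pods} pods: $(recommend_cni "$pods")"
done
```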
Configuration Template
# cilium-values.yaml
ipam:
  mode: kubernetes
kubeProxyReplacement: strict
bpf:
  masquerade: true
  # netkit needs a newer kernel than the 5.10 baseline (the Cilium docs list 6.8+);
  # use veth on older kernels
  datapathMode: netkit
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
operator:
  replicas: 2
  rollOutPods: true
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 200m
      memory: 256Mi
# baseline-networkpolicy.yaml
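apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:             # despite the name, this denies egress as well
    - Ingress
    - Egress
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-dns-and-core-services
  namespace: production
spec:
  endpointSelector: {}
  egress:
    - toEntities:
        - kube-apiserver
        - cluster
    - toEndpoints:
        - matchLabels:
            # kube-dns runs in kube-system; without the namespace label the
            # selector only matches pods in this policy's own namespace
            "k8s:io.kubernetes.pod.namespace": kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP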
Quick Start Guide
- Validate kernel compatibility: Run uname -r (≥5.10) and ls /sys/kernel/btf/vmlinux to confirm BTF support.
- Install Cilium: Execute the Helm install command with kubeProxyReplacement=strict and verify pods reach Running state.
- Apply baseline policy: Deploy the default-deny policy and the DNS/egress allow policy to establish secure defaults.
- Verify dataplane: Run cilium status to confirm eBPF mode, cilium service list to validate service routing, and hubble observe to confirm telemetry flow.
- Test connectivity: Deploy a test pod, attempt internal service resolution, and validate policy enforcement with cilium policy trace.