Per-Pod NetworkPolicy in Practice: Migrating Five Agents in a Day
Architecting Zero-Trust Egress: Decoupling Network Scanners from Agent Workloads in Kubernetes
Current Situation Analysis
Kubernetes network policies are enforced at the pod boundary, not the container boundary. This architectural constraint creates a persistent blind spot for teams deploying in-pod security sidecars. When a workload requires strict egress isolation but relies on a co-located scanner to proxy or inspect outbound traffic, the Container Network Interface (CNI) plugin sees a single network namespace. It cannot distinguish between the application container and the security container. The result is a binary policy choice: either grant the entire pod unrestricted internet access, or block all egress and break the scanner.
This contradiction is frequently papered over with application-layer configurations. Engineers set HTTPS_PROXY environment variables on the main container and leave the pod’s NetworkPolicy wide open. While this forces the application through the scanner, it provides only advisory enforcement. A subprocess that clears the proxy variable, a misconfigured library that ignores environment hints, or a direct socket connection will bypass the scanner entirely. The kernel routing table remains unchanged, and the CNI permits the traffic because the pod-level policy allows it.
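A short sketch makes the advisory nature of proxy variables concrete. It uses only the Python standard library and launches a child process twice: once with HTTPS_PROXY set, and once with every proxy variable stripped from the environment. The proxy URL reuses the Service name that appears later in this article; the helper names are illustrative, not part of any tool.

```python
import json
import os
import subprocess
import sys

PROXY_URL = "http://egress-guardian-svc:8080"  # Service name used later in this article


def without_proxy_vars(env):
    """Return a copy of env with every *_proxy variable (any case) removed."""
    return {k: v for k, v in env.items() if not k.lower().endswith("_proxy")}


def proxies_seen_by_child(env):
    """Ask a fresh Python process which proxies it would use when started with env."""
    code = "import json, urllib.request; print(json.dumps(urllib.request.getproxies()))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


# What the Deployment intends, and what a subprocess can do to it.
clean_env = without_proxy_vars(os.environ)
configured_env = dict(clean_env, HTTPS_PROXY=PROXY_URL)
```

Calling proxies_seen_by_child(configured_env) reports the scanner as the HTTPS proxy, while proxies_seen_by_child(clean_env) reports none on a typical Linux container image. Nothing in the pod prevents the second case, which is why the enforcement has to come from the network layer.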
The misunderstanding stems from conflating application-level routing with infrastructure-level enforcement. Network policies are implemented by iptables or eBPF rules attached to the pod’s virtual Ethernet interface. These rules operate on IP addresses and port ranges, completely unaware of process-level environment variables or proxy configurations. When security requirements demand that a workload never touches the public internet directly, relying on in-pod sidecars with shared namespaces introduces an unenforceable trust boundary. The only reliable path forward is structural separation: isolating the scanning proxy into its own pod, granting it explicit internet access, and restricting the workload pod to communicate exclusively with the scanner via cluster-internal routing.
WOW Moment: Key Findings
Migrating from co-located sidecars to a decoupled companion architecture changes where egress security is enforced. Policy enforcement moves from the application layer to the CNI layer, which closes the application-level bypass vectors (cleared proxy variables, direct socket connections) while maintaining full traffic visibility.
| Architecture Pattern | Policy Enforcement Boundary | Bypass Surface Area | TLS Interception Coverage | Operational Overhead |
|---|---|---|---|---|
| In-Pod Sidecar | Application/Proxy | High (env clear, direct sockets) | Partial (depends on app config) | Low (single pod) |
| Decoupled Companion | CNI/NetworkPolicy | Near-Zero (kernel enforces route) | Full (all traffic forced through proxy) | Moderate (extra pod + service) |
The decoupled model transforms egress security from a configuration-dependent state to a network-enforced state. By removing direct internet routes from the workload pod, even privileged processes or misconfigured libraries cannot reach external endpoints without traversing the scanner. The additional resource cost is typically negligible (~50-100 MiB memory per companion), while the reduction in attack surface is measurable and auditable. This pattern enables true zero-trust egress without sacrificing observability or breaking existing security tooling.
Core Solution
The decoupled companion architecture relies on three coordinated components: a restricted workload pod, a dedicated scanning pod with internet access, and a Kubernetes Service that mediates traffic between them. NetworkPolicy rules enforce the routing constraints at the CNI level.
Step 1: Deploy the Scanning Companion
Create a standalone Deployment for the network scanner. This pod requires internet access for outbound inspection, certificate validation, and remote policy updates. It runs with minimal privileges and exposes a proxy port for internal consumption.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: egress-guardian
  namespace: production-agents
spec:
  replicas: 1
  selector:
    matchLabels:
      app: egress-guardian
  template:
    metadata:
      labels:
        app: egress-guardian
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
        - name: scanner
          image: registry.internal/egress-scanner:v2.4.1
          ports:
            - containerPort: 8080
              name: proxy
          env:
            - name: LISTEN_ADDR
              value: "0.0.0.0:8080"
            - name: TLS_CA_PATH
              value: "/etc/ssl/certs/intermediate.pem"
          volumeMounts:
            - name: ca-bundle
              mountPath: /etc/ssl/certs/intermediate.pem
              subPath: ca.pem
              readOnly: true
      volumes:
        - name: ca-bundle
          secret:
            secretName: guardian-tls-ca
            defaultMode: 0o640
Step 2: Expose via Cluster Service
Create a ClusterIP Service that routes traffic to the companion pod. This abstraction prevents direct IP dependencies and enables seamless scaling or rolling updates without modifying workload configurations.
apiVersion: v1
kind: Service
metadata:
  name: egress-guardian-svc
  namespace: production-agents
spec:
  selector:
    app: egress-guardian
  ports:
    - port: 8080
      targetPort: proxy
      protocol: TCP
Step 3: Restrict Workload Egress
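Apply a NetworkPolicy that selects the agent pods by label. Once a pod is selected by a policy of type Egress, any outbound traffic not matched by a rule is dropped, so the policy only needs to list what is permitted: the companion pods on port 8080 and cluster DNS. The rules match the pods behind the Service, not the Service object itself. The CNI plugin enforces them on the node, outside the control of anything running in the workload pod.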
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-agent-egress
  namespace: production-agents
spec:
  podSelector:
    matchLabels:
      role: agent-workload
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: egress-guardian
      ports:
        - port: 8080
          protocol: TCP
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
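The allow-list semantics of this policy can be modelled in a few lines of Python. This is a deliberately simplified, hypothetical model (label equality only, no ipBlock, no named ports); real enforcement happens in the CNI plugin, and the function and constant names are made up for illustration.

```python
# Simplified model of the two egress rules in restrict-agent-egress.
POLICY_NAMESPACE = "production-agents"

EGRESS_RULES = [
    # Rule 1: scanner pods in the policy's own namespace, TCP 8080 only.
    {"labels": {"app": "egress-guardian"}, "any_namespace": False,
     "ports": {(8080, "TCP")}},
    # Rule 2: kube-dns pods in any namespace, DNS over UDP or TCP.
    {"labels": {"k8s-app": "kube-dns"}, "any_namespace": True,
     "ports": {(53, "UDP"), (53, "TCP")}},
]


def egress_allowed(dest_labels, dest_namespace, port, protocol):
    """Return True if any rule matches the destination.

    Destinations outside the cluster are passed as dest_labels=None;
    they can never match a podSelector, so they are always denied here.
    """
    if dest_labels is None:
        return False
    for rule in EGRESS_RULES:
        in_scope = rule["any_namespace"] or dest_namespace == POLICY_NAMESPACE
        labels_match = all(dest_labels.get(k) == v for k, v in rule["labels"].items())
        if in_scope and labels_match and (port, protocol) in rule["ports"]:
            return True
    return False
```

Under this model, egress_allowed({"app": "egress-guardian"}, "production-agents", 8080, "TCP") is permitted, while egress_allowed(None, None, 443, "TCP"), standing in for a direct connection to a public IP, is not.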
Step 4: Configure Workload Routing
Update the agent Deployment to remove the in-pod scanner, inject the proxy environment variable pointing to the service DNS name, and apply the restrictive label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-processor
  namespace: production-agents
spec:
  template:
    metadata:
      labels:
        role: agent-workload
    spec:
      containers:
        - name: app
          image: registry.internal/data-processor:latest
          env:
            - name: HTTPS_PROXY
              value: "http://egress-guardian-svc:8080"
            - name: NO_PROXY
              value: "localhost,127.0.0.1,.cluster.local"
Architecture Rationale
- Service Abstraction: Direct pod IP routing breaks during rollouts or pod rescheduling. A Service provides stable DNS resolution and load balancing across companion replicas.
- DNS Exception: Kubernetes pods require DNS resolution to locate services. The NetworkPolicy explicitly permits UDP/TCP 53 to the kube-dns pods, preventing silent resolution failures.
- CNI Enforcement: With no rule permitting internet egress, the CNI plugin's packet filter (iptables or eBPF) drops packets destined for external IPs as they leave the pod. An application-level bypass attempt can still open a socket, but the connection never completes.
- TLS Interception: The companion pod handles certificate generation and validation. Workloads trust the cluster-internal CA, ensuring all outbound HTTPS traffic is decrypted, inspected, and re-encrypted without application modifications.
Pitfall Guide
1. SubPath ConfigMap Mount Staleness
Explanation: Kubernetes propagates ConfigMap updates to mounted files only when using directory mounts, where the kubelet atomically swaps a symlink to a fresh copy of the data. A subPath: mount instead bind-mounts the file that existed when the container started. Subsequent ConfigMap updates never reach that file, causing scanners to run with stale allowlists or policies.
Fix: Always mount ConfigMaps as directories (mountPath: /etc/config). If subPath: is unavoidable, implement a sidecar watcher that triggers a pod restart or use a volume driver that supports hot-reload. Validate mount behavior in staging before production rollout.
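The difference between the two mount styles can be imitated on an ordinary filesystem. The sketch below is an approximation under stated assumptions: it mimics the symlink swap used for directory mounts with a hypothetical ..v1 / ..v2 naming scheme, and it stands in for a subPath bind mount by resolving the file path once, before the update. It needs a platform that supports symlinks.

```python
import os
import tempfile


def publish(volume_dir, version, files):
    """Imitate a volume update: write a new versioned directory, then
    atomically repoint the ..data symlink at it."""
    version_dir = os.path.join(volume_dir, f"..v{version}")
    os.mkdir(version_dir)
    for name, text in files.items():
        with open(os.path.join(version_dir, name), "w") as fh:
            fh.write(text)
    tmp_link = os.path.join(volume_dir, "..data_tmp")
    os.symlink(os.path.basename(version_dir), tmp_link)
    os.replace(tmp_link, os.path.join(volume_dir, "..data"))  # atomic swap
    for name in files:
        visible = os.path.join(volume_dir, name)
        if not os.path.lexists(visible):
            # User-visible file is a symlink that goes through ..data.
            os.symlink(os.path.join("..data", name), visible)


def read(path):
    with open(path) as fh:
        return fh.read()


def compare_mount_styles():
    """Return (directory_mount_view, pinned_file_view) after one update."""
    with tempfile.TemporaryDirectory() as volume_dir:
        publish(volume_dir, 1, {"allowlist.txt": "v1"})
        live_path = os.path.join(volume_dir, "allowlist.txt")
        # Stand-in for subPath: resolve the file once, as at container start.
        pinned_path = os.path.realpath(live_path)
        publish(volume_dir, 2, {"allowlist.txt": "v2"})
        return read(live_path), read(pinned_path)
```

The path that follows the symlink chain sees the second version, while the path resolved before the update keeps returning the first, which is the staleness described above.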
2. Init Container Internet Dependency
Explanation: Many deployments use init containers to fetch binaries or configuration from external registries during startup. After egress lockdown, these containers fail immediately because the pod’s NetworkPolicy blocks outbound connections before the main container starts.
Fix: Bake required binaries into the companion image. Use an init container that copies the executable from the companion image to a shared emptyDir volume. This eliminates network calls during initialization and guarantees version parity between the scanner and its bootstrap process.
3. TLS Interception Incompatibility
Explanation: Certain workloads (browser automation, custom TLS clients, certificate pinning) cannot operate behind a MITM proxy. They expect end-to-end certificate validation and will reject the scanner’s generated certificates.
Fix: Isolate incompatible workloads into separate pods with direct egress policies. Route traffic through a dedicated scraping or automation deployment that bypasses the scanner. Maintain visibility by logging outbound requests at the application layer before forwarding them to the isolated pod. Accept the trade-off: full inspection coverage is impossible for protocols that enforce strict certificate chains.
4. Multi-Tenant Identity Collision
Explanation: When multiple agents share a namespace but require distinct outbound identities or authentication tokens, a single companion pod cannot manage multiple credential contexts simultaneously. Most scanners bind to a single default_agent_identity configuration field.
Fix: Deploy one companion per identity. The resource overhead is minimal (~50 MiB memory per instance), but it guarantees credential isolation and simplifies policy auditing. For large-scale deployments, evaluate CNI-native egress gateways or service mesh sidecars that support per-request identity injection.
5. Secret Volume Permission Masking
Explanation: Scanners often require TLS CA certificates mounted from Kubernetes Secrets. Setting defaultMode: 0o444 (world-readable) triggers security validators that reject insecure file permissions. Conversely, 0o600 without matching ownership leaves the file unreadable to non-root containers.
Fix: Use defaultMode: 0o640 combined with securityContext.fsGroup: 2000. Kubernetes automatically chowns the mounted files to the specified group and adds the group to the container’s supplementary groups. This allows arbitrary non-root UIDs to read the secret without granting world access. Avoid 0o400 with fsGroup, as the missing group-read bit prevents access regardless of group membership.
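The permission reasoning can be checked with a small model of the POSIX read check. This sketch assumes the mounted file is owned by root with its group set to the fsGroup value (2000 in this article) and ignores ACLs and capabilities; can_read is a hypothetical helper. The kubelet may adjust modes when it applies fsGroup, so it is worth confirming the actual mode inside a running pod.

```python
def can_read(mode, owner_uid, owner_gid, uid, groups):
    """Simplified POSIX read check: owner class, then group class, then other."""
    if uid == 0:
        return True  # root is assumed to bypass file permission checks
    if uid == owner_uid:
        return bool(mode & 0o400)
    if owner_gid in groups:
        return bool(mode & 0o040)
    return bool(mode & 0o004)


FS_GROUP = 2000        # securityContext.fsGroup from the companion Deployment
SCANNER_UID = 1000     # runAsUser from the companion Deployment

# 0o640 with the fsGroup in the supplementary groups: readable.
group_readable = can_read(0o640, 0, FS_GROUP, SCANNER_UID, {FS_GROUP})
# 0o600: only the owner (root) can read, so the non-root scanner cannot.
owner_only = can_read(0o600, 0, FS_GROUP, SCANNER_UID, {FS_GROUP})
```

In this model group_readable is True and owner_only is False, matching the recommendation to pair 0o640 with fsGroup rather than relying on owner-only or world-readable modes.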
6. Unnecessary Egress Routing (VPN/Proxy Flakes)
Explanation: Legacy deployments often attach VPN or external proxy sidecars to enforce exit-IP rotation. When migrated to the companion model, these additional routing layers introduce instability, DNS resolution delays, and tunnel cycling during pod startup.
Fix: Remove VPN or external proxy sidecars unless exit-IP rotation is a strict compliance requirement. The companion pod’s primary function is content inspection, not network anonymization. Rely on the cluster’s native egress routing. If IP rotation is mandatory, implement it at the egress gateway level rather than embedding it in the scanning pod.
Production Bundle
Action Checklist
- Audit existing NetworkPolicies for wide-open egress rules that bypass CNI enforcement
- Identify workloads using HTTPS_PROXY or application-level routing for security compliance
- Verify the CNI plugin enforces NetworkPolicy (for example Calico or Cilium); kube-proxy does not implement NetworkPolicy, and a plugin without policy support silently ignores these rules
- Create companion Deployment with internet egress and TLS interception capabilities
- Apply restrictive egress NetworkPolicy to workload pods, permitting only companion service and DNS
- Validate bypass resistance using raw TCP dial tests and environment variable clearing probes
- Implement egress metrics collection on companion pods for audit and anomaly detection
- Document rollback procedures and maintain parallel manifest sets during transition
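For the bypass-resistance check in the list above, a raw TCP dial test can be as small as the following sketch. It opens a socket directly and therefore ignores HTTPS_PROXY and similar variables. The function name and the example targets in the note below are assumptions for illustration.

```python
import socket


def can_dial(host, port, timeout=3.0):
    """Attempt a raw TCP connection that ignores any proxy settings.

    Returns True if the handshake completes, False on refusal, timeout,
    or name-resolution failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from inside a workload pod, a call such as can_dial("1.1.1.1", 443) would be expected to return False once the restrictive policy is active, while can_dial("egress-guardian-svc", 8080) should return True. A True result for a public address suggests the policy is not selecting the pod or the CNI plugin is not enforcing it.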
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single workload, strict egress isolation | Decoupled companion pod | CNI enforces routing, eliminates bypass vectors | +1 pod (~50-100 MiB RAM) |
| Multiple workloads sharing namespace | One companion per identity | Prevents credential collision, simplifies policy mapping | Linear scaling with identity count |
| Browser automation / TLS pinning | Isolated egress pod + app-level logging | MITM breaks certificate validation; logging preserves visibility | +1 pod, reduced inspection coverage |
| High-throughput data pipelines | Egress gateway with connection pooling | Companion proxy adds latency; gateway handles bulk traffic efficiently | Higher infrastructure cost, better throughput |
| Compliance requiring exit-IP rotation | Dedicated NAT gateway + companion scanner | Separates anonymization from inspection, reduces tunnel instability | +NAT gateway cost, improved stability |
Configuration Template
# companion-egress-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-companion-internet
  namespace: production-agents
spec:
  podSelector:
    matchLabels:
      app: egress-guardian
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - port: 80
          protocol: TCP
        - port: 443
          protocol: TCP
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Cluster DNS usually sits inside the excluded private ranges, so it
    # needs its own rule or the scanner cannot resolve hostnames.
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
---
# workload-restrict-egress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-workload-egress
  namespace: production-agents
spec:
  podSelector:
    matchLabels:
      role: agent-workload
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: egress-guardian
      ports:
        - port: 8080
          protocol: TCP
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
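The effect of the ipBlock rule in allow-companion-internet can be checked with Python's ipaddress module. The sketch below mirrors the cidr and except values from the template; the sample addresses, including 10.96.0.10 as a stand-in for a cluster DNS address, are assumptions for illustration.

```python
import ipaddress

ALLOWED = ipaddress.ip_network("0.0.0.0/0")
EXCLUDED = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def ipblock_matches(ip):
    """Return True if ip falls inside cidr and outside every except range."""
    addr = ipaddress.ip_address(ip)
    return addr in ALLOWED and not any(addr in net for net in EXCLUDED)
```

Public addresses match the block, while anything in the private ranges does not. A cluster DNS address inside one of those ranges is therefore not covered by the ipBlock and has to be permitted through a separate pod-selector rule.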
Quick Start Guide
- Label your workload pods with a consistent selector (e.g., role: agent-workload) to target NetworkPolicy enforcement.
- Deploy the companion scanner using the provided template, ensuring the TLS CA secret uses 0o640 mode and fsGroup configuration.
- Apply the restrictive egress policy to the workload namespace. Verify DNS resolution remains functional by testing nslookup inside a running pod.
- Update workload environment variables to point HTTPS_PROXY to the companion Service DNS name. Remove any in-pod scanner containers from the Deployment spec.
- Validate enforcement by execing into a workload pod and attempting a direct curl to an external endpoint, passing --noproxy '*' so that curl ignores the proxy variable. The connection should time out or be rejected by the CNI, confirming zero-trust egress is active.
