Kubernetes networking with Cilium: eBPF, kube-proxy replacement, and Hubble
Cilium changes Kubernetes networking by replacing kube-proxy's iptables-based service handling with eBPF, enforcing identity-aware policy, and adding flow visibility through Hubble.
If you are revisiting Kubernetes networking because kube-proxy and iptables have become harder to reason about than the workloads they support, Cilium is usually where the conversation ends up. It uses eBPF to move more of service routing, policy enforcement, and flow visibility into the kernel, which gives the cluster a dataplane that is easier to inspect under pressure.
That matters less on a toy cluster than on one that has grown awkwardly. At that point the question is rarely “can Kubernetes route traffic?” The question is whether you can still explain what happened when a packet was dropped, a Service changed shape, or a policy quietly blocked a dependency during a rollout.
What Cilium is replacing in Kubernetes networking
On a conventional cluster, kube-proxy watches Services and their endpoints, then translates that state into node-level rules, iptables by default. Kubernetes networking still works, but the hot path can become a long pile of iptables state that is difficult to audit once Services are busy and churn is frequent.
Cilium can run in kube-proxy replacement mode, which moves service handling into eBPF programs attached to kernel networking hooks. In practical terms, service translation happens closer to the packet, using lookups in eBPF maps instead of long iptables chains rebuilt from control-plane updates.
That shift is what makes Cilium interesting for production platforms. The appeal is not only raw performance. It is that service routing, policy, and observability start to live in the same dataplane instead of in three loosely related systems that all need separate debugging habits.
There is one migration detail worth stating plainly because the official Cilium docs do: removing kube-proxy on an existing cluster breaks existing Service connections, and Service traffic stops until the Cilium replacement is installed. This is not a toggle you flip casually in the middle of the working day.
Why eBPF helps here
The useful definition of eBPF is not “programmable kernel space”. The useful definition is that it lets a project like Cilium attach small, verified programs to kernel events and packet paths without asking you to fork the kernel or ship kernel modules for every change.
That gives Cilium enough leverage to keep packet handling, service load-balancing, and observability close to the kernel while still evolving at the speed of a user-space project. For Kubernetes operators, the result is a networking stack that is more programmable than iptables and more transparent than a black-box appliance model.
The value shows up in three places:
- service routing can happen in the eBPF dataplane rather than via long iptables chains
- policy can follow workload identity instead of ephemeral pod IPs
- flow visibility can be captured where the packet actually moved, rather than reconstructed afterwards from logs
That combination is why Cilium tends to turn up in serious Kubernetes networking conversations. It is not just faster packet handling. It is a better operational model.
Cilium policy is easier to review when it follows identity
Cilium’s policy model is strongest when it sticks to the same identities the platform already uses: labels, endpoints, namespaces, and ports.
A minimal CiliumNetworkPolicy looks like this:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-allow-web
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: web
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
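This says pods labelled app: api in the payments namespace may accept TCP traffic on port 8080 from pods labelled app: web. Because the policy is namespaced, fromEndpoints only matches app: web pods in that same payments namespace.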
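Namespaces are part of the identity model too. A namespaced policy's selectors only match pods in the policy's own namespace unless you select on Cilium's namespace label explicitly. The sketch below assumes a hypothetical frontend namespace that runs the web pods, and the policy name is also hypothetical; it is an illustration of the selector shape, not a drop-in policy.

```yaml
# Hypothetical variant: admit app: web pods from a "frontend" namespace
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-allow-frontend-web   # hypothetical policy name
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: web
            # Cilium exposes the pod's namespace as this identity label
            k8s:io.kubernetes.pod.namespace: frontend   # assumed namespace
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```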
The part worth understanding is the default enforcement behaviour. Cilium allows ingress and egress traffic until a policy selects an endpoint. Once an ingress or egress policy selects that endpoint, the selected direction moves into default-deny and only explicitly allowed traffic is permitted. That makes policy review easier because you can reason from the endpoint outward instead of reverse-engineering a node’s current rule set.
It also makes one class of mistake more obvious. If a Service starts timing out after “just one policy change”, the real change may be that the first matching policy switched the endpoint into default-deny for that direction.
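DNS is a common example of that trap. Once any egress policy selects app: api, egress moves to default-deny, so even name resolution has to be allowed explicitly. The sketch below assumes the cluster resolver runs in kube-system with the usual k8s-app: kube-dns label (this varies by distribution), and the policy name is hypothetical. Every other dependency the API calls would still need its own egress rule.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-egress-dns   # hypothetical policy name
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: api
  egress:
    # Selecting app: api for egress switches that direction to default-deny,
    # so DNS to the cluster resolver is allowed explicitly here.
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns   # assumption: standard CoreDNS label
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY   # DNS uses both UDP and TCP
```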
kube-proxy replacement is useful, but migration still needs care
Cilium’s kube-proxy replacement covers the Service features people usually care about: ClusterIP, NodePort, and LoadBalancer Services, plus externalIPs and hostPort. The usual deployment pattern is to install Cilium with the Helm value kubeProxyReplacement=true, then verify the mode directly from the agent.
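As a Helm values file, that install might look roughly like the sketch below. It assumes a recent Cilium release where kubeProxyReplacement takes a boolean (older releases used values such as strict). The file name and the API server address and port are placeholders. The k8sServiceHost and k8sServicePort values matter because, without kube-proxy, the agent cannot rely on the kubernetes ClusterIP Service to reach the API server.

```yaml
# values-kpr.yaml: hypothetical file name, passed to helm install/upgrade with -f
kubeProxyReplacement: true
# Without kube-proxy, point the agent at the API server directly.
k8sServiceHost: 192.0.2.10   # placeholder address, replace with your API server
k8sServicePort: 6443         # placeholder port
```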
A basic check looks like this:
kubectl -n kube-system exec ds/cilium -- cilium-dbg status --verbose
That gives you the useful answers during rollout: whether kube-proxy replacement is actually enabled, which devices are handling traffic, and which Service features are active.
The trap is treating the migration like a harmless implementation detail. It is not. The Cilium docs also warn that running the eBPF replacement alongside kube-proxy on an already serving cluster can still break existing connections because the two NAT mechanisms are not aware of each other. If you are swapping dataplanes under live traffic, test the failure mode first.
Hubble is the operational reason many teams stay with Cilium
The feature people miss most after moving away from basic Kubernetes networking is usually not a packet-per-second number. It is Hubble.
Hubble gives Cilium flow visibility at the dataplane level. By default, the Hubble API is node-scoped through the local Cilium agent. When you add Hubble Relay, that visibility becomes cluster-wide, and the same data can feed the CLI or UI.
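In Helm terms, moving from node-scoped Hubble to the cluster-wide view roughly means enabling Relay, and optionally the UI, alongside Hubble itself. This is a sketch of the relevant values from the Cilium Helm chart; defaults differ between chart versions, so check what your release already enables.

```yaml
hubble:
  enabled: true     # Hubble API served by each local Cilium agent
  relay:
    enabled: true   # aggregates flows from all nodes for a cluster-wide view
  ui:
    enabled: true   # optional web UI that reads from Relay
```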
That changes incident response in a practical way. Instead of asking three different teams whether a timeout came from policy, service translation, or the application itself, you can inspect the flow and see whether traffic was forwarded, dropped, or denied.
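A small first-pass check is usually enough to confirm that the agent, the kube-proxy replacement, and Hubble are healthy (hubble status assumes the Hubble CLI can reach Hubble Relay, for example through cilium hubble port-forward):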
kubectl -n kube-system exec ds/cilium -- cilium-dbg status --verbose
hubble status
Applications are often poor witnesses during a network incident. They can tell you a request failed. They cannot reliably tell you whether policy denied the packet, whether the packet never found a backend, or whether the dependency responded too slowly. Hubble shortens that loop.
Where Cilium pays off, and where it may be more than you need
Cilium is a good fit when the cluster has grown past the point where kube-proxy plus ad hoc debugging feels acceptable.
It is especially useful when:
- Service churn has made iptables state noisy and hard to explain
- you want policy tied to workload identity rather than pod IPs
- network incidents routinely turn into guesswork
- you want one dataplane for routing, policy, and flow visibility
It is less compelling when the cluster is small, the networking needs are simple, and the team does not want the extra operational surface area. eBPF is not magic. Kernel compatibility, feature rollout, and dataplane migration still need care.
That is the honest trade-off. Cilium gives you more control and much better visibility, but it also expects the platform team to understand the dataplane it is adopting.
The payoff usually shows up as a concrete before-and-after against your own baseline: p99 service-to-service latency, CPU reclaimed from kube-proxy, or less rule churn on the packet path. Capture those numbers in your environment before and after the switch.
Takeaway
Cilium changes Kubernetes networking by pulling service routing, identity-aware policy, and observability into an eBPF dataplane that is closer to the kernel and easier to inspect under load.
The win is not only that packets move through fewer layers. It is that Kubernetes networking becomes easier to reason about when something breaks. If your team is already spending too much time spelunking through kube-proxy state, Cilium is worth a serious look.