What is a Service Mesh? (Beyond Istio)
Once you have dozens of microservices talking to each other on Kubernetes, a familiar set of problems emerges: How do you encrypt traffic between services? How do you retry failed requests without writing retry logic in every service? How do you trace a request as it fans out across ten services? You could solve each of these in application code, or you could handle all of them at the network layer with a service mesh.
What a Service Mesh Actually Does
A service mesh intercepts all network traffic between services by injecting a sidecar proxy into each Pod. The proxy handles:
- mTLS — mutual TLS between every pair of services, encrypting traffic and verifying identity without code changes
- Retries and timeouts — automatic retry on transient failures, configurable per-route
- Load balancing — advanced algorithms (least request, consistent hash) beyond what Kubernetes Services offer
- Circuit breaking — stop sending traffic to a failing service to prevent cascade failures
- Traffic splitting — send 10% of traffic to a new version for canary deployments
- Distributed tracing — inject trace headers and report spans to Jaeger or Zipkin
- Observability — automatic golden signal metrics (requests, errors, latency) for every service pair
The key insight: your application code doesn’t change. The proxy handles all of this transparently.
The Sidecar Architecture
Pod A Pod B
┌────────────────────┐ ┌────────────────────┐
│ app container │ │ app container │
│ (port 8080) │ │ (port 8080) │
│ ↓ │ │ ↑ │
│ sidecar proxy │─ mTLS ──│ sidecar proxy │
│ (Envoy/linkerd2) │ │ (Envoy/linkerd2) │
└────────────────────┘ └────────────────────┘
↕ ↕
Control Plane (issues certs, pushes routing config, collects telemetry)
The control plane distributes certificates, pushes routing rules to every proxy, and aggregates metrics — all without touching your application.
Istio
Istio is the most feature-rich service mesh, built on the Envoy proxy. It’s the choice when you need granular traffic control.
Install with Helm:
$ helm repo add istio https://istio-release.storage.googleapis.com/charts
$ helm install istio-base istio/base -n istio-system --create-namespace
$ helm install istiod istio/istiod -n istio-system
Enable sidecar injection for a namespace:
$ kubectl label namespace production istio-injection=enabled
Every new Pod in production now gets an Envoy sidecar injected automatically.
Traffic splitting for a canary deploy:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- myapp
http:
- route:
- destination:
host: myapp
subset: v1
weight: 90
- destination:
host: myapp
subset: v2
weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: myapp
spec:
host: myapp
subsets:
- name: v1
labels:
version: "1.4.2"
- name: v2
labels:
version: "1.5.0"
Istio’s power comes with real complexity — the CRD count alone is over 40.
Linkerd
Linkerd takes the opposite philosophy: minimal surface area, production-safe defaults, zero configuration for 80% of use cases. It uses its own ultra-lightweight Rust proxy (linkerd2-proxy) instead of Envoy, which means significantly lower CPU and memory overhead.
$ brew install linkerd
$ linkerd install --crds | kubectl apply -f -
$ linkerd install | kubectl apply -f -
$ linkerd check
# inject into a namespace
$ kubectl annotate namespace production \
linkerd.io/inject=enabled
Linkerd gives you mTLS, retries, timeouts, and golden-signal metrics with almost no configuration. If you don’t need Istio’s advanced traffic management, Linkerd’s operational simplicity is a strong argument.
Cilium (eBPF-based)
Cilium takes a fundamentally different approach: instead of sidecars, it uses eBPF — a Linux kernel technology — to intercept and control traffic directly in the kernel. No sidecar, no proxy overhead, no extra container per Pod.
This makes Cilium extremely performant and the right choice if you’re already using it as your CNI (cluster network interface). Cilium Mesh handles L7 policy, mTLS via SPIFFE, and observability through Hubble.
Do You Actually Need a Service Mesh?
A service mesh adds operational overhead — more components to upgrade, more things to debug when networking breaks. Before adopting one, check whether you actually need what it provides:
| Problem | Simpler alternative | |
|---|---|---|
| Encryption between services | Istio mTLS | Network policies + cert-manager |
| Retries | Service mesh | Client library (tenacity, go-retry) |
| Canary deployments | Istio VirtualService | Argo Rollouts, Flagger |
| Distributed tracing | Any mesh | OpenTelemetry SDK in your app |
| Circuit breaking | Any mesh | Resilience4j, go-resiliency |
The honest answer: if you have fewer than ~10 services and a small team, a service mesh is probably more complexity than it’s worth. If you have 50+ services, heterogeneous languages, and a security requirement for zero-trust networking, it earns its keep.
Conclusion
A service mesh moves cross-cutting concerns — encryption, observability, reliability — out of application code and into the network layer. Istio gives you maximum control at the cost of significant operational complexity. Linkerd offers a production-ready experience with a much gentler learning curve. Cilium sidesteps the sidecar model entirely with eBPF for environments where performance overhead is non-negotiable. Whichever you choose — or don’t choose — understanding what a service mesh does helps you make that decision deliberately rather than by cargo-culting what larger organizations run.