Anyone who has even a passing interest in Kubernetes and the cloud native ecosystem has probably heard of Istio. A clear description of what exactly Istio is, what it can (and can’t) do, and whether it’s a technology you might need is a little harder to find. Hopefully, this post will help clear up some of the confusion.
The Istio Service Mesh
What is a service mesh?
The term “service mesh” can refer either to the set of overlapping network connections between services in a distributed application or to a set of tools used to manage that group of connected services. If you have two microservices that interact over network connections, you have a service mesh. Here’s the simplest example, a mesh with two services:
More likely, as the number of microservices in your environment grows, your service mesh will start to look something like this:
As the complexity of your microservices ecosystem grows, so does the need to manage it effectively and intelligently, to get insights into how the microservices interact, and to secure communications between the microservices. Istio can provide all that functionality and more.
“The Istio service mesh” usually refers to the Istio toolset. “An Istio service mesh” usually denotes an application cluster managed by an Istio installation.
Istio’s Custom Resource Definitions (CRDs) allow the behavior of the application’s network layer to be configured programmatically through the Kubernetes API, where “the application” is the whole set of interdependent microservices.
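For example, a minimal VirtualService (one of Istio’s networking CRDs) that splits traffic between two versions of a service might look roughly like this; the service and subset names are purely illustrative:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews              # the Kubernetes Service this routing rule applies to
  http:
  - route:
    - destination:
        host: reviews
        subset: v1       # subsets would be defined in a matching DestinationRule
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10         # send 10% of requests to the newer version
```

Applying this resource with `kubectl` is all it takes; the control plane pushes the resulting routing rules out to the proxy sidecars.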
How Istio Works
Istio’s components belong to one of two functional groupings:
- The control plane is the set of Istio services that manage configuration and monitoring of the data plane.
- The data plane consists of the istio-proxy sidecars in each of your application pods. These proxies handle the network connections between your service mesh’s microservices, receiving their routing and policy rules from the control plane and reporting back connection handling telemetry to the control plane.
You configure an Istio service mesh by creating Kubernetes resources. Istio installs dozens of Kubernetes Custom Resource Definitions that map to various aspects of its functionality.
The Data Plane
The data plane is powered by the Envoy service proxy, built with some extensions for Istio. This istio-proxy runs as a sidecar container in each Kubernetes pod for the applications in an Istio service mesh. The proxy intercepts incoming traffic to the pod’s service ports and, by default, all outgoing TCP traffic from the pod’s other containers. In most cases, the proxy sidecar can run in a pod without requiring any changes to the application code and with only minor changes to the application’s Kubernetes Deployment and Service resource specifications.
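When automatic sidecar injection is enabled in the cluster, opting an application in is typically just a matter of labeling its namespace so that Istio’s admission webhook injects the proxy into new pods. A sketch, with an illustrative namespace name:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app                   # illustrative namespace name
  labels:
    istio-injection: enabled     # tells Istio's webhook to inject istio-proxy into new pods
```

Pods created in the namespace after the label is applied get the sidecar automatically; existing pods must be recreated to pick it up.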
The configuration of the proxy sidecars is managed dynamically by the services in the Istio control plane.
The Control Plane
A standard Istio v1.1 installation in a Kubernetes cluster will have the following Istio services:
- The Pilot service compiles the traffic management specs configured in Istio networking custom resources and feeds them to the istio-proxy sidecars.
- The Mixer service pulls double duty: it handles telemetry, acting as a clearinghouse for the request metrics generated by the proxy sidecars and sending them to configured backends, and it acts as the authorization policy enforcer. If policy checks are turned on for the cluster (they are off by default as of Istio 1.1), the proxy sidecars will connect to Mixer to confirm whether each connection is allowed, at the cost of a slight performance hit.
- Citadel is the Istio PKI (Public Key Infrastructure) service, which generates, rotates, and revokes the client TLS certificates Istio issues for each service in a mesh and uses for peer-to-peer authentication.
- Galley is the Kubernetes controller for most of the Istio Custom Resource Definitions, processing changes to custom resources and distributing the contents to the other Istio services.
The Power and Pitfalls of Istio
With its mesh of dynamically-configurable proxies in every application pod, Istio is perfectly positioned to offer a wide range of network connection handling and control features. However, these capabilities do come at the cost of a steep learning curve and a heavy configuration load.
| Feature | Category | Requires app change? | Common Issues or Drawbacks |
|---|---|---|---|
| HTTP request tracing | Performance, Monitoring | Yes: works via HTTP headers; each microservice must copy the header from the incoming request to its outgoing requests | Only available or useful for end-to-end tracing if every service in the trace connects with plaintext HTTP and passes along the request ID header |
| mTLS (mutual Transport Layer Security) | Security: service authentication and authorization; encryption over the wire | (Almost always) No | Existing applications already using their own peer-to-peer TLS require workarounds |
| Service-to-service RBAC | Security: service authorization | Usually no | Only effective for HTTP-based protocols, as authorization is generally based on information in HTTP headers or the URI path |
| HTTP request metrics | Monitoring, Performance, Security: audit logging | No | |
| TCP connection metrics | Monitoring, Performance, Security: audit logging | No | Not as rich as the HTTP request metrics |
| Canary deployments | Release management | No | |
| A/B test support | Release management, Feature development | No | Only effective for HTTP requests |
| Rate limiting | Reliability, Security against DoS | No | |
| Configurable retries and timeouts for outbound connections | Reliability | No | |
| Fault injection | “Chaos engineering” for reliability | No (although it may uncover changes that should be made for application resilience) | |
| Load balancing | Performance, Reliability | No | |
| Origin (user/client) request authentication and RBAC | Security: authentication and authorization | No, although applications with built-in AuthN/Z will not be able to leverage it | Currently supports only JWTs, which does cover OAuth2 tokens |
| Cluster egress traffic control | Security, Monitoring | No | Only a partial solution for egress control; requires a great deal of configuration to manage properly |
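To make one row of the table concrete: in Istio 1.1, enabling mesh-internal mTLS is a matter of configuration rather than application changes. A namespace-wide authentication Policy paired with a DestinationRule might look roughly like this sketch, where the namespace and host pattern are illustrative:

```yaml
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: default            # "default" makes this the namespace-wide policy
  namespace: my-app        # illustrative namespace
spec:
  peers:
  - mtls: {}               # require mTLS for inbound connections to services here
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: default
  namespace: my-app
spec:
  host: "*.my-app.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL   # clients present Istio-issued certs when calling these hosts
```

The certificates themselves are issued and rotated by Citadel; neither side of the connection handles key material directly.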
Other common issues with migrating existing applications to Istio, even if they are already Kubernetes-native microservices, include, ironically enough:

- a lack of visibility into how Istio is translating the user-supplied configurations to actual Envoy routes;
- understanding Istio’s requirements for Deployment and Service resource configuration;
- dealing with Kubernetes readiness and liveness probes that break when mTLS is turned on;
- finding ways to get Istio to work with headless services (Kubernetes services with no ClusterIP) or otherwise bypass the normal Kubernetes service discovery flow.
That said, Istio is rapidly evolving. Releases are frequent, and the working groups are very engaged and receptive to user feedback and requests, although not all changes are technically feasible or simple. Many of the limitations come from the Envoy proxy, although that is also under active development, and Istio has driven many improvements in Envoy.
Do You Need Istio?
Although Istio adoption is likely to spread quickly, especially as its feature set and manageability improve, not every Kubernetes cluster needs it. The biggest drivers for adoption are likely to be the need for solutions to one or more of the following requirements or problems:
- Diagnosing performance issues in a distributed microservice-based application
- Gathering and delivering consistent request and connection metrics for all microservices
- Making over-the-wire encryption the default without having to manage TLS certificates directly
- Controlling service-to-service traffic at a finer-grained resolution than vanilla Kubernetes can provide with Network Policies
- Enabling release automation with canary rollouts and application API multi-version support
- Adding user authentication and authorization without modifying the application
On the other hand, if you don’t expect the number of microservices deployed to your Kubernetes clusters to grow, if nginx or haproxy meets your internal HTTP request routing needs, or if you already have manageable, effective solutions to the key drivers listed above, then the migration and operational overhead of adopting a complex suite of services like Istio is probably not worth it.
If you suspect you will need to meet any of the above points but not for some time, it is probably worth waiting. Istio will no doubt be even more useful and have a richer ecosystem of documentation and features improving its own manageability as time goes on.