When working with microservices, it becomes difficult to track down an issue or get in-depth visibility into how your services communicate with each other. You may want to know things like traffic stats for each service, error rates, or request/response payloads, or you may want more control over access policies across all services or a subset of them. You’ll also need some mechanism to handle faults, encryption, and routing. I highly recommend getting an overview of the twelve-factor app methodology if you’re building microservices.

Designing your microservices architecture around the features you get with a service mesh means you’ll save a ton of effort on maintaining and monitoring the system. Your development team can focus on core logic and worry less about (mis)communication :-). Before we go into details about how a service mesh like Linkerd or Istio works, let’s look at what it gives you.

A service mesh like Istio or Linkerd gives you deep insight into how each service handles traffic, takes care of service discovery with a sidecar acting as both reverse and forward proxy, and lets you debug API calls, inspect payloads, and much more. The sidecar is a design pattern for container-based distributed systems.

At a high level, a service mesh is divided into two parts: the “Control Plane” and the “Data Plane”. The control plane is a set of services that acts as the source of truth for the data plane. It configures the various components of the data plane to ensure each service has what it needs to communicate with other services. The control plane also takes care of managing telemetry data, configuring traffic paths and much more.

The data plane takes care of handling traffic, collecting stats, monitoring health, load balancing and authentication. Let’s dig deeper to see how it works.

The data plane uses a sidecar (nothing but a proxy such as Envoy, Nginx, or HAProxy) which sits alongside your service container in a pod, and all communication between your service and other services goes through this sidecar. This pattern enables the service mesh to capture all communication and export stats to the control plane.

Without a sidecar, services would communicate with each other directly. In most cases, this pattern requires some service discovery code in each service so that it knows the IP and port of the target service.

Here’s how that would work:

Services talking directly with each other.
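To make the direct pattern concrete, here is a minimal sketch of what the discovery code embedded in each service might look like. The registry contents, the `DirectClient` class and the addresses are all hypothetical names made up for illustration; a real setup would typically look instances up in DNS or a registry service rather than a hard-coded dictionary.

```python
# Hypothetical, hard-coded registry: service name -> list of (ip, port).
# In practice this data would come from DNS or a service registry.
SERVICE_REGISTRY = {
    "orders": [("10.0.0.5", 8080), ("10.0.0.6", 8080)],
    "payments": [("10.0.1.9", 9090)],
}


class DirectClient:
    """Discovery logic that every service has to carry when there is no sidecar."""

    def __init__(self, registry):
        self._registry = registry
        self._next_index = {}  # per-service round-robin position

    def resolve(self, service_name):
        """Return the (ip, port) of the next instance, round-robin."""
        instances = self._registry.get(service_name, [])
        if not instances:
            raise LookupError(f"no known instances of {service_name!r}")
        index = self._next_index.get(service_name, 0) % len(instances)
        self._next_index[service_name] = index + 1
        return instances[index]


client = DirectClient(SERVICE_REGISTRY)
ip, port = client.resolve("orders")  # the caller then connects to ip:port itself
```

Every service has to ship and maintain a copy of this logic, and nothing sits between caller and target to observe or limit the traffic.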

In this case, you can’t control rate limiting or inspect the content of requests without modifying the services. A simple way to solve this is to run a proxy like Nginx or HAProxy alongside each service and have the service communicate only via that proxy. This gives you more control over who can talk to the service and at what rate. Let’s see how that would work:

Services communicating via sidecar
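As a rough illustration of what the proxy adds, the sketch below wraps a service handler with an allow-list and a token-bucket rate limit. It is an in-process toy, not how Nginx, HAProxy or Envoy are configured; `TokenBucket`, `Sidecar`, the status codes chosen and the caller names are assumptions made for this example.

```python
import time


class TokenBucket:
    """Simple token-bucket rate limiter (illustrative only)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        # Refill according to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Sidecar:
    """Hypothetical sidecar: all requests to the service pass through handle()."""

    def __init__(self, handler, allowed_callers, bucket):
        self.handler = handler                    # the actual service logic
        self.allowed_callers = set(allowed_callers)
        self.bucket = bucket
        self.stats = {"ok": 0, "denied": 0, "rate_limited": 0}

    def handle(self, caller, request):
        if caller not in self.allowed_callers:    # who can talk to this service
            self.stats["denied"] += 1
            return 403, "caller not allowed"
        if not self.bucket.allow():               # and at what rate
            self.stats["rate_limited"] += 1
            return 429, "rate limit exceeded"
        self.stats["ok"] += 1
        return 200, self.handler(request)


# The service itself stays unaware of policy; it only implements its logic.
sidecar = Sidecar(lambda req: req.upper(), {"web"}, TokenBucket(5, 10))
status, body = sidecar.handle("web", "hello")
```

The point of the design is that access policy, rate limits and stats live in the proxy, so they can change without touching the service code.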

Routing all traffic through a proxy running alongside each service helps you gather operational metrics, which are useful for optimization as well as for ensuring reliable operations. Now think about managing hundreds of services with thousands of instances and their proxies. You see the problem? That’s where the control plane comes into play. Since containers are scheduled dynamically, it is very likely that old containers are destroyed and new containers are deployed when scaling or rolling out upgrades. It’s the role of the control plane to keep these proxies (sidecars) updated with the current network state, access policies, encryption scheme, etc. Here’s a general architecture of how this would work:

Architecture Overview of Service Mesh
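The relationship between the two planes can be sketched in a few lines. This is a simplified, in-memory model under my own assumptions: `ControlPlane`, `SidecarProxy` and the versioned snapshot scheme are hypothetical names, and real meshes push configuration over the network using their own protocols.

```python
class SidecarProxy:
    """Data plane side: holds the latest view of the network it was given."""

    def __init__(self, service_name):
        self.service_name = service_name
        self.endpoints = {}       # service name -> list of "ip:port"
        self.config_version = -1  # nothing applied yet

    def apply_snapshot(self, version, endpoints):
        if version <= self.config_version:
            return  # ignore stale or duplicate updates
        self.endpoints = {name: list(addrs) for name, addrs in endpoints.items()}
        self.config_version = version


class ControlPlane:
    """Source of truth: tracks instances and pushes state to every sidecar."""

    def __init__(self):
        self._endpoints = {}
        self._proxies = []
        self._version = 0

    def attach(self, proxy):
        # A new sidecar immediately receives the current state.
        self._proxies.append(proxy)
        proxy.apply_snapshot(self._version, self._endpoints)

    def instance_started(self, service, address):
        self._endpoints.setdefault(service, []).append(address)
        self._publish()

    def instance_stopped(self, service, address):
        addrs = self._endpoints.get(service, [])
        if address in addrs:
            addrs.remove(address)
            self._publish()

    def _publish(self):
        self._version += 1
        for proxy in self._proxies:
            proxy.apply_snapshot(self._version, self._endpoints)


control_plane = ControlPlane()
web_sidecar = SidecarProxy("web")
control_plane.attach(web_sidecar)
control_plane.instance_started("orders", "10.0.0.5:8080")  # scale up
control_plane.instance_stopped("orders", "10.0.0.5:8080")  # container destroyed
```

Whenever a container comes or goes, only the control plane needs to be told; every sidecar ends up with the same, current picture without any service being redeployed.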

There is a collection of tools that makes it easy to work with a service mesh. Istio and Linkerd are two popular service mesh systems which you can drop in alongside your existing services without changing any code. Although this may look quite complex, configuring a service mesh is getting simpler.

I’ve had a chance to use Linkerd; in the next post, I’ll cover how you can get started with it.