Ivan Marov

September 29, 2023 ・ Kubernetes

Kubernetes and load balancing: A guide

Kubernetes is a powerful container orchestration platform that has become the go-to solution for deploying and managing containerized applications at scale. One of the essential features of Kubernetes is its built-in load balancing capabilities, which help ensure high availability and responsiveness of applications. Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, known as a 'server farm' or 'server pool'. In the context of Kubernetes, load balancing ensures that the service doesn't get overwhelmed with too much traffic and that every request is handled efficiently, improving the overall performance of the application.

This article will present practical examples of Kubernetes load balancing at both the service and ingress levels, as well as demonstrate session affinity configuration.

Service-Level Load Balancing: Example

At the service level, Kubernetes provides an abstraction for distributing network traffic across a set of pods. A service is a logical abstraction over a set of pods, together with a policy for accessing them. When traffic arrives at the service's cluster IP, kube-proxy on each node forwards it to one of the backing pods; by default it picks a backend at random in iptables mode, or round-robin in IPVS mode. In this example, we have a deployment with multiple replicas (pods):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.0
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 15

To expose these pods using a service, we create the following Kubernetes service:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

In this scenario, any traffic directed to my-service is load balanced across the three pods of the my-app deployment, spreading requests roughly evenly among them.
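
To verify the targets of this distribution, you can inspect the service's endpoints. With three healthy replicas you should see three pod addresses; only pods passing their readiness probe are listed (the IPs below are placeholders):

kubectl get endpoints my-service
# NAME         ENDPOINTS                                         AGE
# my-service   10.244.0.5:8080,10.244.0.6:8080,10.244.0.7:8080   1m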

Ingress-Level Load Balancing: Example

While service load balancing works within the cluster, you need a way to manage incoming external traffic to your cluster. This is where Ingress comes in. Ingress is an API object that manages external access to the services in a cluster, typically HTTP. Ingress can provide load balancing, SSL termination, and name-based virtual hosting.

To demonstrate ingress-level load balancing, we first need an Ingress controller such as NGINX or Traefik running in the Kubernetes cluster.
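
For instance, the NGINX Ingress Controller can be installed with Helm, using the project's published chart (a sketch; adjust the namespace and values for your environment):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace

With the controller running, we create an Ingress resource: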

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: my-service
            port:
              number: 80

This Ingress resource instructs the Ingress controller to route all incoming HTTP traffic with the host name myapp.example.com to the my-service service on port 80. The service then balances the traffic across the available pods, as shown in the previous example.
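
Once the controller has an external address, you can exercise this rule by supplying the expected Host header; the IP below is a placeholder for your controller's address:

curl -H "Host: myapp.example.com" http://203.0.113.10/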

The LoadBalancer Service and Its Implementations

A LoadBalancer service is the standard way to expose a service to the internet. When you create a LoadBalancer service in Kubernetes, the cloud provider (AWS, GCP, Azure, etc.) provisions a load balancer in its infrastructure and configures it to route traffic to your Kubernetes service. Without a cloud provider or an in-cluster implementation such as MetalLB, however, no external address is provisioned: the EXTERNAL-IP field stays in the Pending state, and the service remains reachable only through its automatically allocated NodePort.

Here's an example of a LoadBalancer service:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
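
After applying this manifest, the provisioned address appears in the EXTERNAL-IP column; on a cluster without a load balancer implementation it stays <pending> (the values below are illustrative):

kubectl get service my-service
# NAME         TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
# my-service   LoadBalancer   10.96.120.15   203.0.113.25   80:31380/TCP   2m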

However, not all environments have a cloud provider's load balancer available. For these cases, there are alternative solutions:

  • MetalLB: An open-source load balancer implementation designed for bare-metal Kubernetes clusters. It can work in either Layer 2 or BGP mode (a configuration sketch appears below).

  • Ingress controller: A controller that watches the Kubernetes API for changes to Ingress resources and reconfigures its underlying load balancer (such as NGINX or Traefik) accordingly.

  • NodePort services: With NodePort, Kubernetes allocates a port from the range specified by the --service-node-port-range flag (default: 30000-32767), and every node proxies that port to your service. The drawbacks are potential port conflicts with other services on the same node and a cap of 2,768 services under the default range; a minimal manifest appears after this list.

  • External load balancer: If you use an external load balancer, you must configure it yourself to route traffic to the NodePorts on your cluster's nodes or directly to pod IPs.
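
For comparison, here is a minimal NodePort variant of the service above. The explicit nodePort value is an arbitrary pick from the default range; omit it to let Kubernetes choose one:

apiVersion: v1
kind: Service
metadata:
  name: my-service-nodeport
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
      nodePort: 30080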

These tools can function as a load balancer in various environments, including bare metal Kubernetes clusters or on-premises deployments.
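
Of these, MetalLB is the most common choice on bare metal. A minimal Layer 2 configuration sketch, assuming MetalLB v0.13+ (which uses CRD-based configuration) is already installed, with a placeholder address range from your local network:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: example-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - example-pool

With this in place, a LoadBalancer service like the one above receives an external IP from the pool.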

Remember, choosing a load balancer often depends on the specific needs of your applications, as well as the environment where your Kubernetes cluster is running.

Session Affinity: Example

Session affinity, sometimes referred to as sticky sessions, is a method of directing all requests from a client to a single pod. This can be particularly useful in cases where a client starts a session on one pod, and we need to ensure that all subsequent requests from that client go to the same pod.

Here's how you can configure session affinity in a service definition:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 100
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

In this example, the sessionAffinity: ClientIP directive tells Kubernetes to direct all requests from a single client IP to the same pod, as long as that pod remains available. The timeoutSeconds parameter caps how long the stickiness lasts (the default is 10800 seconds, or three hours); once it expires, a different pod may be selected.
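
As a quick sanity check, repeated requests from the same client should keep landing on the same pod. A minimal sketch, assuming the application echoes its pod name in its response and the loop runs from a pod inside the cluster so the source IP stays stable:

# Each response should name the same pod until the affinity times out.
for i in 1 2 3 4 5; do
  curl -s http://my-service/
done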

Kubernetes provides robust load balancing capabilities at both the service and ingress levels. By understanding and leveraging these features, you can optimize the distribution of network traffic across pods, improve the responsiveness and availability of your applications, and deliver a high-quality user experience. The examples in this article are a starting point for implementing and fine-tuning load balancing in your own deployments; the topic is a broad one, so it's worth exploring further techniques such as external load balancers and DNS round-robin.
