Pavel Rykov
July 31, 2023 ・ Kubernetes
Scaling Applications with Kubernetes: Basics
Kubernetes (K8s), a robust open-source platform, has revolutionized the way we deploy, scale, and manage containerized applications. But maximizing its benefits requires a profound understanding and thoughtful strategy. This article covers best practices and common pitfalls for scaling applications with Kubernetes, peppered with real-world code examples and commands for practical insight.
Best Practices for Scaling with Kubernetes
Leverage Horizontal Pod Autoscaler (HPA)
HPA scales the number of pods in a deployment based on observed CPU utilization (it can also work with memory and custom metrics). Note that it relies on the Metrics Server, covered below, to obtain these metrics. To apply HPA from the command line:
kubectl autoscale deployment my-app --cpu-percent=80 --min=1 --max=10
To check the status of HPA:
kubectl get hpa
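The same autoscaler can also be defined declaratively, which is easier to version-control than an imperative command. Here's a minimal sketch of an equivalent HorizontalPodAutoscaler manifest, assuming a Deployment named my-app and the autoscaling/v2 API (GA since Kubernetes 1.23):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80  # scale out when average CPU utilization exceeds 80%
Apply it with kubectl apply -f, then watch it with kubectl get hpa as above.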
More details in the Kubernetes Performance Tuning article.
Implement the Cluster Autoscaler
The Cluster Autoscaler adjusts the size of the Kubernetes cluster based on workload: it adds nodes when pods fail to schedule due to insufficient resources, and removes nodes that stay underutilized. In a self-hosted environment this can be achieved with various tools, and the autoscaler itself can be installed using Helm. Below is an example of installing the Cluster Autoscaler on a self-hosted cluster with Helm.
First, you should add the official Helm chart repository for the autoscaler:
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
Next, you can install the autoscaler using Helm. The command you use will depend on your specific environment and configuration. For example, if you're using the AWS cluster autoscaler, you might use a command like this:
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--namespace kube-system \
--set autoDiscovery.clusterName=<YOUR CLUSTER NAME> \
--set rbac.create=true \
--set rbac.pspEnabled=true \
--set sslCertPath=/etc/kubernetes/pki/ca.crt \
--set extraArgs.balance-similar-node-groups=true \
--set extraArgs.expander=least-waste \
--set extraArgs.skip-nodes-with-system-pods=false
This command installs the cluster autoscaler in the kube-system namespace and configures it for your cluster. Note: replace <YOUR CLUSTER NAME> with your cluster's name.
Now, you can check the status of the cluster autoscaler with:
kubectl -n kube-system describe configmap cluster-autoscaler-status
Remember that your cluster's nodes must be configured so that resources can actually be added or removed when the autoscaler decides to scale up or down. This can involve, for example, scripts that bring additional nodes online when required, or monitoring that identifies when nodes can be safely removed from the cluster.
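For instance, with the AWS provider and auto-discovery enabled as above, the autoscaler only manages Auto Scaling groups that carry specific tags. A sketch of how you might tag a group, assuming a hypothetical group named my-asg:
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>,Value=owned,PropagateAtLaunch=true"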
More details in the How to use Kubernetes for application scaling article.
Install Kubernetes Metrics Server
The Metrics Server gathers resource metrics, crucial for automatic scaling of your applications. To deploy the Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
To verify the Metrics Server is running:
kubectl get deployment metrics-server -n kube-system
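If the deployment is up but kubectl top still returns errors (common on self-hosted clusters with self-signed kubelet certificates), you can check whether the metrics API itself is available:
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes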
You can find more about metrics in the Kubernetes Metrics Server: A comprehensive guide article.
Design for Fast Startup Times
Optimize your application for quick startup to make scaling more effective. Kubernetes provides a "startupProbe" to check whether your application has started.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
In this example, Kubernetes will check the /healthz endpoint on port 8080 every 10 seconds (periodSeconds). If the probe does not succeed within 30 tries (failureThreshold), the application container is restarted. Together these settings give the application up to 300 seconds to start.
To diagnose probe-related issues, you can use the kubectl describe pod command:
kubectl describe pod <pod-name>
In the output, you will see an "Events" section. Look for "Warning" events with the reason "Unhealthy"; these are related to probes. An example of such an event:
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  <unknown>           default-scheduler  Successfully assigned default/my-app-776f7d69bd-ssx8f to minikube
  Normal   Pulled     3m34s               kubelet, minikube  Container image "my-app:latest" already present on machine
  Normal   Created    3m34s               kubelet, minikube  Created container my-app
  Normal   Started    3m33s               kubelet, minikube  Started container my-app
  Warning  Unhealthy  3m (x4 over 3m23s)  kubelet, minikube  Startup probe failed: Get "http://172.17.0.5:8080/healthz": dial tcp 172.17.0.5:8080: connect: connection refused
This output means that the startup probe failed because it couldn't connect to the application's /healthz endpoint. In this case, you should check your application's logs to see why it might not have started up correctly.
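If the container has already been restarted by a failed startup probe, its current logs may be empty; the --previous flag shows the logs of the last terminated instance:
kubectl logs <pod-name> --previous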
Configure Readiness and Liveness Probes
Readiness and Liveness Probes are essential mechanisms of Kubernetes that ensure the smooth operation of your containers. They are used to control the traffic sent to a pod and determine when to restart a container, respectively.
A Readiness Probe is used to signal to Kubernetes that your application is ready to accept traffic. Until a pod's containers pass the readiness probe, Kubernetes won't send traffic to the pod.
A Liveness Probe is used to signal to Kubernetes whether your application is running properly. If a container fails the liveness probe, Kubernetes will kill the container and start a new one.
Here's an example of how to define these probes in your Kubernetes configuration:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
In this example, the readiness probe will check the /ready endpoint every 10 seconds, starting 5 seconds after the container starts. If this probe fails, Kubernetes will stop sending traffic to this pod until the probe passes. The liveness probe will check the /healthz endpoint every 20 seconds, starting 15 seconds after the container starts. If this probe fails, Kubernetes will kill the container and start a new one.
As before, probe-related issues can be diagnosed with the kubectl describe pod command:
kubectl describe pod <pod-name>
In the output, you will see an "Events" section. Look for events with the reason "Unhealthy" or "Killing"; these are related to probes. An example of such events:
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Warning  Unhealthy  5m (x3 over 5m15s)     kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    5m                     kubelet, minikube  Container my-app failed liveness probe, will be restarted
  Normal   Pulled     4m47s (x2 over 6m18s)  kubelet, minikube  Container image "my-app:latest" already present on machine
  Normal   Created    4m47s (x2 over 6m18s)  kubelet, minikube  Created container my-app
  Normal   Started    4m47s (x2 over 6m17s)  kubelet, minikube  Started container my-app
  Warning  Unhealthy  70s (x6 over 4m40s)    kubelet, minikube  Readiness probe failed: Get "http://172.17.0.5:8080/ready": dial tcp 172.17.0.5:8080: connect: connection refused
This output means that the liveness probe failed because it received a 500 status code from the /healthz endpoint, and the readiness probe failed because it couldn't connect to the /ready endpoint. In both cases, you should investigate your application logs to identify why the application might be unhealthy or not ready.
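To spot probe failures across a whole namespace rather than per pod, you can filter events by reason, for example:
kubectl get events --field-selector reason=Unhealthy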
Common Pitfalls When Scaling with Kubernetes
Overutilization of Resources
Overutilization of resources happens when an application uses more resources than are allocated to its pods, overwhelming the node and degrading its performance.
You can monitor resource usage using kubectl top nodes and kubectl top pods:
kubectl top nodes
kubectl top pods
These commands show CPU and memory usage at node and pod level, helping you understand whether you're nearing the limits.
Also, consider setting resource requests and limits in your pod specifications:
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"
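With requests lower than limits, as above, the pod receives the Burstable QoS class, which affects eviction order under node pressure. You can verify the assigned class with:
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'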
Ignoring Application Startup Time
If your applications take too long to start, this could slow down the scaling process and impact the user experience. It's important to design your applications for fast startup times and use startup probes to monitor this.
You can use the kubectl describe pod command to diagnose startup times:
kubectl describe pod <pod-name>
Look for "Events" with "Created" and "Started" - the time difference gives the startup time.
Not Setting Appropriate Resource Requests and Limits
Setting appropriate resource requests and limits is crucial to prevent overutilization of nodes and ensure the quality of service. Without these settings, a single pod could potentially consume excessive resources, starving other pods on the same node.
Here's an example of how to set resource requests and limits:
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"
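If you can't guarantee that every pod spec sets these values, a LimitRange can apply defaults to any container in the namespace that omits them. A minimal sketch, reusing the values above and a hypothetical object name default-limits:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container omits resources.requests
        memory: "64Mi"
        cpu: "250m"
      default:          # applied when a container omits resources.limits
        memory: "128Mi"
        cpu: "500m"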
Insufficient Monitoring and Logging
Without proper monitoring and logging, diagnosing issues becomes challenging, and you could miss important signs of problems. Kubernetes doesn't inherently provide a solution, but there are several third-party tools available.
For example, Prometheus can be used for monitoring, and Grafana for visualizing the data. Fluentd or Loki can be used for log aggregation.
Not Understanding Kubernetes Limits
Kubernetes has certain scalability limits. For example, a single cluster is designed to support up to 5,000 nodes and up to 150,000 total pods, with a default limit of 110 pods per node.
You can check the number of pods on a node with the following command:
kubectl describe nodes | grep -A 5 "Pods:"
Exceeding these limits could cause unpredictable behavior and instability in your cluster. Make sure to plan your clusters and design your architecture keeping these limits in mind.
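You can also compare each node's allocatable pod capacity at a glance, for example:
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods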
- Kubernetes
- Infrastructure