Pavel Rykov
July 31, 2023 ・ Kubernetes
Scaling Applications with Kubernetes: Basics
Kubernetes (K8s), a robust open-source platform, has revolutionized the way we deploy, scale, and manage containerized applications. But maximizing its benefits requires a profound understanding and thoughtful strategy. This article covers best practices and common pitfalls for scaling applications with Kubernetes, peppered with real-world code examples and commands for practical insight.
Best Practices for Scaling with Kubernetes
Leverage Horizontal Pod Autoscaler (HPA)
HPA scales the number of pods in a deployment based on observed CPU utilization (it can also work with memory and custom metrics). Note that it relies on the Metrics Server, covered below, to obtain these metrics. To apply HPA from the command line:
kubectl autoscale deployment my-app --cpu-percent=80 --min=1 --max=10
To check the status of HPA:
kubectl get hpa
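The same autoscaler can also be defined declaratively, which is easier to version-control than an imperative command. Here's a minimal sketch of an equivalent HorizontalPodAutoscaler manifest, assuming a Deployment named my-app and the autoscaling/v2 API (GA since Kubernetes 1.23):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80  # scale out when average CPU utilization exceeds 80%
Apply it with kubectl apply -f, then watch it with kubectl get hpa as above.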
More details in the Kubernetes Performance Tuning article.
Implement the Cluster Autoscaler
The Cluster Autoscaler adjusts the size of the Kubernetes cluster based on workload: it adds nodes when pods fail to schedule due to insufficient resources, and removes nodes that stay underutilized. In a self-hosted environment this can be achieved with various tools, and the autoscaler itself can be installed using Helm. Below is an example of installing the Cluster Autoscaler on a self-hosted cluster with Helm.
First, you should add the official Helm chart repository for the autoscaler:
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
Next, you can install the autoscaler using Helm. The command you use will depend on your specific environment and configuration. For example, if you're using the AWS cluster autoscaler, you might use a command like this:
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--namespace kube-system \
--set autoDiscovery.clusterName=<YOUR CLUSTER NAME> \
--set rbac.create=true \
--set rbac.pspEnabled=true \
--set sslCertPath=/etc/kubernetes/pki/ca.crt \
--set extraArgs.balance-similar-node-groups=true \
--set extraArgs.expander=least-waste \
--set extraArgs.skip-nodes-with-system-pods=false
This command installs the cluster autoscaler in the kube-system namespace and configures it for your cluster. Note: replace <YOUR CLUSTER NAME> with your cluster's name.
Now, you can check the status of the cluster autoscaler with:
kubectl -n kube-system describe configmap cluster-autoscaler-status
Remember that your cluster's nodes must be configured so that resources can actually be added or removed when the autoscaler decides to scale up or down. This can involve, for example, scripts that bring additional nodes online when required, or monitoring that identifies when nodes can be safely removed from the cluster.
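For instance, with the AWS provider and auto-discovery enabled as above, the autoscaler only manages Auto Scaling groups that carry specific tags. A sketch of how you might tag a group, assuming a hypothetical group named my-asg:
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>,Value=owned,PropagateAtLaunch=true"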
More details in the How to use Kubernetes for application scaling article.
Install Kubernetes Metrics Server
The Metrics Server gathers resource metrics, crucial for automatic scaling of your applications. To deploy the Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
To verify the Metrics Server is running:
kubectl get deployment metrics-server -n kube-system
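If the deployment is up but kubectl top still returns errors (common on self-hosted clusters with self-signed kubelet certificates), you can check whether the metrics API itself is available:
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes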
You can find more about metrics in the Kubernetes Metrics Server: A comprehensive guide article.
Design for Fast Startup Times
Optimize your application for quick startup to make scaling more effective. Kubernetes provides a "startupProbe" to check whether your application has started.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
In this example, Kubernetes will check the /healthz endpoint on port 8080 every 10 seconds (periodSeconds). If the probe does not succeed within 30 tries (failureThreshold), the application container is restarted. Together these settings give the application up to 300 seconds to start.
To diagnose probe-related issues, you can use the kubectl describe pod command:
kubectl describe pod <pod-name>
In the output, you will see an "Events" section. Look for "Warning" events with the reason "Unhealthy"; these are related to probes. An example of such an event:
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  <unknown>           default-scheduler  Successfully assigned default/my-app-776f7d69bd-ssx8f to minikube
  Normal   Pulled     3m34s               kubelet, minikube  Container image "my-app:latest" already present on machine
  Normal   Created    3m34s               kubelet, minikube  Created container my-app
  Normal   Started    3m33s               kubelet, minikube  Started container my-app
  Warning  Unhealthy  3m (x4 over 3m23s)  kubelet, minikube  Startup probe failed: Get "http://172.17.0.5:8080/healthz": dial tcp 172.17.0.5:8080: connect: connection refused
This output means that the startup probe failed because it couldn't connect to the application's /healthz endpoint. In this case, you should check your application's logs to see why it might not have started up correctly.
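If the container has already been restarted by a failed startup probe, its current logs may be empty; the --previous flag shows the logs of the last terminated instance:
kubectl logs <pod-name> --previous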
Configure Readiness and Liveness Probes
Readiness and Liveness Probes are essential mechanisms of Kubernetes that ensure the smooth operation of your containers. They are used to control the traffic sent to a pod and determine when to restart a container, respectively.
A Readiness Probe is used to signal to Kubernetes that your application is ready to accept traffic. Until a pod's containers pass the readiness probe, Kubernetes won't send traffic to the pod.
A Liveness Probe is used to signal to Kubernetes whether your application is running properly. If a container fails the liveness probe, Kubernetes will kill the container and start a new one.
Here's an example of how to define these probes in your Kubernetes configuration:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
In this example, the readiness probe will check the /ready endpoint every 10 seconds, starting 5 seconds after the container starts. If this probe fails, Kubernetes will stop sending traffic to this pod until the probe passes. The liveness probe will check the /healthz endpoint every 20 seconds, starting 15 seconds after the container starts. If this probe fails, Kubernetes will kill the container and start a new one.
As before, probe-related issues can be diagnosed with the kubectl describe pod command:
kubectl describe pod <pod-name>
In the output, you will see an "Events" section. Look for events with the reason "Unhealthy" or "Killing"; these are related to probes. An example of such events:
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Warning  Unhealthy  5m (x3 over 5m15s)     kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    5m                     kubelet, minikube  Container my-app failed liveness probe, will be restarted
  Normal   Pulled     4m47s (x2 over 6m18s)  kubelet, minikube  Container image "my-app:latest" already present on machine
  Normal   Created    4m47s (x2 over 6m18s)  kubelet, minikube  Created container my-app
  Normal   Started    4m47s (x2 over 6m17s)  kubelet, minikube  Started container my-app
  Warning  Unhealthy  70s (x6 over 4m40s)    kubelet, minikube  Readiness probe failed: Get "http://172.17.0.5:8080/ready": dial tcp 172.17.0.5:8080: connect: connection refused
This output means that the liveness probe failed because it received a 500 status code from the /healthz endpoint, and the readiness probe failed because it couldn't connect to the /ready endpoint. In both cases, you should investigate your application logs to identify why the application might be unhealthy or not ready.
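To spot probe failures across a whole namespace rather than per pod, you can filter events by reason, for example:
kubectl get events --field-selector reason=Unhealthy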
Common Pitfalls When Scaling with Kubernetes
Overutilization of Resources
Overutilization of resources happens when an application uses more resources than are allocated to its pods, overwhelming the node and degrading its performance.
You can monitor resource usage using kubectl top nodes and kubectl top pods:
kubectl top nodes
kubectl top pods
These commands show CPU and memory usage at node and pod level, helping you understand whether you're nearing the limits.
Also, consider setting resource requests and limits in your pod specifications:
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"
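With requests lower than limits, as above, the pod receives the Burstable QoS class, which affects eviction order under node pressure. You can verify the assigned class with:
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'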
Ignoring Application Startup Time
If your applications take too long to start, this could slow down the scaling process and impact the user experience. It's important to design your applications for fast startup times and use startup probes to monitor this.
You can use the kubectl describe pod command to diagnose startup times:
kubectl describe pod <pod-name>
Look for "Events" with "Created" and "Started" - the time difference gives the startup time.
Not Setting Appropriate Resource Requests and Limits
Setting appropriate resource requests and limits is crucial to prevent overutilization of nodes and ensure the quality of service. Without these settings, a single pod could potentially consume excessive resources, starving other pods on the same node.
Here's an example of how to set resource requests and limits:
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"
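If you can't guarantee that every pod spec sets these values, a LimitRange can apply defaults to any container in the namespace that omits them. A minimal sketch, reusing the values above and a hypothetical object name default-limits:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container omits resources.requests
        memory: "64Mi"
        cpu: "250m"
      default:          # applied when a container omits resources.limits
        memory: "128Mi"
        cpu: "500m"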
Insufficient Monitoring and Logging
Without proper monitoring and logging, diagnosing issues becomes challenging, and you could miss important signs of problems. Kubernetes doesn't inherently provide a solution, but there are several third-party tools available.
For example, Prometheus can be used for monitoring, and Grafana for visualizing the data. Fluentd or Loki can be used for log aggregation.
Not Understanding Kubernetes Limits
Kubernetes has certain scalability limits. For example, a single cluster is designed to support up to 5,000 nodes and up to 150,000 total pods, with a default limit of 110 pods per node.
You can check the number of pods on a node with the following command:
kubectl describe nodes | grep -A 5 "Pods:"
Exceeding these limits could cause unpredictable behavior and instability in your cluster. Make sure to plan your clusters and design your architecture keeping these limits in mind.
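You can also compare each node's allocatable pod capacity at a glance, for example:
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods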
- Kubernetes
- Infrastructure