
An overview of Kubernetes architecture hierarchy
Imagine a scenario where your e-commerce application has just gone into production. It works fine on the first day, with an influx of 1,000 users spread throughout the day. But what happens when 500 users access the app concurrently the next day? The application crashes and returns a 500 Internal Server Error: it turns out the service pods could not withstand that much concurrent load.
It's important to understand that the above scenario is not an application bug but an infrastructure failure. This is exactly why performance testing your software infrastructure is critical: it ensures the architecture is robust and resilient enough to handle all sorts of spikes in user traffic.
Before we dive deeper: Infrastructure testing is different from functional testing. While functional tests verify features (sign-up, login, payments), infrastructure performance tests validate whether the underlying system can handle spikes, failures, and scaling needs. This blog focuses on the latter.
This blog focuses mainly on microservices infrastructure and covers how to test autoscaling, high availability, DB resource consumption, and load balancing from a performance point of view.
Prerequisites:
Before we dive into the topic in depth, the following are the prerequisites for testing an application’s infrastructure:
A tool for cluster access, e.g. Kubernetes Lens
Access to DB
Access to a monitoring tool such as Grafana or Prometheus
A basic understanding of software infrastructure.
While software architecture comes in many types and forms, this article deals mainly with the microservices architecture: a style where the application is composed of loosely coupled, independent modules known as services.
These services run in lightweight environments known as containers, which package an application together with its dependencies. Containers are managed by deployment/orchestration tools such as Kubernetes.
Kubernetes organizes the containers for each service into pods, the smallest deployable units. A pod can contain one or more tightly coupled containers that share:
Network namespace (same IP address and ports)
Storage volumes
Lifecycle (they start, stop, and scale together)
Pods run on physical or virtual servers called nodes, and multiple nodes join together to form a cluster.
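This hierarchy can be explored with kubectl. A sketch, assuming cluster access is already configured (the namespace placeholder is yours to fill in):

```shell
# List the nodes that form the cluster
kubectl get nodes

# List pods along with the node each one is scheduled on (-o wide adds the NODE column)
kubectl get pods -n <namespace> -o wide
```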
What is autoscaling?
Autoscaling is a cloud-computing mechanism that adjusts the computational resources allocated to a component based on system demand. In our context, if a service has only one pod and comes under excessive load, autoscaling increases the number of pods so it can handle the load better. This is known as horizontal autoscaling.
Autoscaling in Kubernetes is usually managed through the Horizontal Pod Autoscaler (HPA), which increases or decreases the number of pods based on CPU/memory thresholds. KEDA (Kubernetes Event-Driven Autoscaling) is an add-on that extends HPA by supporting event-driven metrics (like queue length, Kafka lag, etc.).
For example, if the threshold is set at 75%, then once a service consumes more than 75% of its CPU or memory resources, the autoscaler will increase its number of pods.
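As a sketch of how such a threshold is configured, the following creates an HPA targeting 75% CPU utilization (the deployment name and replica bounds are hypothetical):

```shell
# Create an HPA for a deployment, scaling between 2 and 10 pods at 75% CPU
kubectl autoscale deployment <deployment-name> -n <namespace> --cpu-percent=75 --min=2 --max=10
```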
The minimum and maximum pod counts for any service can be found using the following commands:
kubectl get hpa --all-namespaces
kubectl describe hpa <deployment-name> -n <namespace>
The first command returns a list of all Horizontal Pod Autoscalers (HPAs) in the cluster. The second displays the details of a specific HPA, including its current and target metric values and its minimum and maximum replica counts.
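For reference, `kubectl describe hpa` output typically looks something like this (all values below are illustrative):

```
Name:            auth-service
Namespace:       production
Reference:       Deployment/auth-service
Metrics:         ( current / target )
  resource cpu on pods (as a percentage of request):  45% / 75%
Min replicas:    2
Max replicas:    10
Deployment pods: 2 current / 2 desired
```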
1. Load test execution:
One of the simplest yet most effective approaches for testing autoscaling is to execute load tests, using tools such as JMeter or k6. For example, a load test that simulates thousands of users signing up should put load on the authentication service, causing it to autoscale and increase its pod count. Furthermore, once the test ends, the pod count should return to normal.
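With k6, for instance, such a test can be launched from the command line; the script name and numbers below are hypothetical:

```shell
# Run a signup-flow script with 500 virtual users for 5 minutes
k6 run --vus 500 --duration 5m signup_load_test.js
```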
2. Using libraries for load generation:
While performance test execution is an effective way to test autoscaling, some services are not directly part of end-user flows and need to be tested in isolation.
Custom load-generation tools like stress-ng are a good choice in this case. These tools generate the required CPU consumption on a pod for a specified time. You can monitor the service's resource consumption in real time and verify that it scales up and back down accordingly.
stress-ng --cpu 4 --cpu-load 90 --timeout 120s
For example, the command above will generate 90% load on 4 CPU cores of an instance for 2 minutes. If the KEDA threshold for the service is around 70%, the service should autoscale and the number of pods should increase.
Similarly, once the tests end, the additional pods should get deleted and the service should come back to its original state.
3. Custom shell scripts:
A limitation of the above method is that, on certain machines, the performance test engineer may not have the privileges required to install the library. The workaround is to use a custom shell script that generates excessive CPU load on the machine.
#!/bin/bash
# Number of parallel CPU-consuming processes
NUM_PROCESSES=4
# Duration to run the CPU burn (in seconds)
TIMEOUT=120
echo "Starting CPU burn with $NUM_PROCESSES processes for $TIMEOUT seconds..."
for i in $(seq 1 $NUM_PROCESSES); do
# Each process runs an infinite loop to consume CPU, wrapped in timeout
timeout $TIMEOUT bash -c "while :; do :; done" &
done
# Wait for all background processes to finish
wait
echo "CPU burn completed."
A script like the one above can be executed inside a pod to generate load on the CPU. The `while :; do :; done` construct is a busy loop that consumes CPU without performing any I/O. Setting NUM_PROCESSES to 4 and TIMEOUT to 120 starts 4 such loops in parallel for two minutes, loading 4 CPU cores.
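Before running the script inside a pod, the burn pattern can be sanity-checked locally with a short timeout; `timeout` exits with status 124 when it has to kill the command:

```shell
# Spin a busy loop for 2 seconds, then let timeout kill it
timeout 2 bash -c "while :; do :; done"
echo "exit status: $?"   # 124 indicates the loop ran until timeout killed it
```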
If the user does not have enough privileges to create a shell file, a similar command can be executed directly in the pod's container via kubectl exec:
kubectl exec <pod-name> --namespace <namespace> -c <container-name> -- \
/bin/sh -c "timeout 120 sh -c 'while :; do :; done' & timeout 120 sh -c 'while :; do :; done'; wait"
What is high availability?
High availability is a system’s ability to remain operational and accessible even when parts of it fail. In large enterprise applications, high availability means ensuring minimal to zero downtime, so users can continue interacting with the application without interruption, even during infrastructure failures or maintenance events.
Testing approach
In our context, the approach to test self-healing is to delete a service's pods and verify that new pods are created in their place in the shortest possible duration. This is critical to minimize downtime.
The following command can be used to simultaneously delete all the pods belonging to a service:
kubectl delete pods -l <label> -n <namespace>
A similar command can be used to delete all the pods in a namespace, which impacts multiple services at once:
kubectl delete pods --all -n <namespace>
Once the command is executed, the tester should note the time taken for the new pods to spin up and become active. Keep in mind that while the pods spin up almost instantly, the containers can take some time to stabilize; this time should be included in the duration being monitored. All these metrics, including pod health, can be observed in the Pods section of the Lens GUI.
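One way to capture the recovery duration, rather than eyeballing it, is to time `kubectl wait` until the replacement pods report Ready. The label and namespace below are hypothetical:

```shell
# Delete the service's pods, then measure how long replacements take to become Ready
START=$(date +%s)
kubectl delete pods -l app=auth-service -n production
kubectl wait --for=condition=Ready pod -l app=auth-service -n production --timeout=300s
END=$(date +%s)
echo "Recovery took $((END - START)) seconds"
```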
A deeper HA test is to simulate node failure using the following command
kubectl drain <node>
This command evicts all the pods from a node and renders it unschedulable. After executing it, verify that the pods are rescheduled onto the remaining nodes and that application downtime is minimal.
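In practice, `drain` usually needs a couple of extra flags, and the node should be returned to service once the test is done:

```shell
# Evict pods even when DaemonSets are present, and discard emptyDir data
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

# After verification, make the node schedulable again
kubectl uncordon <node>
```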
Testing DB resource consumption
Just as we execute load performance test scripts to put load on services, the same approach can be used to test databases. Executing load tests with concurrent read/write operations allows us to generate load on the DB and benchmark metrics such as:
CPU consumption of DB
I/O network traffic
Query performance
These metrics can be observed on cloud monitoring dashboards (GCP, AWS, etc.). The objective of putting stress on the DB is to ensure that it can handle the required user traffic and does not choke during periods of high load.
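If the database is PostgreSQL (an assumption; for MySQL, sysbench fills the same role), a baseline of concurrent read/write load can be generated with pgbench; the database name is hypothetical:

```shell
# Initialize the pgbench tables in the target database
pgbench -i mydb

# 50 concurrent clients across 4 worker threads for 5 minutes
pgbench -c 50 -j 4 -T 300 mydb
```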
Testing load balancing
Load balancing is a critical infrastructure component that ensures traffic is evenly distributed across different modules. To test load balancing effectively:
Generate load and monitor key metrics such as network throughput, request latency, and backend utilization.
Review runtime load-balancing diagrams to visualize traffic distribution among modules.
Proper load balancing guarantees that the system can handle traffic spikes and reroute requests to prevent any module from being overwhelmed.
Monitoring for anomalies
Another aspect of software infrastructure testing is to routinely scan environment monitoring dashboards and raise the alarm for anomalies such as:
CPU consumption spikes: Sudden spikes in CPU consumption, apart from those observed during performance tests.
Irregular memory consumption patterns: Sudden spikes in memory consumption, or a steadily rising baseline, could point to a potential memory leak.
Excessive DB resource consumption: This points to unintended background processes draining DB resources, which can be traced back through query listings.
These anomalies, if left unchecked, can lead to degraded performance, service outages, or scalability issues; hence the need for regular monitoring.
In today’s age of rapidly evolving systems and ever-increasing customer demands, testing software infrastructure is a necessity, not a luxury. Rigorous performance testing ensures that an application is resilient enough to handle unexpected traffic spikes and to recover from failures.
This blog was written to familiarize performance QA engineers with the basics of software infrastructure and with basic techniques for verifying autoscaling, high availability, and more. Feel free to reach out with any queries. Happy testing!