
Performance Tester Diary - Episode 4

This is Chapter 4 of our journey of non-functional testing with Otto Perf.

Author's note

Previous chapters are available in this blog: Chapter 1, Chapter 2, Chapter 3.

In Chapter 3 Otto started to use build and deployment technologies, in the form of Git and Jenkins, to support his performance testing. He learnt about the concepts of Continuous Integration and Continuous Deployment and spent some time integrating his performance tests into a push-to-production pipeline.

Otto was able to schedule his performance testing using Jenkins pipelines, and he spent some time building a solution that analysed the test results for him so that he could spend more time building performance tests. This analysis solution also helped him determine whether the performance tests had passed in the fully automated push-to-production pipelines. Otto also discovered that being integrated within the development teams provided more benefits than just being able to develop tests and execute them earlier in the development lifecycle. He discovered that several development practices, such as code reviews and pair programming, could equally apply to test development, and this was something that Quality Engineering teams had not, in his experience, regularly done.

In this Chapter we will follow Otto as he starts to explore the benefits that application instrumentation technologies can provide to performance testing. Otto will also dive deeper into trend analysis of his performance test results and look at what the performance results data is telling him.


Chapter 4.

Bottlenecks and instrumentation

Otto was very happy with the way his performance testing was going; he was building new tests, executing them daily and comparing results daily. He did, however, have a concern that whilst he was aware of how the application was performing from a response time perspective, he had no visibility of how the infrastructure resources were being utilised during his testing.

He cast his mind back to something that had been mentioned to him, and that he had written about in Chapter 2 of his diary.

This was about infrastructure sizing.

He remembered a conversation with the architects, who had said that they had defined the number of instances of each server type and the amount of CPU, memory and so on that they believed would be required to support full production load. But they could not guarantee that this would be accurate, and they had said that during his performance testing he should be monitoring this and making recommendations based on what he was seeing in terms of utilisation under load.

Otto thought that he should start to think about how he would do this. He knew that the test environment he was running his performance tests against was identical to production in terms of server instances and the resources they had, and he knew that the response times he was seeing when running his performance test under production levels of load were within the times defined in the non-functional requirements.

Otto was also aware that whilst he was running tests at production-indicative levels of load and concurrency, there was still more functionality to be added, and this could still have an impact on server resources. His concern deepened when he realised that one of the non-functional requirements he had defined was around server utilisation: the application servers should not exceed 60% CPU utilisation under load, and memory should not exceed 80%.

He also noted that another of his non-functional requirements discussed the impact of Garbage Collection on memory, and utilisation metrics on the application database, which should have the same CPU and memory limits as the application servers. He brought this up in the morning stand-up and was told that the infrastructure team had recently deployed Dynatrace, and that this was instrumenting all servers in the non-production environment.
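As a rough sketch of how such limits might feed into the pass/fail analysis Otto built in Chapter 3, the snippet below checks peak utilisation figures against those thresholds. The data shape and function name are illustrative assumptions, not anything from the diary; only the 60% CPU and 80% memory limits come from the non-functional requirements above.

```python
# Utilisation limits taken from Otto's non-functional requirements: application and
# database servers should stay below 60% CPU and 80% memory under load.
CPU_LIMIT_PERCENT = 60.0
MEMORY_LIMIT_PERCENT = 80.0

def check_utilisation(peak_usage):
    """Return a list of breaches given peak utilisation per server.

    `peak_usage` is an illustrative shape: {"app-server-1": {"cpu": 57.2, "memory": 83.1}, ...}
    """
    breaches = []
    for server, usage in peak_usage.items():
        if usage["cpu"] > CPU_LIMIT_PERCENT:
            breaches.append(f"{server}: CPU {usage['cpu']:.1f}% exceeds {CPU_LIMIT_PERCENT:.0f}%")
        if usage["memory"] > MEMORY_LIMIT_PERCENT:
            breaches.append(f"{server}: memory {usage['memory']:.1f}% exceeds {MEMORY_LIMIT_PERCENT:.0f}%")
    return breaches

print(check_utilisation({"app-server-1": {"cpu": 57.2, "memory": 83.1}}))
```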

He now knew that this instrumentation tool would be able to provide him with a comprehensive view of application resource utilisation when he was running his tests, and would be able to offer insights into bottlenecks and features that were exceeding any pre-defined application tolerances. Otto spent some time looking at Dynatrace and it looked good; it was, however, quite difficult to pinpoint the exact transactions that he had run and to look at these in isolation to understand the impact of his tests on the server resources.

I must be able to isolate the transactions that my performance tests generate, he thought. It was then that he stumbled across an article on this very subject, about how he could integrate Dynatrace monitoring into his JMeter tests.

Author's note

The article is in the OctoPerf Blog pages

The article clearly defined how you can create a custom header in your JMeter tests that relates to custom request attributes that you create in Dynatrace. Once Otto had added the header and the request attributes, he could easily pinpoint the transactions he was generating as part of his performance test and use the instrumentation tool not only to trace the transaction activity through the technology stack but also to see the CPU and memory consumption of all infrastructure components under load.
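To illustrate the idea outside JMeter itself (in JMeter the header would typically be added with an HTTP Header Manager), the sketch below tags each request with a header that a Dynatrace custom request attribute can then key on. The header name, field layout, target URL and function are assumptions made for the example, not the article's exact configuration.

```python
import requests  # third-party HTTP client, used here purely to illustrate the header

BASE_URL = "https://myapp.example.com"  # placeholder for the application under test

def tagged_get(path, test_name, transaction_name):
    """Send a GET request carrying a load-test tagging header.

    A Dynatrace custom request attribute configured against this header lets every
    captured trace be filtered down to this test run and transaction. The header
    name and field layout here are illustrative.
    """
    header_value = f"LTN={test_name};TSN={transaction_name}"
    return requests.get(
        BASE_URL + path,
        headers={"x-dynatrace-test": header_value},
        timeout=30,
    )

# Example: tag the login transaction of the daily regression run
response = tagged_get("/login", test_name="daily_regression", transaction_name="01_login")
print(response.status_code)
```

In a real test plan the header value would be built from JMeter variables so that every sampler carries its own transaction name.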

This is exactly what I need, thought Otto; he now had the ability to trace the impact of his load profiles on the platform. He could see that the load on several of the servers was higher than the limits set in the non-functional requirements and that, whilst it was not yet impacting performance, it was close to the resource limits.

He spoke to the infrastructure team about his observations, and they asked whether he thought they should scale horizontally or vertically. Otto had read about the different types of scaling in an article he had found in Chapter 2 and quickly re-read it.

Author's note

The article is in the OctoPerf Blog pages

After some further investigation and discussions with the development team, they felt it was better to scale horizontally, by adding more instances of the servers, rather than increasing the CPU and memory of those already in use. After the changes were made and the test re-run, Otto not only saw that CPU and memory utilisation had decreased to within the limits set in the non-functional requirements, but also that the response times of the transactions in his performance test had improved.

The response times had been within the non-functional requirements before the infrastructure change, but seeing a further improvement got Otto thinking: as the functionality increased and the load profile became more diverse, had he been seeing a slow degradation in performance without realising it? He considered that this might be because the degradation was only small each day, small enough not to spot easily but significant over time, and also because the response times were within his non-functional requirement thresholds, so he did not pay that much attention to them; as he discussed in Chapter 3, he was focussing on those that exceeded their thresholds.

He decided that he would take time later to consider how he could track this better. The article that Otto had read also showed how Dynatrace could be used to easily build metrics, and from these metrics you can either build dashboards or query them using a REST API.
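As a rough illustration of the REST API route, the sketch below queries a built-in host CPU metric from the Dynatrace Metrics v2 API. The environment URL and token are placeholders, and the metric key shown is an assumption to be checked against the metrics actually available in your own environment.

```python
import requests

DYNATRACE_URL = "https://my-environment.live.dynatrace.com"  # placeholder environment URL
API_TOKEN = "dt0c01.XXXX"  # placeholder token, needs the metrics read scope

def query_metric(metric_selector, time_from="now-2h"):
    """Query the Dynatrace Metrics v2 API and return the parsed JSON payload."""
    response = requests.get(
        f"{DYNATRACE_URL}/api/v2/metrics/query",
        headers={"Authorization": f"Api-Token {API_TOKEN}"},
        params={"metricSelector": metric_selector, "from": time_from},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Example: host CPU usage over roughly the window of the last test run
cpu = query_metric("builtin:host.cpu.usage")
for series in cpu["result"][0]["data"]:
    print(series["dimensions"], series["values"][-5:])  # last few datapoints per host
```

Pulling the numbers this way, rather than only viewing dashboards, is what would let the utilisation figures sit alongside the JMeter results and be compared run against run.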

Otto built several dashboards, all showing metrics that would complement the JMeter response time reporting he had built in Chapter 3.

Trend analysis

Otto now turned his attention to the response time issue. The fact that the additional servers had reduced the response times had confused him to start with, but when he took a further look at the results he noticed that they had been slowly regressing for a while, and he had not spotted it for several reasons. The first was that the weekly degradation was small and was therefore considered just a variation, because no two sets of results are identical.

The second was the fact that in Chapter 3 Otto had introduced the concept of only reporting on failures, and as these transactions were not failing their non-functional requirements they were not considered an issue. Otto felt that he should be able to avoid this problem, because the last thing he wanted was for the first sign of response time creep to be a transaction exceeding its non-functional requirement.

He was also interested in how response times were affected by changes to the infrastructure and how they might be affected by tests running at different times of the day. Otto wondered whether running his tests in parallel with batch processes or backups would affect response times, and to what degree.

Whilst his current solution of response time measurement, based on non-functional requirement failure, was the best solution for the agile approach the programme was taking, it was not necessarily the best approach for the issue he had inadvertently uncovered while increasing the capacity of the infrastructure. He wanted to keep his agile reporting, but equally he wanted a better appreciation of the longer-term impact of new code, changes to infrastructure, and tests run at different times of day or in parallel with batches.

He thought there must be an answer and went about researching the subject. He came across an article that extended the work he had already done, which allowed him to re-use most of his existing reporting code.

Author's note

The article is in the OctoPerf Blog pages

The changes that Otto made allowed him to start tracking the response times of all his transactions, and as the number of execution cycles increased, so did Otto's understanding of how response times were affected by any number of factors. These techniques allowed Otto to spot, and pre-emptively raise defects for, several issues that the data indicated would become problems over time.
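The details of Otto's implementation are in the article, but the principle can be sketched along the following lines: keep one aggregate row per transaction per run, fit a simple trend to each transaction's history, and flag anything that is creeping upwards even though it still passes its threshold. The CSV layout, file name and creep threshold below are assumptions made purely for illustration.

```python
import csv
from collections import defaultdict
from statistics import linear_regression  # available from Python 3.10

# Assumed layout of the history file: one aggregate row per transaction per run, e.g.
# run_index,transaction,pct90_ms
# 1,01_login,740
RESULTS_FILE = "aggregate_history.csv"    # hypothetical export from the nightly runs
CREEP_THRESHOLD_MS_PER_RUN = 2.0          # flag anything regressing faster than this

history = defaultdict(list)
with open(RESULTS_FILE, newline="") as f:
    for row in csv.DictReader(f):
        history[row["transaction"]].append((int(row["run_index"]), float(row["pct90_ms"])))

for transaction, points in sorted(history.items()):
    if len(points) < 2:
        continue  # need at least two runs to fit a trend
    runs = [run for run, _ in points]
    times = [pct90 for _, pct90 in points]
    slope, _intercept = linear_regression(runs, times)  # ms of creep per run
    if slope > CREEP_THRESHOLD_MS_PER_RUN:
        print(f"{transaction}: 90th percentile rising by ~{slope:.1f} ms per run")
```

Because a check like this works on aggregated results, it could be bolted onto the end of the existing Jenkins pipeline without changing the tests themselves.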

Conclusion

Otto was really pleased with the strides forward he had made in analysing the impact of load on the infrastructure. He now felt confident in being able not only to report on the performance of the application but also to make recommendations on how the production infrastructure should be sized and on how the variety of load profiles present at any time of the day would affect infrastructure resources.

With the help of the dashboard functionality in Dynatrace, he was able to share these with the wider project team as a way of backing up his analysis. He was also pleased that he now had trend analysis integrated into his performance testing pipelines, and that he was actively looking for response time regressions or changes to response times as the conditions he was testing under changed.

Join us soon for Chapter 5 in the performance testing adventures of Otto Perf.
