Optimize JMeter for large-scale tests
I'm sure you have already experienced intense moments of loneliness trying to run distributed tests with JMeter. Am I wrong?
Sometimes, you hate JMeter solely for that reason. You've sweated blood trying to get the whole machinery up and running. I know this feeling. Want to know a secret? No, even better. Want to know 31 tips to fix JMeter distributed testing issues? I already see you smiling again!
Tips and tricks
We have compiled a huge list of all the practical tips and tricks to easily scale JMeter. Let's dive into the subject!
- Run from the command line: avoid using the JMeter GUI during load tests, as it can eat a lot of memory. Use non-GUI mode (-n) and pass the results file with -l:
jmeter.sh -n -t script.jmx -l results.jtl
- Avoid listeners: avoid UI listeners like graphs or results tables. Preferably, only write results to a JTL file using the Simple Data Writer. You are almost guaranteed to encounter an OutOfMemoryError if you do not.
- Increase JVM heap space: launch JMeter with a greater amount of memory. Consider increasing the heap up to 8GB per JVM, with at least 8GB of free RAM on the same machine, as in the example below:
JVM_ARGS="-Xms4g -Xmx4g" jmeter.sh -n -t script.jmx -l results.jtl
- Use the latest JMeter version: JMeter regularly improves with each new release. Stick with the latest version to get the newest improvements.
- Use Regexp Variable Extractors to extract data from responses: but never use the Body (unescaped) option, which runs over the whole response to replace HTML escape codes and, according to the JMeter documentation, strongly impacts performance. Avoid JSON, XPath and any other fancy post-processors: these consume a lot of memory.
- Avoid extracting data from large responses: the whole response is kept in memory, thus increasing the memory consumption significantly. Extract only the necessary data, not more.
- Simulate a reasonable number of threads per machine: avoid simulating more than 1000 users per machine. Usually, the bottleneck is the network connection (1Gbps on the best cloud instances). The JVM doesn't work well with more than 2000 threads.
- Distribute load over average machines: avoid using a few machines with exceptional hardware (like 16 cores / 128GB RAM); stick to average machines (4 cores, 16GB RAM) and use more of them instead. JMeter cannot take advantage of that much RAM, because JVMs don't work well with an Xmx greater than 16GB.
- Use assertions wisely: assertions increase memory usage. A few are fine, but too many will make it grow dramatically. Also, avoid assertions on big server responses (greater than 1MB).
- Generate reports after the run: use the JTL files written during the test to create reports once the load test is finished. Building the report requires a lot of CPU and memory, which would otherwise be taken away from load generation.
- Network is the bottleneck in most cases: it's better to have medium machines with great network connectivity than powerful machines with poor network capabilities.
- Avoid BeanShell scripts: those scripts use way more CPU to execute than Groovy scripts embedded in JSR223 samplers. Stick to JSR223 samplers if you need to run custom logic.
- Do not run in distributed mode: distributed mode works well with 20-30 machines, maybe up to 40-50 machines if you're lucky. At around 1000 users per machine, this limits you to 20-50k concurrent users. The RMI protocol JMeter uses to communicate between the controller and the load generators is not very efficient. Instead, run each JMeter instance independently: send the JMX before starting the test, and retrieve the JTL files once it has finished.
- Store only metrics, not requests / responses, in JTLs: avoid storing requests, responses and response headers in JTLs, because it can use a lot of resources. Sure, it's nice to have the server response when a failure occurs, but it's not reasonable when running big load tests.
- As a last resort, tweak the JVM: settings like the garbage collector or the server JVM (-server) can be set through JVM_ARGS or by editing the JMeter launcher script. Choose a collector your Java version supports: recent JMeter launcher scripts already use G1 (-XX:+UseG1GC), while -XX:+UseConcMarkSweepGC was removed in Java 14. But JMeter may behave unexpectedly once you try to squeeze the very last drop out of the JVM. Usually, optimizing the scenario works better.
- Optimize regular expressions: badly written regexps can hurt CPU performance. Have a look at the Regular Expressions Guide; your regexp extractor expressions can probably be narrowed.
- Use dedicated load generators with physical hardware: avoid virtual machines, as a VM shares its host's resources (especially network) with other virtual machines. Dedicated physical machines are the best choice to ensure maximum stability.
- Increase load gradually: slowly increase the simulated load to avoid overwhelming the tested server too quickly. Usually, smaller load tests are run against the tested servers to warm them up before hitting them hard.
- Avoid functional mode: this run mode is not suited for load testing because it stores much more information in JTL results files.
- Store results in CSV JTLs instead of XML: XML requires more CPU for JMeter to write to disk. It's also far more verbose than CSV, so it needs much more disk space.
- Monitor JMeter logs during the test: ship the JMeter logs to a central place while the load test runs, using tools like the Elastic Stack. This way, you can keep an eye on how each load generator behaves during the load test.
- Use naming conventions for test elements: This way, you can quickly correlate results with business transactions and requests. Use unique names to prevent mixing results for different test elements within the results files.
- Avoid over-complicated scenarios: the more load you simulate, the simpler the simulated scenario should be. Complicated scenarios are difficult to understand, which makes results even more difficult to interpret. KISS (Keep It Simple, Stupid) is the best way to go.
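- Use Oracle JVM instead of OpenJDK ones: Oracle JVMs are best suited for running highly demanding JVM applications like JMeter, and are usually more stable under pressure. Since Java 11, however, Oracle JDK and OpenJDK builds come from essentially the same code base, so the gap is much smaller than it used to be.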
- Monitor the load generators during the test: keep an eye on CPU / memory / network usage on each load generator. Sometimes tests can go amok because the load generators haven't been sized properly. Monitoring them prevents misinterpreting the results: the test may have gone wrong simply because the load generators were overwhelmed.
- Monitor tested servers: the key to finding performance bottlenecks is to monitor the tested backend servers while running the load test. You will be able to correlate performance degradations with server-side issues, and fix them on the backend side more quickly. Tools like Perfmon are recommended.
- Run load tests in-house first: this may sound weird, but the closer the load generators are to the tested servers, the fewer factors are likely to influence the performance testing results. On-premise load tests avoid internet connectivity issues and can use large internal networks (usually 1 to 10 Gbps). Once the servers perform well in on-premise testing, testing from the cloud can reveal internet networking issues. This way, you can separate server issues from network issues.
- Avoid load testing third-party services: remove any requests to third-party services like Google, Gmail, Facebook, Yahoo, Reddit and others, so you don't load test services that aren't your own. They could ban you for doing this (it can be seen as a DDoS attack), and it can skew the test results.
- Quick response times don't always mean fast servers: also check the error count. Servers are usually very quick at delivering error pages, and much slower to deliver meaningful web pages. Make sure you don't get tricked by great response times when you have a high number of errors.
- Scaling JMeter is hard and time-consuming: although running distributed JMeter tests yourself may seem tempting because cloud machines are so cheap, the real value isn't there. It takes time to automate sending JMXs and CSV files, launching JMeter instances, monitoring everything, then collecting and analyzing the results. Your time has a cost too; take it into consideration.
- Avoid load testing through proxies: otherwise, you will end up load testing the proxy server too. Depending on the load, the proxy may fail before your servers. Avoid any network equipment or server in between that may influence the test results.
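Several of the tips above (command-line runs, CSV results, metrics-only JTLs, reports generated after the run) can be combined into a single non-GUI run. Below is a minimal sketch, assuming a POSIX shell with jmeter.sh on the PATH; write_lean_props, run_lean_test and the lean-results.properties file name are hypothetical. The jmeter.save.saveservice.* properties come from JMeter's jmeter.properties; -q loads an extra property file, -g builds the HTML dashboard from an existing JTL and -o sets its output folder. Recent JMeter versions already default to most of these values, so writing them out mainly guards against a customized configuration.

```shell
# Hypothetical helper names; jmeter.sh is assumed to be on the PATH.

write_lean_props() {
  # $1: path of the extra property file to create
  cat > "$1" <<'EOF'
# Write results as CSV rather than XML
jmeter.save.saveservice.output_format=csv
# Keep metrics only: no response bodies, request data or headers
jmeter.save.saveservice.response_data=false
jmeter.save.saveservice.samplerData=false
jmeter.save.saveservice.requestHeaders=false
jmeter.save.saveservice.responseHeaders=false
EOF
}

run_lean_test() {
  # $1: test plan (.jmx), $2: results file (.jtl), $3: report directory
  props=lean-results.properties
  write_lean_props "$props"
  # -n: non-GUI mode, -t: test plan, -l: results file, -q: extra property file
  jmeter.sh -n -t "$1" -l "$2" -q "$props" || return 1
  # Build the HTML dashboard only after the test has finished;
  # the -o directory must not exist yet or must be empty
  jmeter.sh -g "$2" -o "$3"
}
```

Usage would look like run_lean_test script.jmx results.jtl report, with the report directory absent or empty before the run.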
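The thread-per-machine limit, the advice to spread load over more average machines, and the gradual ramp-up boil down to simple arithmetic. A sketch, assuming the 1000-threads-per-machine rule of thumb from the list (kept as a parameter, since what a generator sustains depends on the scenario); the function names are hypothetical. The ramp-up helper relies on JMeter's Thread Group spreading thread starts evenly over the ramp-up period: the JMeter manual's example is 10 threads over 100 seconds, one new thread every 10 seconds.

```shell
# Hypothetical sizing helpers; the 1000 default is the rule of thumb above.

generators_needed() {
  # $1: total virtual users, $2: max threads per generator (default 1000)
  per_machine=${2:-1000}
  # Integer ceiling division: (a + b - 1) / b
  echo $(( ($1 + $per_machine - 1) / $per_machine ))
}

threads_per_generator() {
  # $1: total virtual users, $2: number of generators
  # Round up so that all generators together cover every user
  echo $(( ($1 + $2 - 1) / $2 ))
}

ramp_interval_ms() {
  # $1: threads in one Thread Group, $2: ramp-up period in seconds
  # Threads start evenly over the ramp-up, one every rampup/threads seconds
  echo $(( $2 * 1000 / $1 ))
}
```

For example, generators_needed 50000 prints 50; threads_per_generator 2500 3 prints 834; and ramp_interval_ms 1000 600 prints 600, meaning a 1000-thread generator with a 10-minute ramp-up starts a new thread every 600 ms.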
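When you skip distributed mode, each generator runs its own copy of the test and you collect the JTL files afterwards. A minimal sketch, assuming SSH key access to every generator, jmeter.sh on each host's PATH, file names without spaces, and CSV JTLs that all share the same header; the host names, /tmp paths and function names are hypothetical. plan_run only prints the commands, so you can review them before running them.

```shell
# Hypothetical helpers: hosts, paths and names are placeholders.

plan_run() {
  # $1: local .jmx file, remaining arguments: generator host names
  jmx=$1
  shift
  # 1. Send the test plan to every generator
  for host in "$@"; do
    echo "scp $jmx $host:/tmp/test.jmx"
  done
  # 2. Start all generators in the background, at roughly the same time
  #    (ssh -n keeps ssh from reading the piped script on standard input;
  #    the old results file is removed so JMeter does not append to it)
  for host in "$@"; do
    echo "ssh -n $host 'rm -f /tmp/results.jtl && jmeter.sh -n -t /tmp/test.jmx -l /tmp/results.jtl' &"
  done
  echo "wait"
  # 3. Retrieve each generator's results under a distinct local name
  for host in "$@"; do
    echo "scp $host:/tmp/results.jtl results-$host.jtl"
  done
}

merge_jtls() {
  # $1: output file, remaining arguments: CSV JTL files with a header line
  out=$1
  shift
  # Keep the first file's header and skip the header of every later file
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$out"
}
```

A run could then look like plan_run script.jmx lg1 lg2 | sh, followed by merge_jtls merged.jtl results-lg1.jtl results-lg2.jtl to analyze all generators as one data set.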
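Checking whether quick response times hide errors only takes a few lines on a CSV JTL. A minimal sketch, assuming a header line with JMeter's standard elapsed and success columns and no commas inside quoted fields such as responseMessage or failureMessage, which this naive comma split would break on; jtl_summary is a hypothetical name.

```shell
# Hypothetical helper; assumes a CSV JTL without commas inside quoted fields.

jtl_summary() {
  # $1: CSV JTL file with a header line
  awk -F',' '
    NR == 1 {
      # Locate the columns by name so custom save configurations still work
      for (i = 1; i <= NF; i++) {
        if ($i == "elapsed") e = i
        if ($i == "success") s = i
      }
      if (!e || !s) {
        print "jtl_summary: missing elapsed or success column" > "/dev/stderr"
        bad = 1
        exit
      }
      next
    }
    # Successful samples
    $s == "true" { ok++; ok_ms += $e; next }
    # Everything else counts as an error
    { ko++; ko_ms += $e }
    END {
      if (bad) exit 1
      total = ok + ko
      printf "samples=%d errors=%d error_rate=%.1f%%\n", total, ko, (total ? 100 * ko / total : 0)
      printf "avg_ok_ms=%.0f avg_error_ms=%.0f\n", (ok ? ok_ms / ok : 0), (ko ? ko_ms / ko : 0)
    }
  ' "$1"
}
```

If avg_error_ms is far below avg_ok_ms and the error rate is high, a good overall average response time is mostly measuring how fast the server returns error pages.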
Sure, you may have heard all this advice a million times before. But here is the truth: you never had the courage to compile it into a single checklist. That's why I did it for you.
And guess what? We're not even finished yet! I can already see you wondering: "Software optimizations? Done! How can I optimize the load generator hardware now?"
Again, I know all too well that you poor little guys have no time to spend collecting tips and tricks. You need to get the job done.
To be honest, I'm sure you already know everything that's coming next. But it's always nice to have a reminder. We all know our memory is bad, don't we?
Keep in mind that this advice is only valid as of today. You know how fast hardware evolves!
I recommend the following hardware specs:
CPU: Latest-generation Intel Core i7 (7700K as of now) or an equivalent Intel Xeon with at least 4 cores / 8 threads. An AMD Ryzen with 6 cores or more can be a good choice too. This is where no compromise should be made,
RAM: At least 16GB of RAM, 32GB recommended. Avoid using more than 50% of the RAM. ECC RAM is not really required; it provides only a marginal stability improvement for long-running applications. Also, RAM speed has little to no effect on performance: DDR3 or DDR4 at stock speeds is fine,
Motherboard: Select one with at least one Gigabit NIC, possibly more if you can. Two Gigabit NICs let you separate the network the load generators use to reach the tested servers from the controller-to-load-generators network. 10Gbps NICs can be quite expensive and may be required only for highly network-demanding load tests (like video streaming simulation),
Disk: Any modern SSD, either SATA or NVMe, will be fine. Just make sure you have enough disk space to store the JTL files; 128GB and above is usually fine. NVMe won't make a big difference since JMeter is not very disk-dependent. Still, avoid spinning hard disk drives because they will dramatically slow down the entire machine,
Graphics Card: Any will do the job; JMeter doesn't use the power of GPUs,
Power Supply: Gold or Platinum grade power supplies are more stable and protect the hardware from power spikes. It's always a good investment to have a quality power supply,
Cooling: JMeter can be quite CPU-demanding. A high-quality cooler like a Noctua can prevent any instability due to high CPU temperatures. CPUs can throttle when temperatures get too high, which may badly impact the test results. Avoid watercooling solutions, especially AIOs: they require more maintenance and bring no real benefit for non-overclocked CPUs,
Case: No need to go for an expensive case, mid-range towers like Fractal Design Define are great. Avoid cheap cases because they make it difficult to mount the hardware (sharp edges, bad screws...),
Overclocking: Avoid it to prevent stability issues. Modern CPUs are already fast enough at stock speeds.
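The 50% RAM guideline lines up with the earlier advice of an 8GB heap per JVM on a 16GB machine. One way to turn that into JVM_ARGS is sketched below, assuming that "avoid using more than 50% RAM" applies to the JMeter heap and capping the heap at 8GB; both limits are interpretations of this article rather than JMeter defaults, and heap_args is a hypothetical name.

```shell
# Hypothetical helper: heap = half of physical RAM, capped at 8GB (8192 MB).
# Both limits are interpretations of this article, not JMeter defaults.

heap_args() {
  # $1: physical RAM in MB
  heap=$(( $1 / 2 ))
  if [ "$heap" -gt 8192 ]; then
    heap=8192
  fi
  # Equal initial and maximum sizes avoid resizing the heap during the test
  printf '%s\n' "-Xms${heap}m -Xmx${heap}m"
}

# Usage on Linux, where /proc/meminfo reports MemTotal in kB:
#   ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
#   JVM_ARGS="$(heap_args "$ram_mb")" jmeter.sh -n -t script.jmx -l results.jtl
```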
The machines should be connected as close as possible to the tested server, without any network equipment in between that could drop packets.
You're starting to worry that building those machines will cost you a significant amount of money. Let me take a wild guess: you have never tried cloud instances before! Okay, I may be a little bit wrong. I'm sure you've already heard of them!
In a nutshell, cloud instances are virtual machines billed by the hour. That's pretty cool if you only need machines for a few hours a month, isn't it? Now you're ready to hear the most interesting part: it's dirt cheap!
A typical 4 vCPU / 16GB RAM machine costs about $0.25/hour. Yes, you read that correctly. Can this really be true? It is!
You want to know the best part? Providers like Amazon EC2, Rackspace or DigitalOcean can spin up hundreds of machines within minutes. You're instantly flooded with huge hardware power!
Just keep in mind the following:
- Most instances are virtual machines: while cheap, they perform much worse than equivalent physical machines,
- You need to start / stop instances yourself: this can be time-consuming. And the bill can grow quickly if you forget to stop the 50 instances you last used to run your load test,
- Firewall: applications behind a firewall will need to be exposed to the outside world to allow testing them from the cloud.
You may ask yourself: which cloud instances should I use? Amazon m4.xlarge instances (4 vCPUs, 16GB RAM, 1Gbps) are pretty good. DigitalOcean's equivalent droplets with 8 vCPUs and 16GB RAM are great too.
Let me tell you another secret: you can lower the costs even further by using Spot instances. Amazon Spot Instances are sometimes 7x cheaper than regular instances! Sounds too good to be true? Try them, and you will be convinced in no time. The only downside is that you can lose the instance while running the load test.
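To check the "dirt cheap" claim against your own test, a rough estimate is instances × hours × hourly rate. A sketch that uses the figures quoted above (about $0.25/hour for a 4 vCPU / 16GB instance, and a spot discount of up to 7x) as example inputs only; real prices depend on provider, region and date, and test_cost is a hypothetical name.

```shell
# Hypothetical helper: rough cloud bill for a load test.

test_cost() {
  # $1: instances, $2: billed hours, $3: hourly rate in $,
  # $4: optional spot discount factor (e.g. 7 for "7x cheaper"), default 1
  awk -v n="$1" -v h="$2" -v rate="$3" -v f="${4:-1}" \
    'BEGIN { printf "%.2f\n", n * h * rate / f }'
}
```

For example, test_cost 50 3 0.25 prints 37.50 for a three-hour test on 50 instances, and test_cost 50 3 0.25 7 prints 5.36 with a 7x spot discount. Forgetting to stop those 50 instances over a 60-hour weekend, test_cost 50 60 0.25, adds 750.00 to the bill.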
I know, you're sad because this is already the end of the show. It's now time to experiment with all those tips and tricks!
Remember that running large-scale JMeter load tests is a tedious task. I'm sure you already have a sense of how difficult it is, otherwise you wouldn't be reading these lines...
So okay, but how can you spend less time on setup and more time testing? You're starting to realize that managed JMeter platforms like OctoPerf exist. I can already hear you saying: "I don't want to spend money on that." Sure, but isn't your time valuable too?