How do I survive the stampede?
As a performance tester, I am always surprised to see how unprepared most retail websites are. Even when load testing has been done (to prepare for sales or marketing campaigns), most of the time it comes nowhere close to real user behavior. We've already seen the importance of response times in a previous article, but there are other aspects we should consider.
For instance, unless you have spent the past few years on a different planet, you have probably seen those videos where thousands of people rush into stores during sales. There is no reason why things would happen differently on a retail website. So the question is: how do I survive the stampede?
Focus on what you already know
As obvious as it may seem, you should start by analyzing your users' behavior. If this is not your first rodeo, you probably know what to expect. Tools like Google Analytics will also give you an idea of what should happen. But never underestimate the importance of analyzing this data.
For instance, expecting 10 000 unique users over the first hour is good, but how many of these will come during the first 10 minutes? The first minute? Even seasoned load testers tend to think only about averaged loads. In this case you should focus on the critical moments, usually the first minutes.
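To make the gap between average and peak load concrete, here is a minimal sketch. The arrival profile is purely hypothetical (it is not taken from real analytics data); only the total of 10 000 users over one hour comes from the example above.

```python
def peak_vs_average(arrivals_per_minute):
    """Return (average, peak, ratio) for a list of users arriving each minute."""
    total = sum(arrivals_per_minute)
    minutes = len(arrivals_per_minute)
    average = total / minutes
    peak = max(arrivals_per_minute)
    # Written as peak * minutes / total to avoid dividing by a rounded average.
    ratio = peak * minutes / total
    return average, peak, ratio


# Hypothetical front-loaded profile for the first hour of a sale:
# 2 500 users in the first minute, 500 per minute for the next 9 minutes,
# then 60 per minute for the remaining 50 minutes (10 000 users in total).
profile = [2500] + [500] * 9 + [60] * 50

average, peak, ratio = peak_vs_average(profile)
print(f"average: {average:.0f} users/min, peak: {peak} users/min, ratio: {ratio:.0f}x")
```

With this assumed profile, a test sized on the hourly average would apply about one fifteenth of the load seen during the first minute.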
Make preparations
A typical mechanism to lighten the load is to let your users prepare a wishlist a few weeks in advance. But then you can be sure that most of them will be there on D-day, hitting refresh until the sale is on. They will then proceed to payment as fast as they can.
Your load tests should reflect this behavior as well as that of a classic user filling their cart. Also, have you taken the bounce rate into account? On most websites, 90% of users don't reach the end of the purchasing process. You have to simulate this in your test scenarios or you will end up optimizing the wrong layers.
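One way to express such a mix is a weighted scenario table from which each virtual user draws. The sketch below is only an illustration: the scenario names and the split between the two purchasing scenarios are assumptions, and only the 90% share of users who never pay comes from the paragraph above.

```python
import random

# Hypothetical scenario mix. Only the 90% figure comes from the article;
# the names and the 7% / 3% split are assumptions to adapt to your own data.
SCENARIO_WEIGHTS = {
    "browse_and_leave": 0.90,    # never reaches payment
    "wishlist_checkout": 0.07,   # refreshes until the sale opens, then pays
    "fill_cart_checkout": 0.03,  # classic user filling a cart, then pays
}


def assign_scenarios(n_users, weights, seed=0):
    """Randomly assign each virtual user a scenario according to the weights."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=n_users)
    return {name: picks.count(name) for name in names}


counts = assign_scenarios(1000, SCENARIO_WEIGHTS)
for name, count in counts.items():
    print(f"{name}: {count} virtual users")
```

Most load testing tools offer an equivalent weighting mechanism; the point is that the weights should come from your analytics rather than from guesswork.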
Mutualization reduces costs but also increases risks.
I worked with a customer who ran load tests every 6 months, ahead of the sales. Every time they would say “OK, if we have time we will test the cash registers' web services”. Of course we never did, and on the first day of one of the summer sales the database was overloaded by web service calls. Because of this, it was impossible to pay on the website and in the stores.
This could be seen as a design flaw, but one could argue that having the physical and electronic transactions in one database has some advantages. In this case, after upgrading the database, things went smoothly.
The bottom line is that a successful load test must be conducted in real conditions. Considering part of the load as a non-issue was the mistake in this case.
If any of your components are mutualized (shared), you should always consider the load on them as well. Think of a firewall used by several environments or a load balancer serving several applications. More often than not, I have found these to be responsible for bottlenecks (critical or not). They might be oversized, but what if they are overused?
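As a back-of-the-envelope illustration, the check below adds up the load from every consumer of a shared component and compares it to the component's capacity. All names and figures are hypothetical.

```python
def shared_component_headroom(capacity, loads):
    """Return remaining capacity once every consumer's load is added up.

    `capacity` and the values of `loads` must use the same unit
    (for example queries per second). A negative result means overload.
    """
    return capacity - sum(loads.values())


# Hypothetical figures for a database shared by several consumers.
capacity = 10_000  # queries per second the database can sustain (assumption)
loads = {
    "website": 6_000,         # the only part that was load tested
    "cash_registers": 3_000,  # in-store web services, left untested
    "back_office": 1_500,     # background tasks
}

print("website alone:", shared_component_headroom(capacity, {"website": loads["website"]}))
print("all consumers:", shared_component_headroom(capacity, loads))
```

With these assumed numbers the website alone leaves comfortable headroom, while the combined load exceeds capacity, which is the kind of situation a website-only test cannot reveal.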
How confident am I with my third party providers?
Are you positive all of them are scaled to face the load? You might think "hey, we already pay these guys to provide a service, we are not paying to test their solution", but in the end you are putting your business at risk. For instance, using a CDN is highly recommended to speed up your website. But even a CDN has a limit. At the very least, take the time to estimate the load that will be generated during a peak period and check with them that they are able to deliver.
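A rough estimate is usually enough to open that conversation. The sketch below converts an expected peak in page views into bandwidth; the traffic and page weight figures are assumptions to replace with your own measurements.

```python
def peak_bandwidth_gbps(page_views_per_second, page_weight_kb):
    """Estimate the bandwidth in Gbit/s needed to serve a peak of page views.

    page_weight_kb is the total size of a page and its static resources,
    in kilobytes (1 kB = 1 000 bytes).
    """
    kilobits_per_second = page_views_per_second * page_weight_kb * 8
    return kilobits_per_second / 1_000_000


# Hypothetical peak: 2 000 page views per second, 1 500 kB per page.
estimate = peak_bandwidth_gbps(2_000, 1_500)
print(f"estimated peak: {estimate:.0f} Gbit/s")
```

This ignores browser caching and compression, so treat it as an order of magnitude to discuss with the provider rather than a precise requirement.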
Also keep in mind that your CDN can only help you if the latest version of your resources has been deployed in their network. This can take a couple of hours. It can be a real issue if you update content on a regular basis or publish new content just before sales.
Where do my users come from?
Of course, you probably have, or are considering, a mobile application or at least a mobile version of the website. This must be taken into account because mobile users behave very differently.
Latency can be dangerous to your servers' connection pools because every request and response will take longer. Packet loss might consume lots of bandwidth where you expected just a lighter version of your website. Do not overlook this, because bandwidth and connection pools are common bottlenecks.
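The effect of latency on a connection pool can be approximated with Little's law: the average number of connections in use equals the request rate multiplied by the time each request holds a connection. The figures below are hypothetical and only illustrate the order of magnitude.

```python
import math


def connections_in_use(requests_per_second, hold_time_ms):
    """Little's law: average busy connections = arrival rate x time held."""
    return math.ceil(requests_per_second * hold_time_ms / 1000)


# Hypothetical figures: the same request rate, but mobile requests hold
# a connection longer because of network latency.
rate = 500          # requests per second (assumption)
desktop_ms = 200    # time a desktop request holds a connection (assumption)
mobile_ms = 500     # same request over a slower mobile network (assumption)

print("desktop:", connections_in_use(rate, desktop_ms), "connections")
print("mobile: ", connections_in_use(rate, mobile_ms), "connections")
```

Under these assumptions the same traffic needs two and a half times as many connections once it comes from a slower network, without a single extra user.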
What should I do?
- Never. Assume. Anything.
- Look at what you already know, analyse the website usage and test accordingly.
- Test all or test nothing: include third party providers and background tasks in your tests as much as possible.
- Different devices, different issues. Take into account which devices and networks are used and reproduce them.