It’s unbelievable how much information is inside this course. It definitely takes a lot of dedication of time and effort to go through everything. I really enjoyed a recent section that was a four-part video recording of a lecture by Peep Laja. I got a lot of value out of it. He covered 12 testing mistakes that I found very insightful and relevant.
- Precious time wasted on stupid tests — an example of this would be a team meeting held to brainstorm a bunch of test ideas, when the answer is research, research, and research. Research is the heart of all good testing, and bad research/hypothesis development will skew the rest of the process and massively hurt your results.
- You think you know what will work — The reality is that you really have no idea what will work. If you really had that crystal ball, why on earth are you even testing to begin with? I find that it is usually ego and a lack of experience that lead to this attitude. When you test enough things, you realize that you’re only right approximately half the time, with only two answers possible… Experience feeds you humble pie and shows you that you aren’t the audience and that you should keep an open mind. I’ve had tests get lift that I never would have guessed.
- You copy other people’s tests — You read a study, an article, a blog, or a friend tells you what they tested, and then you test the same thing. Horrible, horrible mistake. An A/B test is a solution to a problem, so if you copy someone else’s A/B test, then you are copying someone else’s solution to someone else’s problem. The odds that you have the same problem and that the same solution matches up are not likely.
- Your sample size is too low and you use statistical significance as a stopping rule — If your sample size isn’t reached, you don’t have enough data to say one way or the other. Each testing tool uses a different statistical model, which could show statistical significance being met before the proper sample size is reached.
- You run tests on pages with very little traffic — Make sure you are calculating traffic volume for the pages you are going to test on, and that they can reach a large enough sample size; otherwise you are just wasting time and resources.
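To make the sample size point concrete, here’s a minimal sketch of the standard two-proportion sample size approximation (this is my own illustration, not a formula from the lecture — the function name and the 3% baseline / 10% relative lift numbers are assumptions):

```python
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # rate we hope to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

# A 3% baseline conversion rate and a hoped-for 10% relative lift
# needs tens of thousands of visitors *per variant*.
n = sample_size_per_variant(0.03, 0.10)
```

Running numbers like these makes it obvious why low-traffic pages can never reach a trustworthy sample.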
- Your tests don’t run long enough — Peep recommended a minimum of two weeks, but preferably 28 days, for running a test. You can reduce the testing time if you have a high-volume site (tens of thousands of transactions per month). Amazon and Booking.com get away with weekly tests because of the volume of traffic that they have.
- You don’t test full weeks at a time — I’ve rarely seen this one followed. I’ve seen people run tests for 2, 9, 11, 16 days at a time. There is seasonality within the days of the week, and by not keeping tests to full weeks, you are going to skew the results. You want to run tests for 7, 14, 21, or 28 days. Don’t run them longer than 28 days, though. You can get into sample pollution, which means the same person could end up in both the A and the B variant of the test because they may browse on their computer at work, a tablet at work or home, and then their phone. The other reason not to run tests longer than 28 days is that people delete their cookies.
- Test data is not sent to third-party analytics — Make sure you are sending your data to Adobe Analytics or Google Analytics. You care about the macro conversion, but you also want to see how behavior changed in the variation via any micro conversions there. You also don’t want to put your full trust in the testing tool. Trust the testing tool, but verify the results you are seeing through third-party analytics.
- You give up after your first test when a hypothesis fails — If you have ample data that this is a severe problem, you need to keep on trying to find solutions. Rarely does anybody find the solution to the problem in their first test. If there is pressure from management to have the solution found in the first test, then education needs to take place and expectations need to be set.
- You’re not aware of validity threats — You have a winner or a loser, but it may be a false positive or a false negative. Instrumentation Effect: front-end developers didn’t set up the test properly and it’s buggy. Selection Effect: you increase the paid traffic during the test to reach the proper sample size, but once the test ends and traffic returns to the page’s normal mix, the results change compared to what the test showed. History Effect: an outside event changes behavior mid-test, e.g. the CEO is in the paper for killing lions in Africa.
- Ignoring small gains — Big lifts are NOT common. Amazon hasn’t seen a 5% win in seven years. Booking.com has tests that win by no more than 1%. A 5% increase in conversion per month compounds to roughly 80% across the year. That’s the power of compound interest.
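The compounding arithmetic behind that claim is quick to verify:

```python
# 5% lift each month, compounded over 12 months
monthly_lift = 0.05
annual_lift = (1 + monthly_lift) ** 12 - 1
print(f"{annual_lift:.0%}")  # prints "80%" (precisely ~79.6%)
```

Small, steady wins stack multiplicatively, which is why they’re worth shipping.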
- You’re not running tests all the time — Increase the number of experiments you’re running, provided you have the traffic to do this. Downtime when you’re not testing is time when learning, and ultimately improvement, is not taking place. Planning is vital, and so is allocating resources to keep this going.
I also really like the three metrics mentioned for a testing program:
- Number of variants tested
- Win rate
- Avg. uplift per successful experiment
Again, a lot of the results for the above are going to come down to the quality of the research. Quality research lends itself to a higher win rate and to increasing the avg. uplift per successful experiment, because you’re addressing your customers’ actual problems.
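As a sketch, those three program metrics could be tracked from a simple experiment log like this (the record format and the numbers are hypothetical, not from the course):

```python
def program_metrics(experiments):
    """Summarize a testing program: variants tested, win rate, avg uplift per win."""
    variants_tested = sum(e["variants"] for e in experiments)
    wins = [e for e in experiments if e["won"]]
    win_rate = len(wins) / len(experiments)
    avg_uplift = sum(e["uplift"] for e in wins) / len(wins) if wins else 0.0
    return variants_tested, win_rate, avg_uplift

# Hypothetical experiment log
history = [
    {"variants": 2, "won": True,  "uplift": 0.04},
    {"variants": 3, "won": False, "uplift": 0.0},
    {"variants": 2, "won": True,  "uplift": 0.01},
    {"variants": 2, "won": False, "uplift": 0.0},
]
variants, rate, uplift = program_metrics(history)
# variants == 9, rate == 0.5, uplift == 0.025
```

Tracking these over time shows whether better research is actually moving the win rate and the average uplift.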