Thursday, 9 January 2014

When Good Tests Fail

Seth Godin, online usability expert recently stated simply that,  'The answer to the question, "What if I fail?" is "You will."  The real question is, "What after I fail?"'

Despite rigorous analytics, careful usability studies and thoughtful designing, the results from your latest A/B test are bad.  Conversion worsened; average order value plummeted and people bounced off your home page like it was a trampoline.  Your test failed.  And, if you're taking it personally (and most online professionals do take it very personally), then you failed too.

But, before the boss slashes your optimisation budget, you have the opportunity to rescue the test, by reviewing all the data and understanding the full picture.  Your test failed - but why?  I've mentioned before that tests which fail draw far more attention than those which win - it's just human nature to explore why something went wrong, and we like to attribute blame or responsibility accordingly.  That's why I pull apart my Chess games to find out why I lost.  I want to improve my Chess (I'm not going to stop playing, or fire myself from playing Chess).

So, the boss asks the questions- Why did your test fail?  (And it's suddenly stopped being his test, or our test... it's yours).  Where's the conversion uplift we expected?  And why aren't profits rising?

It's time to review the test plan, the hypothesis and the key questions. Which of these apply to your test?

Answer 1.  The hypothesis was not entirely valid. I have said before that, "If I eat more chocolate, I'll be able to run faster because I will have more energy."  What I failed to consider is the build up of fat in my body, and that eating all that chocolate has made me heavier, and hence I'm actually running more slowly.  I'm not training enough to convert all that fat into movement, and the energy is being stored as fat.

Or, in an online situation:  the idea was proved incorrect.  Somewhere, one of the assumptions that was made was wrong.  This is where the key test questions come in.  The analysis that comes from answering these key questions will help retrieve your test from 'total failure' to 'learning experience'.

Sometimes, in an online context, the change we made in the test had an unforeseen side-effect.  We thought we were driving more people from the product pages to the cart, but they just weren't properly prepared.  We had the button at the bottom of the page, and people who scrolled to the bottom of the page saw the full specs of the new super-toaster and how it needs an extra battery-pack for super-toasting.  We moved the button up the page, more people clicked on it, but realised only at the cart page that it needed the additional battery pack.  We upset more people than we helped, and overall conversion went down.

Unforeseen side-effects in testing leading to adverse performance: 
too much chocolate slows down 100m run times due to increased body mass
Answer 2.  The visual design of the test recipe didn't address the test hypothesis or the key test questions.  In any lab-based scientific experiment, you would expect to set up the apparatus and equipment and take specific measurements based on the experiment you were doing.  You would also set up the equipment to address the hypothesis - otherwise you're just messing about with lab equipment.  For example, if you wanted to measure the force of gravity and how it affects moving objects, you wouldn't design an experiment with a battery, a thermometer and a microphone. 

However, in an online environment, this sort of situation becomes possible, because different people possess the skills required to analyse data and the skills to design banners etc, and the skills to write the HTML or JavaScript code.  The analyst, the designer and the developer need to work closely together to make sure that the test design which hits the screen is going to answer the original hypothesis, and not something else that the designer believes will 'look nice' or that the developer finds easier to code.  Good collaboration between the key partners in the testing process is essential - if the original test idea doesn't meet brand guidelines, or is extremely difficult to code, then it's better to get everybody together and decide what can be done that will still help prove or disprove the hypothesis.


To give a final example from my chocolate-eating context, I wouldn't expect to prove that chocolate makes me run faster by eating crisps (potato chips) instead.  Unless they were chocolate-coated crips?  Seriously.


Answer 3.  Sometimes, the test design and execution was perfect, and we measured the right metrics in the right way.  However, the test data shows that our hypothesis was completely wrong.  It's time to learn something new...!

My hypothesis said that chocolate would make me run faster; but it didn't.  Now, I apologise that I'm not a biology expert and this probably isn't correct, but let's assume it is, review the 'data' and find out why.  


For a start, I put on weight (because chocolate contains fat), but worse still, the sugar in chocolate was also converted to fat, and it wasn't converted back into sugar quickly enough for me to benefit from it while running the 100 metres.  Measurements of my speed show I got slower, and measurements of my blood sugar levels before and after the 100 metres showed that the blood sugar levels fell, because the fat in my body wasn't converted into glucose and transferred to my muscles quickly enough.  Additionally, my body mass rose 3% during the testing period, and further analysis showed this was fat, not muscle.  This increased mass also slowed me down.



Back to online:  you thought people would like it if your product pages looked more like Apple's.  But Apple sell a limited range of products - one phone, one MP3 player, one desktop PC, etc. while you sell 15-20 of each of those, and your test recipe showed only one of your products on the page (the rest were hidden behind a 'View More' link), when you get better financial performance from a range of products.  Or perhaps you thought that prompting users to chat online would help them go through checkout... but you irritated them and put them off.  Perhaps your data showed that people kept leaving your site to talk to you on the phone.  However, when you tested hiding the phone number, in order to get people to convert online, you found that sales through the phone line went down, as expected, but your online sales also fell because people were using the phone line for help completing the online purchase.  There are learnings in all cases that you can use to improve your site further - you didn't fail, you just didn't win ;-)

In conclusion Yes, sometimes test recipes lose.  Hypotheses were incorrect, assumptions were invalid, side-effects were missed and sometimes the test just didn't ask the question it was meant to.  The difference between a test losing and a test failing is in the analysis, and that comes from planning - having a good hypothesis in the first place, and asking the right questions up front which will show why the test lost (or, let's not forget, the reason why a different test won).  Until then, fail fast and learn quickly!





No comments:

Post a Comment