Wednesday, 24 July 2013

The Science of A Good Hypothesis

Good testing requires many things:  good design, good execution, good planning.  Most important is a good idea - or a good hypothesis, but many people jump into testing without a good reason for testing.  After all, testing is cool, it's capable of fixing all my online woes, and it'll produce huge improvements to my online sales, won't it?

I've talked before about good testing, and, "Let's test this and see if it works," is an example of poor test planning.  A good idea, backed up with evidence (data, or usability testing, or other valid evidence) is more likely to lead to a good result.  This is the basis of a hypothesis, and a good hypothesis is the basis of a good test.

What makes a good hypothesis?  What, and why.

According to Wiki Answers, a hypothesis is, "An educated guess about the cause of some observed (seen, noticed, or otherwise perceived) phenomena, and what seems most likely to happen and why. It is a more scientific method of guessing or assuming what is going to happen."

In simple, testing terms, a hypothesis states what you are going to test (or change) on a page, 
what the effect of the change will be, and why the effect will occur.  To put it another way, a hypothesis is an "If ... then... because..." statement.  "If I eat lots of chocolate, then I will run more slowly because I will put on weight."  Or, alternatively, "If I eat lots of chocolate, then I will run faster because I will have more energy." (I wish).

However, not all online tests are born equal, and you could probably place the majority of them into one of three groups, based on the strength of the original theory.  These are tests with a hypothesis, tests with a HIPPOthesis and tests with a hippiethesis.
Tests with a hypothesis

These are arguably the hardest tests to set up.  A good hypothesis will rely on the test analyst sitting down with data, evidence and experience (or two out of three) and working out what the data is saying.  For example, the 'what' could be that you're seeing a 93% drop-off between the cart and the first checkout page.   Why?  Well, the data shows that people are going back to the home page, or the product description page.  Why?  Well, because the call-to-action button to start checkout is probably not clear enough.  Or we aren't confirming the total cost to the customer.  Or the button is below the fold.

So, you need to change the page - and let's take the button issue as an example for our hypothesis.  People are not progressing from cart to checkout very well (only 7% proceed).  [We believe that] if we make the call to action button from cart to checkout bigger and move it above the fold, then more people will click it because it will be more visible.

There are many benefits of having a good hypothesis, and the first one is that it will tell you what to measure as the outcome of the test.  Here, it is clear that we will be measuring how many people move from cart to checkout.  The hypothesis says so.  "More people will click it" - the CTA button - so you know you're going to measure clicks and traffic moving from cart to checkout.  A good hypothesis will state after the word 'then' what the measurable outcome should be.

In my chocolate example above, it's clear that eating choclate will make me either run faster or slower, so I'll be measuring my running speed.  Neither hypothesis (the cart or the chocolate) has specified how big the change is.  If I knew how big the change was going to be, I wouldn't test.  Also, I haven't said how much more chocolate I'm going to eat, or how much faster I'll run, or how much bigger the CTA buttons should be, or how much more traffic I'll convert.  That's the next step - the test execution.  For now, the hypothesis is general enough to allow for the details to be decided later, but it frames the idea clearly and supports it with a reason why.  Of course, the hypothesis may give some indication of the detailed measurements - I might be looking at increasing my consumption of chocolate by 100 g (about 4 oz) per day, and measuring my running speed over 100 metres (about 100 yds) every week.

Tests with a HIPPOthesis

The HIPPO, for reference, is the HIghest Paid Person's Opinion (or sometimes just the HIghest Paid PersOn).  The boss.  The management.  Those who hold the budget control, who decide what's actionable, and who say what gets done.  And sometimes, what they say is that, "You will test this".  There's virtually no rationale, no data, no evidence or anything.  Just a hunch (or even a whim) from the boss, who has a new idea that he likes.  Perhaps he saw it on Amazon, or read about it in a blog, or his golf partner mentioned it on the course over the weekend.  Whatever - here's the idea, and it's your job to go and test it.

These tests are likely to be completely variable in their design.  They could be good ideas, bad ideas, mixed-up ideas or even amazing ideas.  If you're going to run the test, however, you'll have to work out (or define for yourself) what the underlying hypothesis is.  You'll also need to ask the HIPPO - very carefully - what the success metrics are.  Be prepared to pitch this question somewhere between, "So, what are you trying to test?" and "Are you sure this is a productive use of the highly skilled people that you have working for you?"  Any which way, you'll need the HIPPO to determine the success criteria, or agree to yours - in advance.  If you don't, you'll end up with a disastrous recipe being declared a technical winner because it (1) increased time on page, (2) increased time on site or (3) drove more traffic to the Contact Us page, none of which were the intended success criteria for the test, or were agreed up-front, and which may not be good things anyway.

If you have to have to run a test with a HIPPOthesis, then write your own hypothesis and identify the metrics you're going to examine.  You may also want to try and add one of your own recipes which you think will solve the apparent problem.  But at the very least, nail down the metrics...

Tests with a hippiethesis
Hippie:  noun
a person, especially of the late 1960s, who rejected established institutions and values and sought spontaneity, etc., etc.  Also hippy

The final type of test idea is a hippiethesis - laid back, not too concerned with details, spontaneous and putting forward an idea it because it looks good on paper.  "Let's test this because it's probably a good idea that will help improve site performance."  Not as bad as the 'Test this!" that drives a HIPPOthesis, but not fully-formed as a hypothesis, the hippiethesis is probably (and I'm guessing) the most common type of test.

Some examples of hippietheses:

"If we make the product images better, then we'll improve conversion."
"The data shows we need to fix our conversion funnel - let's make the buttons blue  instead of yellow."
"Let's copy Amazon because everybody knows they're the best online."

There's the basis of a good idea somewhere in there, but it's not quite finished.  A hippiethesis will tell you that the lack of a good idea is not a problem, buddy, let's just test it - testing is cool (groovy?), man!  The results will be awesome.  

There's a laid-back approach to the test (either deliberate or accidental), where the idea has not been thought through - either because "You don't need all that science stuff", or because the evidence to support a test is very flimsy or even non-existent.  Perhaps the test analyst didn't look for the evidence; perhaps he couldn't find any.  Maybe the evidence is mostly there somewhere because everybody knows about it, but isn't actually documented.  The danger here is that when you (or somebody else) start to analyse the results, you won't recall what you were testing for, what the main idea was or which metrics to look at.  You'll end up analysing without purpose, trying to prove that the test was a good idea (and you'll have to do that before you can work out what it was that you were actually trying to prove in the first place).The main difference between a hypothesis and hippiethesis is the WHY.  Online testing is a science, and scientists are curious people who ask why.  Web analyst Avinash Kaushik calls it the three levels of so what test.  If you can't get to something meaningful and useful, or in this case, testable and measureable, within three iterations of "Why?" then you're on the wrong track.  Hippies don't bother with 'why' - that's too organised, formal and part of the system; instead, they'll test because they can, and because - as I said, testing is groovy.

A good hypothesis:  IF, THEN, BECAUSE.

To wrap up:  a good hypothesis needs three things:  If (I make this change to the site) Then (I will expect this metric to improve) because (of a change in visitor behaviour that is linked to the change I made, based on evidence).

When there's no if:  you aren't making a change to the site, you're just expecting things to happen by themselves.  Crazy!  If you reconsider my chocolate hypothesis, without the if, you're left with, "I will run faster and I will have more energy".  Alternatively, "More people will click and we'll sell more."  Not a very common attitude in testing, and more likely to be found in over-optimistic entrepreneurs :-)

When there's no then:  If I eat more chocolate, I will have more energy.  So what?  And how will I measure this increased energy?  There are no metrics here.  Am I going to measure my heart rate, blood pressure, blood sugar level or body temperature??  In an online environment:  will this improve conversion, revenue, bounce rate, exit rate, time on page, time on site or average number of pages per visit?  I could measure any one of these and 'prove' the hypothesis.  At its worst, a hypothesis without a 'then' would read as badly as, "If we make the CTA bigger, [then we will move more people to cart], [because] more people will click." which becomes "If we make the CTA bigger, more people will click."  That's not a hypothesis, that's starting to state the absurdly obvious.

When there's no because:  If I eat more chocolate, then I will run faster.  Why?  Why will I run faster?  Will I run slower?  How can I run even faster?  There are metrics here (speed) but there's no reason why.  The science is missing, and there's no way I can actually learn anything from this and improve.  I will execute a one-off experiment and get a result, but I will be none the wiser about how it happened.  Was it the sugar in the chocolate?  Or the caffeine?

And finally, I should reiterate that an idea for a test doesn't have to be detailed, but it must be backed up by data (some, even if it's not great).  The more evidence the better:  think of a sliding scale from no evidence (could be a terrible idea), through to some evidence (a usability review, or a survey response, prior test result or some click-path analysis), through to multiple sources of evidence all pointing the same way - not just one or two data points, but a comprehensive case for change.  You might even have enough evidence to make a go-do recommendation (and remember, it's a successful outcome if your evidence is strong enough to prompt the business to make a change without testing).


  1. Krishna Chaitanya24 July 2013 at 09:02

    Great blog David. Really brought out some of the ambiguities that run around testing and hypothesis generation, and an easy solution to generate the right hypothesis for testing. Enjoyed reading through this :)