Wednesday, 24 July 2013

The Science of A Good Hypothesis

Good testing requires many things:  good design, good execution, good planning.  Most important is a good idea - or a good hypothesis, but many people jump into testing without a good reason for testing.  After all, testing is cool, it's capable of fixing all my online woes, and it'll produce huge improvements to my online sales, won't it?

I've talked before about good testing, and, "Let's test this and see if it works," is an example of poor test planning.  A good idea, backed up with evidence (data, or usability testing, or other valid evidence) is more likely to lead to a good result.  This is the basis of a hypothesis, and a good hypothesis is the basis of a good test.

What makes a good hypothesis?  What, and why.

According to Wiki Answers, a hypothesis is, "An educated guess about the cause of some observed (seen, noticed, or otherwise perceived) phenomena, and what seems most likely to happen and why. It is a more scientific method of guessing or assuming what is going to happen."

In simple, testing terms, a hypothesis states what you are going to test (or change) on a page, 
what the effect of the change will be, and why the effect will occur.  To put it another way, a hypothesis is an "If ... then... because..." statement.  "If I eat lots of chocolate, then I will run more slowly because I will put on weight."  Or, alternatively, "If I eat lots of chocolate, then I will run faster because I will have more energy." (I wish).

However, not all online tests are born equal, and you could probably place the majority of them into one of three groups, based on the strength of the original theory.  These are tests with a hypothesis, tests with a HIPPOthesis and tests with a hippiethesis.
Tests with a hypothesis

These are arguably the hardest tests to set up.  A good hypothesis will rely on the test analyst sitting down with data, evidence and experience (or two out of three) and working out what the data is saying.  For example, the 'what' could be that you're seeing a 93% drop-off between the cart and the first checkout page.   Why?  Well, the data shows that people are going back to the home page, or the product description page.  Why?  Well, because the call-to-action button to start checkout is probably not clear enough.  Or we aren't confirming the total cost to the customer.  Or the button is below the fold.

So, you need to change the page - and let's take the button issue as an example for our hypothesis.  People are not progressing from cart to checkout very well (only 7% proceed).  [We believe that] if we make the call to action button from cart to checkout bigger and move it above the fold, then more people will click it because it will be more visible.

There are many benefits of having a good hypothesis, and the first one is that it will tell you what to measure as the outcome of the test.  Here, it is clear that we will be measuring how many people move from cart to checkout.  The hypothesis says so.  "More people will click it" - the CTA button - so you know you're going to measure clicks and traffic moving from cart to checkout.  A good hypothesis will state after the word 'then' what the measurable outcome should be.

In my chocolate example above, it's clear that eating choclate will make me either run faster or slower, so I'll be measuring my running speed.  Neither hypothesis (the cart or the chocolate) has specified how big the change is.  If I knew how big the change was going to be, I wouldn't test.  Also, I haven't said how much more chocolate I'm going to eat, or how much faster I'll run, or how much bigger the CTA buttons should be, or how much more traffic I'll convert.  That's the next step - the test execution.  For now, the hypothesis is general enough to allow for the details to be decided later, but it frames the idea clearly and supports it with a reason why.  Of course, the hypothesis may give some indication of the detailed measurements - I might be looking at increasing my consumption of chocolate by 100 g (about 4 oz) per day, and measuring my running speed over 100 metres (about 100 yds) every week.

Tests with a HIPPOthesis

The HIPPO, for reference, is the HIghest Paid Person's Opinion (or sometimes just the HIghest Paid PersOn).  The boss.  The management.  Those who hold the budget control, who decide what's actionable, and who say what gets done.  And sometimes, what they say is that, "You will test this".  There's virtually no rationale, no data, no evidence or anything.  Just a hunch (or even a whim) from the boss, who has a new idea that he likes.  Perhaps he saw it on Amazon, or read about it in a blog, or his golf partner mentioned it on the course over the weekend.  Whatever - here's the idea, and it's your job to go and test it.

These tests are likely to be completely variable in their design.  They could be good ideas, bad ideas, mixed-up ideas or even amazing ideas.  If you're going to run the test, however, you'll have to work out (or define for yourself) what the underlying hypothesis is.  You'll also need to ask the HIPPO - very carefully - what the success metrics are.  Be prepared to pitch this question somewhere between, "So, what are you trying to test?" and "Are you sure this is a productive use of the highly skilled people that you have working for you?"  Any which way, you'll need the HIPPO to determine the success criteria, or agree to yours - in advance.  If you don't, you'll end up with a disastrous recipe being declared a technical winner because it (1) increased time on page, (2) increased time on site or (3) drove more traffic to the Contact Us page, none of which were the intended success criteria for the test, or were agreed up-front, and which may not be good things anyway.

If you have to have to run a test with a HIPPOthesis, then write your own hypothesis and identify the metrics you're going to examine.  You may also want to try and add one of your own recipes which you think will solve the apparent problem.  But at the very least, nail down the metrics...

Tests with a hippiethesis
Hippie:  noun
a person, especially of the late 1960s, who rejected established institutions and values and sought spontaneity, etc., etc.  Also hippy

The final type of test idea is a hippiethesis - laid back, not too concerned with details, spontaneous and putting forward an idea it because it looks good on paper.  "Let's test this because it's probably a good idea that will help improve site performance."  Not as bad as the 'Test this!" that drives a HIPPOthesis, but not fully-formed as a hypothesis, the hippiethesis is probably (and I'm guessing) the most common type of test.

Some examples of hippietheses:

"If we make the product images better, then we'll improve conversion."
"The data shows we need to fix our conversion funnel - let's make the buttons blue  instead of yellow."
"Let's copy Amazon because everybody knows they're the best online."

There's the basis of a good idea somewhere in there, but it's not quite finished.  A hippiethesis will tell you that the lack of a good idea is not a problem, buddy, let's just test it - testing is cool (groovy?), man!  The results will be awesome.  

There's a laid-back approach to the test (either deliberate or accidental), where the idea has not been thought through - either because "You don't need all that science stuff", or because the evidence to support a test is very flimsy or even non-existent.  Perhaps the test analyst didn't look for the evidence; perhaps he couldn't find any.  Maybe the evidence is mostly there somewhere because everybody knows about it, but isn't actually documented.  The danger here is that when you (or somebody else) start to analyse the results, you won't recall what you were testing for, what the main idea was or which metrics to look at.  You'll end up analysing without purpose, trying to prove that the test was a good idea (and you'll have to do that before you can work out what it was that you were actually trying to prove in the first place).The main difference between a hypothesis and hippiethesis is the WHY.  Online testing is a science, and scientists are curious people who ask why.  Web analyst Avinash Kaushik calls it the three levels of so what test.  If you can't get to something meaningful and useful, or in this case, testable and measureable, within three iterations of "Why?" then you're on the wrong track.  Hippies don't bother with 'why' - that's too organised, formal and part of the system; instead, they'll test because they can, and because - as I said, testing is groovy.

A good hypothesis:  IF, THEN, BECAUSE.

To wrap up:  a good hypothesis needs three things:  If (I make this change to the site) Then (I will expect this metric to improve) because (of a change in visitor behaviour that is linked to the change I made, based on evidence).

When there's no if:  you aren't making a change to the site, you're just expecting things to happen by themselves.  Crazy!  If you reconsider my chocolate hypothesis, without the if, you're left with, "I will run faster and I will have more energy".  Alternatively, "More people will click and we'll sell more."  Not a very common attitude in testing, and more likely to be found in over-optimistic entrepreneurs :-)

When there's no then:  If I eat more chocolate, I will have more energy.  So what?  And how will I measure this increased energy?  There are no metrics here.  Am I going to measure my heart rate, blood pressure, blood sugar level or body temperature??  In an online environment:  will this improve conversion, revenue, bounce rate, exit rate, time on page, time on site or average number of pages per visit?  I could measure any one of these and 'prove' the hypothesis.  At its worst, a hypothesis without a 'then' would read as badly as, "If we make the CTA bigger, [then we will move more people to cart], [because] more people will click." which becomes "If we make the CTA bigger, more people will click."  That's not a hypothesis, that's starting to state the absurdly obvious.

When there's no because:  If I eat more chocolate, then I will run faster.  Why?  Why will I run faster?  Will I run slower?  How can I run even faster?  There are metrics here (speed) but there's no reason why.  The science is missing, and there's no way I can actually learn anything from this and improve.  I will execute a one-off experiment and get a result, but I will be none the wiser about how it happened.  Was it the sugar in the chocolate?  Or the caffeine?

And finally, I should reiterate that an idea for a test doesn't have to be detailed, but it must be backed up by data (some, even if it's not great).  The more evidence the better:  think of a sliding scale from no evidence (could be a terrible idea), through to some evidence (a usability review, or a survey response, prior test result or some click-path analysis), through to multiple sources of evidence all pointing the same way - not just one or two data points, but a comprehensive case for change.  You might even have enough evidence to make a go-do recommendation (and remember, it's a successful outcome if your evidence is strong enough to prompt the business to make a change without testing).

Wednesday, 3 July 2013

Getting an Online Testing Program Off The Ground

One of the unplanned topics from one of my xChange 2013 huddles was how to get an online testing program up and running, and how to build its momentum.  We were discussing online testing more broadly, and this subject came up.  Getting a test program up and running is not easy, but during our discussion a few useful hints and tips emerged, and I wanted to add to them here.
Sometimes, launching a test program is like defying gravity.

Selling plain web analytics isn't easy, but once you have a reporting and analytics program up and running, and you're providing recommendations which are supported by data and seeing improvements in your site's performance, then the next step will probably be to propose and develop a test.  Why test?

On the one hand, if your ideas and recommendations are being wholeheartedly received by the website's management team, then you may never need to resort to a test.  If you can show with data (and other sources, such as survey responses or other voice-of-customer sources) that there's an issue on your site, and if you can use your reporting tools to show what the problem probably is - and then get the site changed based on your recommendations - and then see an improvement, then you don't need to test.  Just implement!

However, you may find that you have a recommendation, backed by data, that doesn't quite get universal approval. How would the conversation go?

"The data shows that this page needs to be fixed - the issue is here, and the survey responses I've looked at show that the page needs a bigger/smaller product image."
"Hmm, I'm not convinced."

"Well, how about we try testing it then?  If it wins, we can implement; if not, we can switch it off."
"How does that work, then?"

The ideal 'we love testing' management meeting.  Image credit.
This is idealised, I know.  But you get the idea, and then you can go on to explain the advantages of testing compared to having to implement and then roll back (when the sales figures go south).

The discussions we had during xChange showed that most testing programs were being initiated by the web analytics team - there were very few (or no) cases where management started the discussion or wanted to run a test.  As web professionals, supporting a team with sales and performance targets, we need to be able to use all the online tools available to us - including testing - so it's important that we know how to sell testing to management, and get the resources that it needs.  From management's perspective, analytics requires very little support or maintenance (compared to testing) - you tag the site (once, with occasional maintenance) and then pay any subscriptions to the web analytics provider, and pay for the staff (whether that's one member of staff or a small team).  Then - that's it.  No additional resource needed - no design, no specific IT, no JavaScript developers (except for the occasional tag change, maybe).  And every week, the mysterious combination of analyst plus tags produces a report showing how sales and traffic figures went up, down or sideways.

By contrast, testing requires considerable resource.  The design team will need to provide imagery and graphics, guidance on page design and so on.  The JavaScript developers will need to put mboxes (or the test code) around the test area; the web content team will also need to understand the changes and make them as necessary.  And that's just for one test.  If you're planning to build up a test program (and you will be, in time) then you'll need to have the support teams available more frequently.  So - what are the benefits of testing?  And how do you sell them to management, when they're looking at the list of resources that you're asking for?

1.  Testing provides the opportunity to do that:  test something that the business is already thinking of changing.  A change of banners?  A new page layout? As an analyst, you'll need to be ahead of the change curve to do this, and aware of changes before they happen, but if you get the opportunity then propose to test a new design before it goes live.  This has the advantage of having most of the resource overhead already taken into account (you don't need to design the new banner/page) but it has one significant disadvantage:  you're likely to find that there's a major bias towards the new design, and management may just go ahead and implement anyway, even if the test shows negative results for it.

2.  A good track record of analytics wins will support your case for testing.  You don't have to go back to prior analysis or recommendations and be as direct as, "I told you so," but something like, "The changes made following my analysis and recommendations on the checkout pages have led to an improvement in sales conversion of x%." is likely to be more persuasive.  And this brings me neatly on to my next suggestion.

3.  Your main aim in selling testing is to ensure you can  get the money for testing resources, and for implementation.  As I mentioned above, testing takes time, resource and manpower - or, to put it another way, money.  So you'll need to persuade the people who hold the money that testing is a worthwhile investment.  How?  By showing a potential return on that investment.

"My previous recommendation was implemented and achieved a £1k per week increase in revenue.  Additionally, if this test sees a 2% lift in conversion, that will be equal to £3k per week increase in revenue."
It's a bit of a gamble, as I've mentioned previously in discussing testing - you may not see a 2% lift in conversion, it may go flat or negative.  But the main focus for the web channel management is going to be money:  how can we use the site to make more money.  And the answer is: by improving the site.  And how do we know if we're improving the site? Because we're testing our ideas and showing that they're better than the previous version.

You do have the follow-up argument (if it does win), that, "If you don't implement this test win it will cost..." because there, you'll know exactly what the uplift is and you'll be able to present some useful financial data (assuming that yesterday's winner is not today's loser!).  Talk about £, $ or Euros... sometimes, it's the only language that management speak.

4.  Don't be afraid to carry out tests on the same part of a page.  I know I've covered this previously - but it reduces your testing overhead, and it also forces you to iterate.  It is possible to test the same part of a page without repeating yourself.  You will need to have a test program, because you'll be testing on the same part of a page, and you'll need to consult your previous tests (winners, losers and flat results) to make sure you don't repeat them.  And on the way, you'll have chance to look at why a test won, or didn't, and try to improve.  That is iteration, and iteration is a key step from just testing to having a test program.  

5.  Don't be afraid to start by testing small areas of a page.  Testing full-page redesigns is lengthy, laborious and risky.  You can get plenty of testing mileage out of testing completely different designs for a small part of a page - a banner, an image, wording... remember that testing is a management expense for the time being, not an investment, and you'll need to keep your overheads low and have good potential returns (either financial, or learning, but remember that management's primary language is money).

6.  Document everything!  As much as possible - especially if you're only doing one or two tests at a time.  Ask the code developers to explain what they've done, what worked, what issues they faced and how they overcame them.  It may be all code to you, but in a few months' time, when you're talking to a different developer who is not familiar with testing and test code, your documentation may be the only thing that keeps your testing program moving.

Also - and I've mentioned this before - document your test designs and your results.  Even if you're the only test analyst in your company, you'll need a reference library to work from, and one day, you might have a colleague or two and you'll need to show them what you've done before.

So, to wrap up - remember - it's not a problem if somebody agrees to implement a proposed test.  "No, we won't test that, we'll implement it straight away."  You made a compelling case for a change - subsequently, you (representing the data) and management (representing gut feeling and intuition) agreed on a course of action.  Wins all round.

Setting up a testing program and getting management involvement requires some sales technique, not just data and analysis, so it's often outside the analyst's usual comfort zone. However, with the right approach to management (talk their language, show them the benefits) and a small but scaleable approach to testing, you should - hopefully - be on the way to setting up a testing program.